Alfresco TEngine Convert to Markdown
by Angel Borroy
Community
AI-powered Alfresco Transform Engine that converts PDF files to clean, richly described Markdown using Docling, with optional LLaVA multimodal image captioning via Ollama.
About
Transforms application/pdf → text/markdown using Docling.
| Capability | Details |
|---|---|
| PDF to Markdown | Extracts text and layout, turning each page into structured Markdown |
| Image handling | Four modes: placeholder, embedded (base64), referenced (PNG), or described (LLaVA caption) |
| Multilingual captions | English, Spanish, French, German, Italian, Portuguese when using described mode |
| Alfresco‑ready | Implements the Alfresco Transform Core SPI (TransformEngine & CustomTransformer) |
| Containerised | Multi‑stage Docker build (Java 17 + Python 3.11), published to Docker Hub as angelborroy/alf-tengine-convert2md |
| ACS 26.1 ready | Ready-to-use docker-compose-261.yaml included |
Image captioning (`image=described`) requires a local Ollama daemon with `llava` pulled.
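
Before requesting the described mode, it can help to confirm that the Ollama daemon is reachable and that a `llava` model has been pulled. The following is a minimal sketch, not part of this project: it assumes Ollama is listening on its default address `http://localhost:11434` and uses Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `has_model` and `llava_is_pulled` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default listen address; adjust if your daemon runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def has_model(tags_response: dict, model: str = "llava") -> bool:
    """Return True if an /api/tags payload lists `model` under any tag."""
    for entry in tags_response.get("models", []):
        name = entry.get("name", "")
        # Names look like "llava:latest"; compare the part before the tag.
        if name.split(":", 1)[0] == model:
            return True
    return False


def llava_is_pulled(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> bool:
    """Query the local Ollama daemon; False if unreachable or llava is missing."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return has_model(json.load(resp))
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers malformed JSON.
        return False


if __name__ == "__main__":
    if llava_is_pulled():
        print("llava is available; image=described should be usable.")
    else:
        print("llava not found; run `ollama pull llava` or use another image mode.")
```

If the check fails, the other image modes (placeholder, embedded, referenced) do not depend on Ollama.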