Alfresco OCR Transform Engine
by Angel Borroy
Community
Alfresco Transform Engine that converts PDF files to searchable, text-layer PDFs using OCR (ocrmypdf / Tesseract). Compatible with ACS 7.0+ as a local or async T-Engine.
About
Runs ocrmypdf (a Tesseract wrapper) inside an Alfresco Transform Engine to produce OCR’d PDFs.
- Original PDF kept as version 1.0; OCR’d version saved as 1.1
- Includes a companion embed-metadata Repository add-on that adds the OCR action to Alfresco Share folder rules
- Configurable ocrmypdf arguments (e.g. --skip-text, --force-ocr, language)
- Deployable as a local T-Engine (Community) or async T-Engine via ActiveMQ (Enterprise)
- Community: add localTransform.ocr.url=http://transform-ocr:8090/ to Alfresco JAVA_OPTS
- Enterprise: register URL + queue (ocr-engine-queue) with the Transform Router
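
To illustrate how the configurable ocrmypdf arguments mentioned above end up on a command line, here is a minimal Python sketch. It is not the engine's own code: the helper names build_ocrmypdf_command and run_ocr, the default of --skip-text, and the file names are assumptions made for illustration. Only the ocrmypdf flags themselves (--skip-text, --force-ocr, -l for the language) come from the ocrmypdf CLI.

```python
import shlex
import subprocess
from typing import List, Optional


def build_ocrmypdf_command(source: str, target: str,
                           extra_args: str = "--skip-text",
                           language: Optional[str] = None) -> List[str]:
    """Build an ocrmypdf command line (hypothetical helper, for illustration only)."""
    command = ["ocrmypdf"]
    # Extra flags arrive as one configurable string, e.g. "--skip-text" or "--force-ocr".
    command += shlex.split(extra_args)
    if language:
        # Tesseract language code(s), e.g. "eng" or "eng+spa".
        command += ["-l", language]
    # ocrmypdf expects the input file followed by the output file.
    command += [source, target]
    return command


def run_ocr(source: str, target: str, **options) -> None:
    """Run ocrmypdf; requires the ocrmypdf executable on PATH."""
    # check=True raises CalledProcessError if ocrmypdf exits with a non-zero status.
    subprocess.run(build_ocrmypdf_command(source, target, **options), check=True)
```

With --skip-text, ocrmypdf leaves pages that already contain text untouched, while --force-ocr rasterizes and re-OCRs every page, so the two flags should not be combined in the same run.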