Alfresco TEngine PII Redaction
by Angel Borroy
Community
Alfresco Transform Engine that detects and redacts Personally Identifiable Information (PII) in PDF documents using Microsoft Presidio, producing a sanitized PDF or structured PII metadata.
About
Integrates Microsoft Presidio into an Alfresco T-Engine to redact PII from PDFs or extract it as metadata.
Two transform modes:
application/pdf → application/pdf: produces a redacted PDF with configurable label and score threshold
application/pdf → alfresco-metadata-extract: returns structured JSON with entity counts, scores, and values, mappable to Alfresco content model properties (pii:hasPII, pii:entities, pii:countPerson, etc.)
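As a rough illustration of the metadata mode, the sketch below folds a list of detections into properties such as pii:hasPII, pii:entities, and pii:countPerson. It is not the engine's actual code: the function name build_pii_metadata, the tuple input shape, and the rule that derives a count property name from an entity type (for example pii:countEmailAddress) are assumptions made for this example; only the three property names listed above come from the project description.

```python
from collections import Counter


def build_pii_metadata(detections):
    """Summarize detections into Alfresco-style properties.

    `detections` is a list of (entity_type, value, score) tuples.
    This input shape is hypothetical; in the real engine the
    detections would come from Microsoft Presidio.
    """
    counts = Counter(entity_type for entity_type, _, _ in detections)
    metadata = {
        "pii:hasPII": bool(detections),
        "pii:entities": sorted(counts),
    }
    for entity_type, count in counts.items():
        # Assumed naming rule: PERSON -> pii:countPerson,
        # EMAIL_ADDRESS -> pii:countEmailAddress.
        suffix = "".join(part.capitalize() for part in entity_type.split("_"))
        metadata[f"pii:count{suffix}"] = count
    return metadata
```

A document with no detections would yield only pii:hasPII set to false and an empty pii:entities list, which keeps the mapping to content model properties straightforward.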
Configurable via pii_engine_config.json:
entities: list of PII types (PERSON, PHONE_NUMBER, EMAIL_ADDRESS, CREDIT_CARD, …)
scoreThreshold: confidence threshold (0.0–1.0)
label: replacement text for redacted content
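To make the three settings concrete, here is a minimal sketch of how a configuration with these keys could drive redaction of plain text. The key names (entities, scoreThreshold, label) come from the description above; the example values, the Detection class, the load_config and redact helpers, and the choice to keep detections whose score is greater than or equal to the threshold are assumptions for illustration. The sketch also assumes detections do not overlap and operates on a string rather than on a PDF.

```python
import json
from dataclasses import dataclass

# Example configuration. The keys match pii_engine_config.json as
# described; the values are illustrative only.
CONFIG_JSON = """
{
  "entities": ["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD"],
  "scoreThreshold": 0.5,
  "label": "[REDACTED]"
}
"""


@dataclass
class Detection:
    """Hypothetical stand-in for one detector result."""
    entity_type: str
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    score: float


def load_config(raw: str) -> dict:
    """Parse the JSON config and check the threshold range."""
    config = json.loads(raw)
    if not 0.0 <= config["scoreThreshold"] <= 1.0:
        raise ValueError("scoreThreshold must be between 0.0 and 1.0")
    return config


def redact(text: str, detections: list, config: dict) -> str:
    """Replace configured, sufficiently confident detections with the label."""
    allowed = set(config["entities"])
    kept = [
        d for d in detections
        if d.entity_type in allowed and d.score >= config["scoreThreshold"]
    ]
    # Replace from the end of the text so earlier offsets stay valid.
    for d in sorted(kept, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + config["label"] + text[d.end:]
    return text
```

Raising scoreThreshold trades recall for precision: fewer low-confidence matches are redacted, at the risk of leaving some PII in place.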
Deployable as a local T-Engine (Community, via the localTransform.pdf-pii.url property) or as an asynchronous T-Engine (Enterprise, via the pii-engine-queue queue).