OCR
OCR (Optical Character Recognition) converts images and scanned documents into machine-readable text. DocLD uses vision-language model (VLM) based OCR to handle scanned PDFs, photos of documents, and image files.
OCR in DocLD
| Feature | Description |
|---|---|
| 50+ languages | English, Spanish, Chinese, Japanese, Arabic, and more |
| Auto-detection | Detects document language when not specified |
| Tables | Preserves table structure and layout |
| Handwriting | Recognizes handwritten text |
| Layout | Maintains document structure and reading order |
When OCR Runs
OCR is applied when:
- Uploading scanned PDFs or images
- Processing documents that lack embedded text
- Extracting text from photos or screenshots
For native digital PDFs, DocLD parses text directly without OCR. For mixed documents, only image-based pages are sent through OCR. OCR output is then chunked and used for extraction or vector search.
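The per-page routing described above can be sketched as follows. This is a minimal illustration, not DocLD's implementation: the `Page` type, the `needs_ocr` heuristic, and the `min_chars` threshold are all hypothetical names chosen for this example, and it assumes a page counts as image-based when its embedded text layer is empty.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record; not part of any DocLD API."""
    number: int
    embedded_text: str  # text layer found by the parser; empty for image-only pages


def needs_ocr(page: Page, min_chars: int = 1) -> bool:
    # Assumption: a page with fewer than `min_chars` non-whitespace-trimmed
    # characters in its text layer is treated as image-based.
    return len(page.embedded_text.strip()) < min_chars


def route_pages(pages: list[Page]) -> tuple[list[int], list[int]]:
    """Return (page numbers parsed directly, page numbers sent through OCR)."""
    direct: list[int] = []
    ocr: list[int] = []
    for page in pages:
        if needs_ocr(page):
            ocr.append(page.number)
        else:
            direct.append(page.number)
    return direct, ocr


# A mixed document: page 1 has a text layer, pages 2 and 3 are scans.
pages = [Page(1, "Quarterly report"), Page(2, "   "), Page(3, "")]
direct, ocr = route_pages(pages)
```

In this sketch only pages 2 and 3 would go through OCR, while page 1 is parsed directly. A real pipeline would likely use a more robust test than text length, for example checking how much of the page area is covered by images.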
Confidence and Quality
Results include confidence scores per text block. Low-confidence regions can be flagged for review or escalated to alternative models. DocLD's agentic OCR pipeline retries and validates output for improved accuracy.
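One way to act on per-block confidence scores is a simple threshold-based triage, sketched below. The `TextBlock` type, the assumption that scores fall in the range 0.0 to 1.0, and both threshold values are hypothetical; they illustrate the flag-or-escalate idea and are not DocLD's actual response schema or defaults.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Hypothetical OCR result block; field names are assumptions."""
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def triage(blocks, review_below=0.80, escalate_below=0.50):
    """Split blocks into (accepted, needs_review, escalate) by confidence.

    Thresholds are illustrative values, not documented defaults.
    """
    accepted, needs_review, escalate = [], [], []
    for block in blocks:
        if block.confidence < escalate_below:
            escalate.append(block)      # candidate for an alternative model
        elif block.confidence < review_below:
            needs_review.append(block)  # candidate for human review
        else:
            accepted.append(block)
    return accepted, needs_review, escalate


blocks = [
    TextBlock("Total due: 120.00", 0.97),
    TextBlock("Signed by J. Sm1th", 0.72),
    TextBlock("~~#@ illegible", 0.31),
]
accepted, needs_review, escalate = triage(blocks)
```

With these example thresholds, the first block is accepted, the second is flagged for review, and the third is escalated. Suitable thresholds depend on the document type and on how costly an undetected OCR error is downstream.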
Related Concepts
OCR runs as part of parsing when documents lack embedded text (scanned PDFs, images). Parsed output feeds chunking and embedding for vector search, and supports extraction. Confidence scores help flag low-quality OCR regions for review.