OCR | Glossary | DocLD

OCR (Optical Character Recognition) converts images and scanned documents into machine-readable text. DocLD performs OCR with a vision-language model (VLM), which lets it handle scanned PDFs, photos of documents, and image files.

OCR in DocLD

Feature	Description
50+ languages	English, Spanish, Chinese, Japanese, Arabic, and more
Auto-detection	Detects document language when not specified
Tables	Preserves table structure and layout
Handwriting	Recognizes handwritten text
Layout	Maintains document structure and reading order

When OCR Runs

OCR is applied when:

Uploading scanned PDFs or images
Processing documents that lack embedded text
Extracting text from photos or screenshots

For native digital PDFs, DocLD parses text directly without OCR. For mixed documents, only image-based pages are sent through OCR. OCR output is then chunked and used for extraction or vector search.
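The per-page routing described above can be sketched as follows. This is an illustration only, not DocLD's actual implementation: the `Page` type, the `needs_ocr` heuristic, and the `min_chars` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    embedded_text: str  # text read from the PDF text layer; empty for scanned pages


def needs_ocr(page: Page, min_chars: int = 20) -> bool:
    """Assumed heuristic: a page with almost no embedded text is image-based."""
    return len(page.embedded_text.strip()) < min_chars


def route_pages(pages):
    """Return (page numbers parsed directly, page numbers sent to OCR)."""
    direct, ocr = [], []
    for page in pages:
        (ocr if needs_ocr(page) else direct).append(page.number)
    return direct, ocr


pages = [
    Page(1, "Invoice #1042 issued to Acme Corp on 2024-03-01."),
    Page(2, ""),      # scanned page with no text layer
    Page(3, "  \n"),  # text layer containing only whitespace
]
direct, ocr = route_pages(pages)
print("parse directly:", direct)  # parse directly: [1]
print("send to OCR:", ocr)        # send to OCR: [2, 3]
```

A character-count threshold is only one possible signal. A real pipeline might also check how much of the page area is covered by images.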

Confidence and Quality

Results include confidence scores per text block. Low-confidence regions can be flagged for review or escalated to alternative models. DocLD's agentic OCR pipeline retries and validates output for improved accuracy.
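As a rough sketch of how per-block confidence scores might drive review and escalation: the `TextBlock` type, the bucket names, and both thresholds below are assumptions made for illustration. The source does not specify DocLD's score range or cutoffs, so the example assumes scores between 0.0 and 1.0.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    confidence: float  # assumed range: 0.0 (no confidence) to 1.0 (certain)


def triage(blocks, review_below=0.80, escalate_below=0.50):
    """Sort blocks into accept / review / escalate buckets by confidence."""
    buckets = {"accept": [], "review": [], "escalate": []}
    for block in blocks:
        if block.confidence < escalate_below:
            buckets["escalate"].append(block)  # candidate for an alternative model
        elif block.confidence < review_below:
            buckets["review"].append(block)    # candidate for human review
        else:
            buckets["accept"].append(block)
    return buckets


blocks = [
    TextBlock("Total due: $1,250.00", 0.97),
    TextBlock("Sgnature: J. Sm1th", 0.62),
    TextBlock("~~ |l1 ..", 0.21),
]
result = triage(blocks)
for name, items in result.items():
    print(name, [b.text for b in items])
```

Keeping the two thresholds as parameters makes it easy to tune how much output goes to review versus escalation for a given document type.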

OCR runs as part of parsing when documents lack embedded text (scanned PDFs, images). Parsed output feeds chunking and embedding for vector search, and supports extraction. Confidence scores help flag low-quality OCR regions for review.
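To make the chunking step concrete, here is a minimal sketch of splitting OCR text into overlapping word windows before embedding. Fixed-size word windows are a common generic technique and are used here as a stand-in. The source does not describe DocLD's chunking strategy, and `chunk_words`, `size`, and `overlap` are hypothetical names.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


ocr_text = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(ocr_text, size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some words twice.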
