Scanned Document | Glossary | DocLD

A scanned document is a document whose pages are stored as images—typically from a physical scanner or a photo. There is no embedded text; to get text, you must run OCR (optical character recognition) on each page image. PDFs can be scanned (image-only pages) or mixed (some native text, some scanned pages).
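The scanned / native / mixed distinction above can be sketched as a small classifier over the text already extracted from each page. This is only an illustration: the function names, the `min_chars` threshold, and the `"empty"` label for a zero-page input are assumptions made here, not DocLD's actual detection logic.

```python
from typing import List


def has_embedded_text(page_text: str, min_chars: int = 1) -> bool:
    """Return True if the page has at least min_chars non-whitespace characters.

    The threshold is a hypothetical heuristic, not a documented DocLD setting.
    """
    return len("".join(page_text.split())) >= min_chars


def classify_document(page_texts: List[str], min_chars: int = 1) -> str:
    """Label a document by which of its pages carry extractable text."""
    flags = [has_embedded_text(text, min_chars) for text in page_texts]
    if not flags:
        return "empty"      # no pages at all (label assumed for this sketch)
    if all(flags):
        return "native"     # every page has embedded text
    if not any(flags):
        return "scanned"    # image-only pages throughout
    return "mixed"          # some native pages, some scanned pages
```

A higher `min_chars` would treat pages that carry only a stray page number or watermark as scanned; whether that is desirable depends on the documents involved.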

In DocLD

DocLD detects when a page lacks extractable text and runs OCR on it automatically. Parsing then returns text and layout, so the document can be chunked, embedded, and extracted like a native PDF. OCR quality depends on scan resolution, contrast, and language support.
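The per-page fallback described above might look roughly like the following. It is a minimal sketch under stated assumptions: `Page`, `ParsedPage`, `parse_pages`, and the injected `ocr` callable are hypothetical names introduced for illustration and are not part of any DocLD API. Layout information is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    embedded_text: str  # text extracted directly from the file; "" if none
    image: bytes        # rendered page image, used only when OCR is needed


@dataclass
class ParsedPage:
    text: str
    source: str  # "native" or "ocr"


def parse_pages(pages: List[Page], ocr: Callable[[bytes], str]) -> List[ParsedPage]:
    """Use embedded text where present; otherwise fall back to OCR on the image."""
    parsed = []
    for page in pages:
        if page.embedded_text.strip():
            parsed.append(ParsedPage(page.embedded_text, "native"))
        else:
            # No extractable text on this page: run the supplied OCR function.
            parsed.append(ParsedPage(ocr(page.image), "ocr"))
    return parsed
```

Passing the OCR step in as a callable keeps the routing logic independent of any particular OCR engine, and makes it easy to test with a stub. Downstream steps such as chunking can then treat every `ParsedPage` the same way regardless of its source.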

Scanned documents require OCR for text extraction; native PDFs do not. Both are handled by parsing, and both feed into chunking and document processing.
