Scanned Document Processing
A scanned document is a document whose pages are stored as images, typically produced by a physical scanner or a photo. There is no embedded text; to get text, you must run OCR (optical character recognition) on each page image. PDFs can be scanned (image-only pages) or mixed (some pages with native text, some scanned).
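The scanned versus mixed distinction can be illustrated with a minimal sketch. It assumes the text layer of each page has already been pulled out by some PDF library and is passed in as a list of strings, with an empty string for an image-only page. The function name `classify_document` and the `min_chars` threshold are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def classify_document(page_texts: Sequence[str], min_chars: int = 1) -> str:
    """Classify a document as "native", "scanned", or "mixed".

    page_texts holds one string per page: the text layer found on that
    page, or "" when the page is image-only (assumed input format).
    """
    if not page_texts:
        raise ValueError("document has no pages")

    # A page counts as having native text if its stripped text layer
    # reaches the (hypothetical) minimum length.
    has_text = [len(text.strip()) >= min_chars for text in page_texts]

    if all(has_text):
        return "native"
    if not any(has_text):
        return "scanned"
    return "mixed"
```

A threshold above 1 may be useful in practice, since some scanned pages carry a few stray characters in their text layer; that choice is a tuning assumption, not something the definition prescribes.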
In DocLD
DocLD detects when a page lacks extractable text and runs OCR on it automatically. Parsing then returns text and layout, so the document can be chunked, embedded, and used for extraction just like a native PDF. OCR quality depends on scan resolution, contrast, and language support.
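The per-page fallback described here can be sketched roughly as follows. This is not DocLD's implementation: the `Page` type, the `extract_text` function, the `min_chars` threshold, and the `ocr` callable are all hypothetical names introduced for illustration, and the OCR engine is replaced by a stand-in function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    native_text: str  # text layer of the page, "" if none (assumed shape)
    image: bytes      # rendered page image (hypothetical representation)


def extract_text(
    pages: List[Page],
    ocr: Callable[[bytes], str],
    min_chars: int = 1,
) -> List[str]:
    """Return one text string per page, running OCR only where needed."""
    results = []
    for page in pages:
        text = page.native_text.strip()
        if len(text) >= min_chars:
            # The page has extractable text: use it directly.
            results.append(text)
        else:
            # No usable text layer: fall back to OCR on the page image.
            results.append(ocr(page.image).strip())
    return results


def fake_ocr(image: bytes) -> str:
    # Stand-in for a real OCR engine, for illustration only.
    return "recognized text"


pages = [Page("Native paragraph.", b""), Page("", b"\x89PNG...")]
print(extract_text(pages, fake_ocr))
# prints ['Native paragraph.', 'recognized text']
```

Passing the OCR engine in as a callable keeps the routing logic independent of any particular OCR library, which also makes the fallback easy to test.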
Related Concepts
Scanned documents require OCR for text extraction; native PDFs do not. Both are handled by parsing, and both feed into chunking and document processing.