Document Processing | Glossary | DocLD

Document processing is the end-to-end pipeline of ingesting, parsing, and transforming documents into searchable or structured data. In DocLD, it typically includes upload, parsing (with OCR when needed), chunking, embedding, and optionally extraction or indexing in a knowledge base.

Stages

Stage	Description
Ingestion	Documents enter the system via upload, API, or workflow trigger
Parse	Parsing extracts text, tables, and layout; OCR runs for scanned documents or images
Chunk	Content is chunked for embedding and vector search
Extract	Optional extraction with a schema produces structured data
Index	Chunks are embedded and stored for RAG or search
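
To make the Chunk stage concrete, here is a minimal sketch of fixed-size chunking with overlap. It does not use DocLD's API or its actual chunking strategy: the `Chunk` type, the `chunk_text` function, and the `size` and `overlap` parameters are hypothetical names chosen for illustration, and a real pipeline may split on tokens, sentences, or layout boundaries instead of characters.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One slice of a parsed document (hypothetical type, not a DocLD object)."""
    doc_id: str
    index: int
    text: str


def chunk_text(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into character windows of `size`, each sharing `overlap` chars with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(doc_id=doc_id, index=index, text=text[start:start + size]))
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some text twice.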

Processing can be triggered per document or in batch. Jobs track async runs; webhooks notify on completion.
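
The relationship between batch runs, jobs, and completion notifications can be sketched in-process with `asyncio`. This is an illustration only, under stated assumptions: `Job`, `process_document`, and `run_batch` are hypothetical names, the status strings are invented, and the `on_complete` callback stands in for a webhook delivery, which a real system would send as an HTTP request.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Job:
    """Tracks one asynchronous processing run (hypothetical, not a DocLD type)."""
    doc_id: str
    status: str = "queued"


async def process_document(job: Job) -> None:
    """Placeholder for the parse, chunk, and index work of a single document."""
    job.status = "running"
    await asyncio.sleep(0)  # yield control; real work would happen here
    job.status = "completed"


async def run_batch(doc_ids: Iterable[str], on_complete: Callable[[Job], None]) -> list[Job]:
    """Create one job per document, run them concurrently, and notify as each one finishes."""
    jobs = [Job(doc_id=doc_id) for doc_id in doc_ids]

    async def run(job: Job) -> None:
        try:
            await process_document(job)
        except Exception:
            job.status = "failed"
        on_complete(job)  # stand-in for a webhook notification

    await asyncio.gather(*(run(job) for job in jobs))
    return jobs
```

A caller can pass any function as `on_complete`, for example `notified.append` to collect finished jobs in a list. Processing a single document is the same call with a one-element batch.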

Document processing is the umbrella term covering the document pipeline, parsing, extraction, and ingestion. Workflows automate multi-step processing.
