Document Processing
Document processing is the end-to-end pipeline of ingesting, parsing, and transforming documents into searchable or structured data. In DocLD, it typically includes upload, parsing (with OCR when needed), chunking, embedding, and optionally extraction or indexing in a knowledge base.
Stages
| Stage | Description |
|---|---|
| Ingestion | Documents enter the system via upload, API, or workflow trigger |
| Parse | Parsing extracts text, tables, and layout; OCR runs for scanned documents or images |
| Chunk | Parsed content is split into chunks sized for embedding and vector search |
| Extract | Optional extraction with a schema produces structured data |
| Index | Chunks are embedded and stored for RAG or search |
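
The stages above can be sketched as a chain of small functions. This is a minimal illustration, not DocLD's implementation: every name here (`parse`, `chunk`, `embed`, `process`), the chunk size and overlap, and the in-memory `index` dictionary are assumptions made for the example. The letter-frequency "embedding" is only a placeholder for a real embedding model, and the optional Extract stage is left out.

```python
from collections import Counter
import string


def parse(raw: str) -> str:
    """Stand-in for parsing: collapse whitespace. A real parser would also
    recover tables and layout, and run OCR on scanned input."""
    return " ".join(raw.split())


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def embed(piece: str) -> list[float]:
    """Toy embedding: relative letter frequencies (hypothetical placeholder)."""
    counts = Counter(c for c in piece.lower() if c in string.ascii_lowercase)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in string.ascii_lowercase]


def process(doc_id: str, raw: str, index: dict) -> int:
    """Run parse -> chunk -> embed -> index; return the number of chunks stored."""
    pieces = chunk(parse(raw))
    for i, piece in enumerate(pieces):
        index[(doc_id, i)] = (piece, embed(piece))
    return len(pieces)
```

Calling `process("doc-1", text, index)` with an empty dictionary fills it with one entry per chunk, keyed by document ID and chunk position. Overlapping windows are one common way to avoid cutting a sentence's context at a chunk boundary; the source does not say which chunking strategy DocLD uses.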
Processing can be triggered for a single document or for a batch. Each run is tracked as an asynchronous job, and webhooks notify on completion.
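
To make the job and webhook relationship concrete, here is a small in-memory sketch. It assumes a hypothetical `JobTracker` class, hypothetical status names, and a hypothetical notification payload; none of these come from DocLD, whose actual job fields and webhook format are not described here. A plain callback stands in for the HTTP webhook call, and the sketch also notifies on failure, which is an assumption.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    doc_id: str
    status: JobStatus = JobStatus.PENDING
    error: Optional[str] = None


class JobTracker:
    """Tracks one job per document and notifies a callback when a job ends."""

    def __init__(self, notify: Callable[[dict], None]):
        self.jobs: dict[str, Job] = {}
        self.notify = notify  # stand-in for posting to a webhook URL

    def submit(self, doc_ids: list[str]) -> list[str]:
        """Create one pending job per document; a batch is just several IDs."""
        job_ids = []
        for doc_id in doc_ids:
            job = Job(job_id=uuid.uuid4().hex, doc_id=doc_id)
            self.jobs[job.job_id] = job
            job_ids.append(job.job_id)
        return job_ids

    def run(self, job_id: str, process: Callable[[str], None]) -> None:
        """Run the processing function, record the outcome, then notify."""
        job = self.jobs[job_id]
        job.status = JobStatus.RUNNING
        try:
            process(job.doc_id)
            job.status = JobStatus.COMPLETED
        except Exception as exc:
            job.status = JobStatus.FAILED
            job.error = str(exc)
        self.notify({
            "job_id": job.job_id,
            "doc_id": job.doc_id,
            "status": job.status.value,
            "error": job.error,
        })
```

A caller would pass a single-element list to `submit` for one document or several IDs for a batch, then either read `tracker.jobs[job_id].status` to poll or rely on the `notify` callback. In a real deployment `run` would execute on a worker rather than inline.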
Related Concepts
Document processing is the umbrella term covering the document pipeline, parsing, extraction, and ingestion. Workflows automate multi-step processing.