Source Document
A source document is the original file or document uploaded to DocLD for processing. Parsing extracts text, tables, and layout from the source; chunking and embedding operate on that extracted content. Extraction and RAG work from the parsed result, not the raw file.
Lifecycle
- Document upload — Source document is uploaded
- Parse — Parsing extracts content; OCR may run for scanned or image content
- Chunk and embed — Content is chunked and embedded for vector search
- Extract / RAG — Extraction and RAG use the parsed content
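The lifecycle above can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not DocLD's API: the names `SourceDocument`, `ParsedDocument`, `parse`, `chunk`, `embed`, and `retrieve` are hypothetical, the "parser" just decodes UTF-8 bytes, and the "embedding" is a bag-of-words count standing in for a real vector model. The point it shows is that every stage after parsing reads the parsed result and never touches the raw bytes again.

```python
from dataclasses import dataclass


@dataclass
class SourceDocument:
    """Hypothetical stand-in for an uploaded file (not a DocLD type)."""
    filename: str
    raw_bytes: bytes


@dataclass
class ParsedDocument:
    """Hypothetical parsed result that all later stages read from."""
    source_filename: str
    text: str


def parse(source: SourceDocument) -> ParsedDocument:
    # Toy parser: decode the bytes as UTF-8. A real parser would also
    # extract tables and layout, and might run OCR on scanned content.
    text = source.raw_bytes.decode("utf-8", errors="replace")
    return ParsedDocument(source_filename=source.filename, text=text)


def chunk(parsed: ParsedDocument, max_words: int = 5) -> list[str]:
    # Split the parsed text (not the raw file) into fixed-size word windows.
    words = parsed.text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def embed(text: str) -> dict[str, int]:
    # Toy embedding: lowercase word counts, an assumption made for brevity.
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def retrieve(query: str, chunks: list[str]) -> str:
    # Return the chunk whose word counts overlap most with the query.
    query_vec = embed(query)

    def score(chunk_text: str) -> int:
        chunk_vec = embed(chunk_text)
        return sum(n * chunk_vec.get(word, 0) for word, n in query_vec.items())

    return max(chunks, key=score)


# Upload -> parse -> chunk -> retrieve, using a made-up example file.
source = SourceDocument(
    filename="invoice.txt",
    raw_bytes=b"Invoice total is 42 dollars. Payment is due within 30 days of receipt.",
)
parsed = parse(source)
chunks = chunk(parsed)
best = retrieve("payment due", chunks)
print(best)
```

Keeping `ParsedDocument` as a separate type from `SourceDocument` makes the boundary explicit: `chunk` and `retrieve` accept only parsed content, so they cannot accidentally depend on the raw file.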
Related Concepts
Source documents enter via document upload. Parsing and OCR process them; chunking and embedding prepare content for vector search and RAG.