Corpus
A corpus is a collection of documents used together for search, retrieval-augmented generation (RAG), or analysis. In DocLD, a knowledge base effectively holds a corpus: you add documents, they are ingested (parsed, chunked, embedded), and vector search runs over that corpus to answer questions with citations. A document set may also represent a corpus for a specific use case.
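To make the ingest-then-search flow concrete, here is a minimal sketch in plain Python. It is not DocLD's API: the function names (`chunk`, `embed`, `ingest`, `search`), the sample documents, and the bag-of-words "embedding" are all assumptions chosen for illustration. A real pipeline would use a learned embedding model and a vector store.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ingest(corpus):
    """Build an index of (doc_id, chunk_index, chunk_text, vector) entries from a corpus."""
    index = []
    for doc_id, text in corpus.items():
        for i, piece in enumerate(chunk(text)):
            index.append((doc_id, i, piece, embed(piece)))
    return index

def search(index, query, k=2):
    """Return the k chunks most similar to the query, each with its source doc_id as a citation."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[3]), reverse=True)
    return [(doc_id, i, piece) for doc_id, i, piece, _ in ranked[:k]]

# Hypothetical two-document corpus.
corpus = {
    "contract.txt": "The supplier shall deliver goods within thirty days of the purchase order.",
    "invoice.txt": "Invoice total is due within fifteen days of receipt.",
}
index = ingest(corpus)
hits = search(index, "When must the supplier deliver goods?", k=1)
```

Here `hits` holds the best-matching chunk together with the document it came from, which is what lets an answer cite its source.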
Use Cases
- RAG — The corpus is the set of documents that ground answers; queries retrieve chunks from this corpus.
- Analytics — Analyze or aggregate over a defined corpus (e.g., all contracts, all invoices).
- Testing — Use a fixed corpus for ground truth or evaluation.
Corpus size and quality affect retrieval and citation quality. Keeping a knowledge base focused (one domain or use case) often improves results.
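One way to check how corpus changes affect retrieval is to score a retriever against a fixed corpus with known relevant documents, as in the Testing use case. The sketch below computes average recall@k. The queries, document ids, and the `stub_retrieve` function are hypothetical placeholders; in practice `retrieve` would call whatever search you run over your corpus.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant document ids that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def evaluate(retrieve, ground_truth, k=3):
    """Average recall@k over a fixed mapping of query -> relevant document ids."""
    scores = [recall_at_k(retrieve(query), relevant, k)
              for query, relevant in ground_truth.items()]
    return sum(scores) / len(scores)

# Hypothetical ground truth for a fixed evaluation corpus.
ground_truth = {
    "payment terms": ["contract-1", "invoice-7"],
    "termination clause": ["contract-2"],
}

# Stub retriever returning canned rankings; replace with a real search call.
canned_rankings = {
    "payment terms": ["contract-1", "memo-3", "invoice-7"],
    "termination clause": ["memo-3", "contract-1", "contract-9"],
}

def stub_retrieve(query):
    return canned_rankings[query]

score = evaluate(stub_retrieve, ground_truth, k=3)
```

Running the same evaluation before and after adding, removing, or re-chunking documents gives a like-for-like comparison, provided the ground truth stays fixed.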
Related Concepts
A corpus is the set of documents in a knowledge base or document set. Ingestion builds the searchable index over the corpus; reindexing refreshes that index when settings change.