LLM
An LLM (Large Language Model) is an AI model trained on vast amounts of text to understand and generate language. DocLD uses LLMs for RAG chat (generating answers from retrieved context) and for extraction (pulling structured data from documents). The LLM is guided by prompts and receives document chunks or full documents as context.
How LLMs Are Used in DocLD
| Use Case | LLM Role | Context |
|---|---|---|
| RAG chat | Generate answers | Retrieved chunks from vector search |
| Extraction | Extract field values | Document content + schema instructions |
| Reranking (optional) | Score chunk relevance | Candidate chunks from vector search |
The LLM is instructed to answer only from the provided context and to cite its sources. Citations show where each statement came from, reducing hallucination.
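A minimal sketch of how such a grounded prompt might be assembled. The function name, chunk shape (`{"source": ..., "text": ...}` dicts), and instruction wording are all illustrative assumptions, not DocLD's actual implementation:

```python
# Sketch: assembling a grounded RAG prompt with numbered sources so the
# LLM can cite them as [1], [2], ... (hypothetical helper, not DocLD code).
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    # Number each retrieved chunk; the number becomes the citation marker.
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using ONLY the sources below. Cite each claim as [n].\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

chunks = [
    {"source": "report.pdf", "text": "Revenue grew 12% in 2023."},
    {"source": "notes.md", "text": "Growth was driven by new regions."},
]
print(build_rag_prompt("Why did revenue grow?", chunks))
```

Numbering the sources is what makes citations checkable: a reader can map each `[n]` in the answer back to a specific chunk.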
LLM Behavior
- Context window — The LLM has a finite context window; chunking and top-k keep retrieved content within limits
- Instructions — System prompts and schema instructions guide behavior
- Temperature — Controls randomness; DocLD uses lower temperatures for factual tasks
For RAG, the LLM receives only the retrieved chunks, not your entire document library. This keeps answers grounded and reduces hallucination.
Why LLMs Matter for Document Intelligence
- Understanding — LLMs interpret document content semantically, not just keyword matching
- Flexibility — Zero-shot extraction works without training on your documents
- Generative — Chat produces natural-language answers with citations
- Adaptable — Schema instructions and prompts adapt to different use cases
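The zero-shot extraction idea above can be illustrated by turning a field schema into prompt instructions. The schema shape (`{field_name: description}`) and the prompt wording are hypothetical, shown only to make the mechanism concrete:

```python
# Sketch: converting a field schema into zero-shot extraction
# instructions (hypothetical helper, not DocLD's actual prompt).
def build_extraction_prompt(document: str, schema: dict[str, str]) -> str:
    # Each schema entry becomes one instruction line for the LLM.
    fields = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Extract the following fields from the document. "
        "Return JSON with exactly these keys; use null when a value is absent.\n\n"
        f"Fields:\n{fields}\n\nDocument:\n{document}\n\nJSON:"
    )

schema = {"invoice_number": "The invoice ID", "total": "Total amount due"}
print(build_extraction_prompt("Invoice #42, total $99.00", schema))
```

Nothing here is trained on the user's documents: changing the schema dict is enough to extract different fields, which is what "zero-shot" means in this context.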
Related Concepts
LLMs power RAG and extraction. Hallucination is a risk; citations and grounding in retrieved context reduce it. Chunking and vector search determine what context the LLM receives.