# Inference
Inference is the process of running an LLM to produce output from input. In DocLD, inference occurs when the model generates RAG answers, extracts structured data from documents, or scores chunks during reranking.
## When Inference Runs
| Use Case | What Inference Does |
|---|---|
| RAG chat | Generates answers from retrieved chunks and citations |
| Extraction | Extracts field values from document content using schema instructions |
| LLM reranking | Scores each chunk by relevance to the query |
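To make the reranking row concrete, here is a minimal sketch of the score-and-sort step. The LLM relevance call is replaced by a simple token-overlap scorer as a stand-in, and all names are illustrative assumptions, not DocLD's API:

```python
def score_chunk(query: str, chunk: str) -> float:
    """Stand-in for an LLM relevance call: fraction of query tokens
    that also appear in the chunk (a real reranker would prompt the model)."""
    query_tokens = set(query.lower().split())
    chunk_tokens = set(chunk.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & chunk_tokens) / len(query_tokens)

def rerank(query: str, chunks: list[str]) -> list[str]:
    """Score each chunk against the query and return them best-first."""
    return sorted(chunks, key=lambda c: score_chunk(query, c), reverse=True)

chunks = [
    "Invoices are archived after 90 days.",
    "The retention policy archives invoices after 90 days of inactivity.",
    "Contact support to restore archived documents.",
]
print(rerank("invoice retention policy", chunks)[0])
```

The key point the sketch shows is structural: reranking runs one scoring pass per chunk, so its inference cost grows linearly with the number of candidates.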
Inference adds latency and cost. DocLD uses lower temperatures for factual tasks and limits context-window usage to keep inference efficient.
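A common way to express this per-task trade-off is a small settings map keyed by use case. The names and values below are illustrative assumptions, not DocLD's actual configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceSettings:
    temperature: float       # lower values are more deterministic, suited to factual tasks
    max_context_tokens: int  # cap on prompt size to bound latency and cost

# Illustrative per-task settings (assumed values, not DocLD defaults).
SETTINGS = {
    "rag_chat":   InferenceSettings(temperature=0.2, max_context_tokens=8000),
    "extraction": InferenceSettings(temperature=0.0, max_context_tokens=4000),
    "reranking":  InferenceSettings(temperature=0.0, max_context_tokens=2000),
}

def truncate_context(text: str, settings: InferenceSettings) -> str:
    """Crude context cap using whitespace tokens as a proxy for model tokens."""
    tokens = text.split()
    return " ".join(tokens[: settings.max_context_tokens])
```

Freezing the dataclass and keeping one entry per use case makes the cost/determinism trade-off explicit and easy to audit.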
## Related Concepts
Inference is the runtime phase of LLM usage: both RAG and extraction rely on it. Embedding generation is a separate step and is not LLM inference.