# Inference
Inference is the process of running an LLM to produce output from input. In DocLD, inference occurs when the model generates RAG answers, extracts structured data from documents, or scores chunks during reranking.
## When Inference Runs
| Use Case | What Inference Does |
|---|---|
| RAG chat | Generates answers from retrieved chunks and citations |
| Extraction | Extracts field values from document content using schema instructions |
| LLM reranking | Scores each chunk by relevance to the query |
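To make the reranking row concrete, here is a minimal sketch of the score-and-sort step. The LLM relevance call is replaced by a simple token-overlap scorer as a stand-in, and all names are illustrative assumptions, not DocLD's API:

```python
def score_chunk(query: str, chunk: str) -> float:
    """Stand-in for an LLM relevance call: fraction of query tokens
    that also appear in the chunk (a real reranker would prompt the model)."""
    query_tokens = set(query.lower().split())
    chunk_tokens = set(chunk.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & chunk_tokens) / len(query_tokens)

def rerank(query: str, chunks: list[str]) -> list[str]:
    """Score each chunk against the query and return them best-first."""
    return sorted(chunks, key=lambda c: score_chunk(query, c), reverse=True)

chunks = [
    "Invoices are archived after 90 days.",
    "The retention policy archives invoices after 90 days of inactivity.",
    "Contact support to restore archived documents.",
]
print(rerank("invoice retention policy", chunks)[0])
```

The key point the sketch shows is structural: reranking runs one scoring pass per chunk, so its inference cost grows linearly with the number of candidates.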
Inference adds latency and cost. DocLD uses lower temperatures for factual tasks and limits context-window usage to keep inference efficient.
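A common way to express this per-task trade-off is a small settings map keyed by use case. The names and values below are illustrative assumptions, not DocLD's actual configuration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceSettings:
    temperature: float       # lower values are more deterministic, suited to factual tasks
    max_context_tokens: int  # cap on prompt size to bound latency and cost

# Illustrative per-task settings (assumed values, not DocLD defaults).
SETTINGS = {
    "rag_chat":   InferenceSettings(temperature=0.2, max_context_tokens=8000),
    "extraction": InferenceSettings(temperature=0.0, max_context_tokens=4000),
    "reranking":  InferenceSettings(temperature=0.0, max_context_tokens=2000),
}

def truncate_context(text: str, settings: InferenceSettings) -> str:
    """Crude context cap using whitespace tokens as a proxy for model tokens."""
    tokens = text.split()
    return " ".join(tokens[: settings.max_context_tokens])
```

Freezing the dataclass and keeping one entry per use case makes the cost/determinism trade-off explicit and easy to audit.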
## Related Concepts
Inference is the runtime phase of LLM usage: both RAG and extraction rely on it. Embedding generation is a separate step and is not LLM inference.