Hallucination
AI hallucination occurs when a large language model (LLM) generates text that sounds plausible but is not supported by its training data or the provided context. The model may invent facts, figures, or references. In document Q&A, hallucination is a major risk: users expect answers grounded in their documents, not fabricated content.
How RAG Reduces Hallucination
RAG (Retrieval-Augmented Generation) addresses hallucination by:
- Retrieving — Vector search finds the document chunks most relevant to the question
- Constraining — The LLM is instructed to answer only from the retrieved context
- Citing — Citations show which passages support each part of the answer
When the LLM has no relevant context, it should say "I don't know" or "Not found in your documents" rather than inventing an answer. DocLD's chat prompt enforces this behavior.
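The constrain-and-fallback step above can be sketched as a prompt builder. This is a minimal illustration, not DocLD's actual implementation; the function name, template wording, and fallback string are assumptions.

```python
# Illustrative sketch: build a prompt that constrains the LLM to
# retrieved context and tells it how to refuse when nothing matches.
# Names and wording are hypothetical, not DocLD's real prompt.

FALLBACK = "Not found in your documents."

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts answers to the retrieved chunks.

    Each chunk is numbered so the model can cite it as [1], [2], ...
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer ONLY from the context below, citing chunk numbers. "
        f"If the answer is not in the context, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The key design point is that the refusal phrase is spelled out verbatim in the prompt, so downstream code can detect an "I don't know" response reliably instead of parsing free-form hedging.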
Citations as a Safeguard
Citations provide transparency. Users can verify that an answer is backed by actual document text. If a passage is misquoted or fabricated, the citation links back to the source, where the discrepancy is visible. This accountability makes hallucinations easier to catch and sustains user trust.
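The verification idea can also be automated: a cited passage should literally appear in the chunk it points to. Below is a minimal sketch of such a check; the function name and the whitespace/case normalization are assumptions, not DocLD's API.

```python
# Illustrative check that a cited quote actually occurs in its
# source chunk. Hypothetical helper, not part of DocLD.

def verify_citation(quoted: str, source_chunk: str) -> bool:
    """Return True if the quoted passage appears in the source chunk.

    Whitespace and case are normalized so cosmetic differences
    (line wrapping, capitalization) do not cause false negatives.
    """
    def normalize(s: str) -> str:
        return " ".join(s.split()).lower()
    return normalize(quoted) in normalize(source_chunk)
```

A fabricated quote fails this check, which is exactly the signal a reviewer needs to distrust the answer.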
When Hallucination Can Still Occur
- Weak retrieval — If vector search returns irrelevant chunks, the LLM may fill gaps with invented content
- Ambiguous questions — The model may infer answers not explicitly in the documents
- Structured extraction — Extraction can hallucinate field values when the schema is unclear; use confidence scores and citations to flag uncertain results
Reranking improves retrieval quality, and confidence scores help identify low-reliability outputs for review.
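One way to act on those confidence scores is a simple threshold filter over reranked results: keep only chunks above a cutoff, and flag the answer for human review when nothing survives. This is a sketch under assumed data shapes (a `score` field per result); DocLD's actual scoring interface may differ.

```python
# Illustrative confidence gate over reranked retrieval results.
# The result dicts and the 0.5 default threshold are assumptions.

def filter_by_confidence(results: list[dict], threshold: float = 0.5):
    """Drop low-scoring chunks; signal review when none remain.

    Returns (kept_chunks, needs_review). An empty kept list means
    the LLM would be answering from weak context, so the safer
    path is the "Not found in your documents" fallback.
    """
    kept = [r for r in results if r["score"] >= threshold]
    needs_review = not kept
    return kept, needs_review
```

Tying the fallback to an explicit threshold turns "weak retrieval" from a silent failure mode into a measurable, tunable gate.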
Related Concepts
RAG and citations are the primary defenses against hallucination in DocLD. Vector search and reranking improve retrieval quality. LLM behavior is controlled via system prompts and context constraints.