Hallucination
AI hallucination occurs when a large language model (LLM) generates text that sounds plausible but is not supported by its training data or the provided context. The model may invent facts, figures, or references. In document Q&A, hallucination is a major risk: users expect answers grounded in their documents, not fabricated content.
How RAG Reduces Hallucination
RAG (Retrieval-Augmented Generation) addresses hallucination by:
- Retrieving — Vector search finds the document chunks most relevant to the question
- Constraining — The LLM is instructed to answer only from the retrieved context
- Citing — Citations show which passages support each part of the answer
When the LLM has no relevant context, it should say "I don't know" or "Not found in your documents" rather than inventing an answer. DocLD's chat prompt enforces this behavior.
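The constrain-and-fallback step above can be sketched as a prompt builder. This is a minimal illustration, not DocLD's actual implementation; the function name, template wording, and fallback string are assumptions.

```python
# Illustrative sketch: build a prompt that constrains the LLM to
# retrieved context and tells it how to refuse when nothing matches.
# Names and wording are hypothetical, not DocLD's real prompt.

FALLBACK = "Not found in your documents."

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts answers to the retrieved chunks.

    Each chunk is numbered so the model can cite it as [1], [2], ...
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer ONLY from the context below, citing chunk numbers. "
        f"If the answer is not in the context, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The key design point is that the refusal phrase is spelled out verbatim in the prompt, so downstream code can detect an "I don't know" response reliably instead of parsing free-form hedging.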
Citations as a Safeguard
Citations provide transparency. Users can verify that an answer is backed by actual document text. If a passage is misquoted or fabricated, the citation links back to the source, where the discrepancy is visible. This accountability makes hallucinations easier to catch and sustains user trust.
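The verification idea can also be automated: a cited passage should literally appear in the chunk it points to. Below is a minimal sketch of such a check; the function name and the whitespace/case normalization are assumptions, not DocLD's API.

```python
# Illustrative check that a cited quote actually occurs in its
# source chunk. Hypothetical helper, not part of DocLD.

def verify_citation(quoted: str, source_chunk: str) -> bool:
    """Return True if the quoted passage appears in the source chunk.

    Whitespace and case are normalized so cosmetic differences
    (line wrapping, capitalization) do not cause false negatives.
    """
    def normalize(s: str) -> str:
        return " ".join(s.split()).lower()
    return normalize(quoted) in normalize(source_chunk)
```

A fabricated quote fails this check, which is exactly the signal a reviewer needs to distrust the answer.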
When Hallucination Can Still Occur
- Weak retrieval — If vector search returns irrelevant chunks, the LLM may fill gaps with invented content
- Ambiguous questions — The model may infer answers not explicitly in the documents
- Structured extraction — Extraction can hallucinate field values when the schema is unclear; use confidence scores and citations to flag uncertain results
Reranking improves retrieval quality, and confidence scores help identify low-reliability outputs for review.
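One way to act on those confidence scores is a simple threshold filter over reranked results: keep only chunks above a cutoff, and flag the answer for human review when nothing survives. This is a sketch under assumed data shapes (a `score` field per result); DocLD's actual scoring interface may differ.

```python
# Illustrative confidence gate over reranked retrieval results.
# The result dicts and the 0.5 default threshold are assumptions.

def filter_by_confidence(results: list[dict], threshold: float = 0.5):
    """Drop low-scoring chunks; signal review when none remain.

    Returns (kept_chunks, needs_review). An empty kept list means
    the LLM would be answering from weak context, so the safer
    path is the "Not found in your documents" fallback.
    """
    kept = [r for r in results if r["score"] >= threshold]
    needs_review = not kept
    return kept, needs_review
```

Tying the fallback to an explicit threshold turns "weak retrieval" from a silent failure mode into a measurable, tunable gate.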
Related Concepts
RAG and citations are the primary defenses against hallucination in DocLD. Vector search and reranking improve retrieval quality. LLM behavior is controlled via system prompts and context constraints.