Reranking
Reranking reorders vector search results by relevance before passing them to the LLM. Initial retrieval returns the top-k most similar chunks; reranking refines their order so the most relevant chunks come first. This improves RAG answer quality by ensuring the LLM receives the best available context.
Reranking Options in DocLD
| Type | Description | Trade-off |
|---|---|---|
| Heuristic | Keyword and phrase boosts; scoring based on query-term overlap | Fast, low cost; good for most cases |
| LLM | Model scores each chunk for relevance to the query | Higher accuracy; higher cost and latency |
| Hybrid | Combines heuristic and LLM scoring | Balance of speed, cost, and quality |
Default heuristic reranking works well for most cases. Consider LLM reranking when answers miss key information or irrelevant chunks appear in context.
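As a rough illustration of what heuristic reranking does, here is a minimal sketch of query-term-overlap scoring with simple whitespace tokenization. The function name `heuristic_rerank` and the scoring formula are illustrative, not DocLD's actual implementation, which also applies keyword and phrase boosts.

```python
def heuristic_rerank(query, chunks):
    """Reorder chunks by fraction of query terms each chunk contains.

    A stand-in for heuristic scoring: no model call, just set overlap.
    """
    query_terms = set(query.lower().split())

    def score(chunk):
        chunk_terms = set(chunk.lower().split())
        if not query_terms:
            return 0.0
        return len(query_terms & chunk_terms) / len(query_terms)

    # Highest-overlap chunks first; Python's sort is stable, so ties
    # keep their original (vector-similarity) order.
    return sorted(chunks, key=score, reverse=True)
```

Because this is pure set arithmetic, it adds effectively no latency on top of retrieval, which is why it is the default.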
When to Use Each Type
| Scenario | Recommended Type |
|---|---|
| General Q&A | Heuristic |
| Domain-specific or nuanced queries | LLM or hybrid |
| High-stakes answers (legal, financial) | LLM |
| Low latency required | Heuristic |
| Mixed document types | Hybrid |
Reranking affects latency and cost. Heuristic reranking adds minimal latency; LLM reranking adds a model call per chunk. DocLD lets you configure the strategy per knowledge base or chat session.
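The hybrid trade-off can be sketched as a cheap heuristic pass that narrows the candidate set, followed by model scoring of only the survivors. Everything here is an assumption for illustration: `llm_score` is a placeholder for a real model call, and `prefilter` and `weight` are hypothetical parameters, not DocLD settings.

```python
def hybrid_rerank(query, chunks, llm_score, prefilter=10, weight=0.7):
    """Blend heuristic overlap with model scores on a prefiltered set.

    llm_score(query, chunk) -> float in [0, 1] stands in for a model call;
    prefiltering bounds the number of such calls at `prefilter`.
    """
    query_terms = set(query.lower().split())

    def overlap(chunk):
        terms = set(chunk.lower().split())
        if not query_terms:
            return 0.0
        return len(query_terms & terms) / len(query_terms)

    # Cheap heuristic pass first: only the top `prefilter` chunks
    # ever reach the (slow, costly) LLM scorer.
    candidates = sorted(chunks, key=overlap, reverse=True)[:prefilter]
    blended = [
        (weight * llm_score(query, c) + (1 - weight) * overlap(c), c)
        for c in candidates
    ]
    blended.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in blended]
```

The prefilter is what keeps hybrid cost closer to heuristic than to full LLM reranking: model calls scale with `prefilter`, not with the retrieved top-k.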
How Reranking Improves RAG
- Retrieve — Vector search returns top-k chunks (e.g., 20)
- Rerank — The configured strategy reorders these chunks by relevance
- Select — Top N chunks (e.g., 5–10) are passed to the LLM
- Generate — The LLM receives the best context for RAG answer generation
By retrieving more chunks and reranking, you improve recall while ensuring the LLM gets the most relevant subset. This is especially useful when vector search returns some near-misses that heuristic or LLM scoring can demote.
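The four steps above reduce to a short pipeline. This is a generic sketch, not DocLD's internals: `vector_search` and `rerank` are placeholder callables, and the defaults mirror the example numbers (top-k of 20, top-N of 5).

```python
def build_context(query, vector_search, rerank, top_k=20, top_n=5):
    """Retrieve -> rerank -> select: the steps before LLM generation."""
    candidates = vector_search(query, top_k)  # 1. Retrieve top-k chunks
    ordered = rerank(query, candidates)       # 2. Rerank by relevance
    return ordered[:top_n]                    # 3. Select top-N for the prompt
```

Note that `top_k > top_n` is the point: retrieving generously improves recall, and the reranker decides which subset actually reaches the model.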
Configuration
Configure reranking per knowledge base or chat session:
- Strategy — Heuristic, LLM, or hybrid
- Top-N — How many chunks to pass to the LLM after reranking
- Thresholds — Optional minimum relevance score to exclude low-quality chunks
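To make the three settings concrete, here is a hypothetical shape for such a configuration and the selection step it drives. The names (`RerankConfig`, `min_score`, `select_chunks`) are invented for illustration and do not describe DocLD's actual configuration API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RerankConfig:
    strategy: str = "heuristic"        # "heuristic", "llm", or "hybrid"
    top_n: int = 5                     # chunks passed to the LLM
    min_score: Optional[float] = None  # optional relevance floor

def select_chunks(scored, config):
    """Apply threshold and top-N cut to reranked (chunk, score) pairs.

    `scored` is assumed already sorted by the reranker, best first.
    """
    if config.min_score is not None:
        # Drop low-quality chunks entirely rather than pad the context.
        scored = [(c, s) for c, s in scored if s >= config.min_score]
    return [c for c, _ in scored[:config.top_n]]
```

A threshold can return fewer than `top_n` chunks; that is usually preferable to filling the prompt with context the reranker scored as irrelevant.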
Related Concepts
Reranking refines vector search results. Top-k controls how many chunks are retrieved before reranking. RAG uses reranked chunks as context for the LLM.