# Top-K
Top-K is the number of chunks retrieved from vector search before they are passed to reranking or the LLM. For example, top-k=10 returns the 10 chunks most similar to the query. Top-k balances recall (retrieving enough relevant content) against context size and latency.
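At the vector level, "top-k" just means scoring every chunk against the query and keeping the k highest. A minimal sketch in plain Python (no external index; the 2-D "embeddings" are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    # Score every chunk against the query, keep the k most similar
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy 2-D embeddings for three chunks and a query
chunks = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
query = [1.0, 0.1]
print(top_k(query, chunks, k=2))  # → [0, 1]
```

A real vector index (Pinecone, pgvector, FAISS, etc.) does the same thing with approximate nearest-neighbor search so it scales past brute force.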
## How Top-K Works
1. Query — The user's question is embedded and sent to Pinecone
2. Retrieve — The vector index returns the top-k most similar chunks (e.g., by cosine similarity)
3. Rerank — An optional reranking step may reorder or filter these chunks
4. Generate — The LLM receives the (possibly reranked) chunks as context for RAG answer generation
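The four steps above can be sketched end to end. Here `embed`, `rerank_fn`, and `llm` are hypothetical stand-ins for your embedding model, reranker, and LLM client; the real calls depend on your stack:

```python
def retrieve(query, index, embed, top_k=10):
    # Steps 1–2: embed the query, ask the vector index for the top-k chunks
    query_vec = embed(query)
    return index.query(query_vec, top_k)

def answer(query, index, embed, llm, rerank_fn=None, top_k=10):
    chunks = retrieve(query, index, embed, top_k)
    # Step 3: optionally rerank/filter the retrieved chunks
    if rerank_fn is not None:
        chunks = rerank_fn(query, chunks)
    # Step 4: pass the (possibly reranked) chunks to the LLM as context
    context = "\n\n".join(chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```

Keeping retrieval and generation as separate functions makes it easy to tune top-k, or swap the reranker, without touching the rest of the pipeline.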
Higher top-k improves recall: you're more likely to include all relevant chunks. But higher top-k also increases context size, latency, and cost. DocLD lets you configure top-k per knowledge base or chat session.
## Choosing Top-K
| Top-K | Trade-off | Best For |
|---|---|---|
| 5–10 | Fast, smaller context | Simple questions, narrow domains |
| 10–20 | Balanced | Most RAG use cases |
| 20+ | Higher recall | Complex questions, broad domains |
Reranking can help: retrieve a larger top-k (e.g., 20), rerank to the best 5–10, then pass to the LLM. This improves precision without losing recall.
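The retrieve-wide-then-filter pattern can be sketched like this. The keyword-overlap scorer below is a hypothetical stand-in for a real reranker (e.g., a cross-encoder model or a hosted rerank API), which you would swap in:

```python
def rerank(query, chunks, keep=5):
    # Hypothetical reranker: score each chunk by word overlap with the query.
    # A production system would use a cross-encoder or a rerank API instead.
    query_words = set(query.lower().split())
    def score(chunk):
        return len(query_words & set(chunk.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:keep]

# Retrieve wide (top-k = 20 from the vector index), then filter down to 5
candidates = [f"chunk about topic {i}" for i in range(20)]
candidates[7] = "refund policy for annual plans"
best = rerank("what is the refund policy", candidates, keep=5)
print(best[0])  # → "refund policy for annual plans"
```

The wide first stage keeps recall high; the second stage trims the context the LLM actually sees.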
## Best Practices
- Start moderate — Default top-k (e.g., 10) works for most cases
- Increase if recall is low — If answers miss key information, try higher top-k
- Use reranking — Reranking lets you retrieve more and filter down
- Match the context window — Ensure top-k × chunk size fits within the LLM's context window
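The context-window check in the last point is simple arithmetic. The token counts below are rough assumptions (chunk size and prompt overhead vary by tokenizer and prompt template):

```python
def max_top_k(context_window, chunk_tokens, prompt_overhead, answer_budget):
    # Tokens left for retrieved chunks after subtracting the prompt template
    # and the space reserved for the model's answer
    available = context_window - prompt_overhead - answer_budget
    return max(available // chunk_tokens, 0)

# e.g., an 8,192-token window, 512-token chunks, ~500 tokens of
# prompt/question, and 1,000 tokens reserved for the answer
print(max_top_k(8192, 512, 500, 1000))  # → 13
```

If the number this returns is below the top-k you want, either shrink the chunks, use a larger-context model, or rerank down to fewer chunks before generation.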
## Related Concepts
Top-k controls vector search retrieval. Reranking refines the order of retrieved chunks. Chunking determines chunk size; RAG uses retrieved chunks as context for the LLM.