Retrieval

AI

Retrieval is the step of fetching relevant documents or chunks for a given query. In RAG, retrieval runs first: the query is embedded, vector search (and optionally reranking) returns the top chunks, and those are passed to the LLM as context for completion.

Retrieval in DocLD

Embed query: the query is embedded with the same model used for the document chunks.
Search: vector search (e.g., top-k) runs in the vector database, optionally with metadata filters.
Rerank: optional reranking improves the order of the results.
Pass to LLM: the retrieved text is used as context for an answer with citations.
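
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not DocLD's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve` and `build_context` are hypothetical helper names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Embed the query, score every chunk, and return the top_k best matches."""
    q = embed(query)  # same (toy) model as the chunks
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_context(results: list[tuple[float, str]]) -> str:
    """Number each retrieved chunk so the answer can cite it as [1], [2], ..."""
    return "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(results, start=1))


chunks = [
    "invoices are due within 30 days",
    "the office is closed on public holidays",
    "late invoices incur a fee",
]
results = retrieve("when are invoices due", chunks, top_k=2)
context = build_context(results)
print(context)
```

A production system would replace the brute-force scoring loop with a vector database query, but the order of operations is the same.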

Retrieval quality depends on chunking, embedding, and knowledge base scope.

Related Concepts

Retrieval is the fetch phase of RAG. It uses vector search, semantic search, top-k, and optionally reranking. Results feed citation and completion.

Related terms

RAG
Vector Search
Semantic Search
Reranking
Top-K

See also

guides › rag setup

Frequently Asked Questions

Retrieval is the step of fetching relevant documents or chunks for a query. In RAG, retrieval runs first: vector search (and optionally reranking) returns top chunks, which are then used as context for the LLM.

Chunking strategy, embedding model, and knowledge base scope all affect retrieval. Keep chunks well-formed and the corpus focused. Optional reranking can further improve the order of retrieved chunks.

The number of chunks retrieved is controlled by top-k (e.g., 5–20). Vector search returns that many chunks; reranking may then reduce how many are passed to the LLM.
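
As a sketch of how top-k and reranking interact, the example below takes five candidates (as if returned by vector search with top-k = 5), reorders them, and keeps three. The scoring function is an assumption chosen for illustration: it counts exact query-term matches, whereas a real reranker would typically be a learned model. `rerank_and_trim` is a hypothetical name.

```python
def rerank_and_trim(query: str, candidates: list[str], keep: int) -> list[str]:
    """Reorder candidates by query-term overlap and keep the best `keep`."""
    terms = set(query.lower().split())

    def overlap(chunk: str) -> int:
        return len(terms & set(chunk.lower().split()))

    # sorted() is stable, so ties keep their original vector-search order.
    ranked = sorted(candidates, key=overlap, reverse=True)
    return ranked[:keep]


# Five candidates, in the order vector search returned them.
candidates = [
    "refund policy for annual plans",
    "annual report summary",
    "how to request a refund",
    "refund requests for annual plans take 5 days",
    "plans and pricing overview",
]
to_llm = rerank_and_trim("refund annual plans", candidates, keep=3)
print(to_llm)
```

Here five chunks are retrieved but only three reach the LLM, which keeps the prompt shorter.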

To retrieve across multiple knowledge bases, run a separate query per knowledge base and merge the results, or check whether the API supports querying multiple bases in one call.
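
A minimal sketch of the merge step, assuming each per-base query has already returned `(score, chunk)` pairs. The knowledge base names, scores, and the `merge_results` helper are hypothetical. Scores are only comparable across bases if every base uses the same embedding model and similarity metric.

```python
import heapq


def merge_results(
    per_base: dict[str, list[tuple[float, str]]], top_k: int
) -> list[tuple[float, str, str]]:
    """Combine hits from several knowledge bases and keep the global top_k.

    Returns (score, base_name, chunk) tuples, highest score first.
    """
    merged = [
        (score, base, chunk)
        for base, hits in per_base.items()
        for score, chunk in hits
    ]
    return heapq.nlargest(top_k, merged, key=lambda hit: hit[0])


# Hypothetical results from two separate queries.
per_base = {
    "hr": [(0.91, "vacation policy"), (0.40, "dress code")],
    "finance": [(0.85, "expense policy"), (0.62, "travel budget")],
}
top = merge_results(per_base, top_k=3)
print(top)
```

Keeping the base name in each tuple lets citations point back to the knowledge base a chunk came from.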

Vector retrieval uses chunk content (via embeddings). Metadata is used for filtering before or after the vector search.
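
The difference between filtering before and after the vector search can be sketched as follows. The `Chunk` type, the `doc_id` field, and the precomputed scores are assumptions for illustration, not part of any documented API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    doc_id: str   # metadata used for filtering
    score: float  # similarity to the query, assumed precomputed


def top_k(chunks: list[Chunk], k: int) -> list[Chunk]:
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]


def pre_filter(chunks: list[Chunk], doc_id: str, k: int) -> list[Chunk]:
    """Filter by metadata first, then take the top k of what remains."""
    return top_k([c for c in chunks if c.doc_id == doc_id], k)


def post_filter(chunks: list[Chunk], doc_id: str, k: int) -> list[Chunk]:
    """Take the top k first, then drop chunks that fail the filter."""
    return [c for c in top_k(chunks, k) if c.doc_id == doc_id]


chunks = [
    Chunk("a1", "A", 0.9),
    Chunk("b1", "B", 0.8),
    Chunk("b2", "B", 0.7),
    Chunk("a2", "A", 0.6),
]
pre = [c.text for c in pre_filter(chunks, "A", k=2)]
post = [c.text for c in post_filter(chunks, "A", k=2)]
print(pre, post)
```

Filtering first always fills the top-k from matching chunks (when enough exist), while filtering afterwards can return fewer than k results because non-matching chunks used up some of the slots.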

Retrieval improves with good chunking (along semantic boundaries), an appropriate top-k, and optional reranking. Also ensure the document is fully ingested.

For the same query and index, results are typically deterministic. Ties or approximate search may cause small variation.
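
One way to remove tie-related variation on the client side is to break ties on a stable key such as the chunk ID. This is a sketch under that assumption; the chunk IDs and the `ranked` helper are hypothetical, and it does not address variation introduced by approximate search itself.

```python
def ranked(hits: list[tuple[str, float]], k: int) -> list[str]:
    """Order (chunk_id, score) hits by score descending, then chunk_id ascending."""
    ordered = sorted(hits, key=lambda hit: (-hit[1], hit[0]))
    return [chunk_id for chunk_id, _ in ordered[:k]]


# The same hits arriving in two different orders, with a tie at 0.8.
hits_a = [("c3", 0.8), ("c1", 0.8), ("c2", 0.9)]
hits_b = list(reversed(hits_a))

first = ranked(hits_a, k=2)
second = ranked(hits_b, k=2)
print(first, second)
```

Both calls return the same two chunk IDs regardless of the order in which the tied hits arrived.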

To retrieve from a single document, use a metadata filter that restricts the search to that document. Only chunks from that document are returned.

The vector search itself usually takes milliseconds. Total RAG latency also includes embedding the query and LLM generation.

See the DocLD documentation and glossary for details, and check the API reference for related endpoints and options.

© 2026 DocLD, Inc. SOC audit in progress