Chunk Overlap

Chunk overlap is the amount of content shared between adjacent chunks. For example, if chunks are 500 tokens with 50-token overlap, the last 50 tokens of chunk N also appear at the start of chunk N+1. Overlap can improve recall when relevant content falls near chunk boundaries.
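The 500/50 example above can be sketched as a sliding window over a token list. This is a minimal illustration, not DocLD's implementation: the function name `chunk_with_overlap` and its parameters are hypothetical, and tokens are plain list items so that no particular tokenizer is assumed.

```python
def chunk_with_overlap(tokens, chunk_size=500, overlap=50):
    """Split a token list into fixed-size chunks that share `overlap` tokens.

    Hypothetical helper for illustration only; not a DocLD API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Each new chunk starts `step` tokens after the previous one,
    # so consecutive chunks share exactly `overlap` tokens.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk reaches the end of the input, so no trailing
        # chunk made up only of already-covered tokens is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = list(range(1200))  # stand-in for 1,200 token IDs
chunks = chunk_with_overlap(tokens, chunk_size=500, overlap=50)

# The last 50 tokens of chunk N reappear at the start of chunk N+1.
assert chunks[0][-50:] == chunks[1][:50]
```

With these settings the window advances 450 tokens at a time, so a sentence that straddles token 500 is still fully contained in the second chunk.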

Trade-offs

Recall — Overlap reduces the chance that key content is split across chunks and missed by retrieval
Redundancy — Overlapping chunks increase index size and can return duplicate context for the LLM
Cost — More overlapping chunks mean more embeddings and storage
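To make the cost trade-off concrete, the chunk count for a document can be estimated from its length, the chunk size, and the overlap. The sketch below assumes fixed-size token chunks with a sliding window; `count_chunks` is a hypothetical helper, and semantic chunking would produce different counts.

```python
import math


def count_chunks(total_tokens, chunk_size=500, overlap=50):
    """Estimate how many fixed-size chunks a document produces.

    Hypothetical helper; assumes a sliding window that advances by
    (chunk_size - overlap) tokens per chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    # One chunk covers the first `chunk_size` tokens; each further chunk
    # covers `step` new tokens.
    return 1 + math.ceil((total_tokens - chunk_size) / step)


without_overlap = count_chunks(100_000, chunk_size=500, overlap=0)  # 200
with_overlap = count_chunks(100_000, chunk_size=500, overlap=50)    # 223
print(f"{without_overlap} chunks without overlap, {with_overlap} with overlap")
```

For a 100,000-token document, a 10% overlap raises the number of chunks, and therefore embeddings to compute and store, from 200 to 223, roughly an 11.5% increase.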

DocLD uses semantic chunking by default, which respects logical boundaries. Knowledge bases can be configured with different chunking settings, including overlap.

Chunk overlap is a chunking parameter that determines what text each embedding covers, and therefore what vector search can retrieve. Raising top-k or adding a reranking step can recover some boundary-related misses without overlap, but overlap remains a common option for retrieval-heavy use cases.
