Index
An index in DocLD is the vector store where embedding vectors are kept and queried for vector search. DocLD uses Pinecone as its vector index: when documents are parsed and chunked, each chunk is embedded and stored in the index alongside its metadata.
How the Index Works
- Indexing — Document chunks are embedded and upserted into Pinecone
- Metadata — Each vector is stored with metadata (document ID, page, chunk index, etc.)
- Querying — User queries are embedded and sent to the index; similarity search returns top-k chunks
- Scoping — Knowledge bases scope search by filtering on document IDs
The index is optimized for low-latency similarity search. DocLD uses Pinecone's integrated embedding model so you don't manage a separate embedding API.
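The embed → upsert → query flow above can be sketched with an in-memory stand-in. This is a minimal illustration, not DocLD's implementation: the `embed` function is a toy hash-based embedder (in DocLD, embedding happens server-side via Pinecone's integrated model), and the dict-backed index replaces a real Pinecone index.

```python
import math

# Toy stand-in for an embedding model (hypothetical; DocLD actually
# delegates embedding to Pinecone's integrated model).
def embed(text: str) -> list[float]:
    # Hash characters into a small fixed-size vector, then L2-normalize
    # so the dot product below behaves like cosine similarity.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# In-memory "index": chunk ID -> vector plus metadata.
index: dict[str, dict] = {}

def upsert(chunk_id: str, text: str, metadata: dict) -> None:
    # Indexing step: embed the chunk and store it with its metadata.
    index[chunk_id] = {"vector": embed(text), "metadata": metadata}

def query(text: str, top_k: int = 3) -> list[tuple[str, float]]:
    # Querying step: embed the query, score every stored vector by
    # cosine similarity, and return the top-k (id, score) pairs.
    q = embed(text)
    scored = [
        (cid, sum(a * b for a, b in zip(q, rec["vector"])))
        for cid, rec in index.items()
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

With a real Pinecone index the same flow maps onto `index.upsert(...)` and `index.query(..., top_k=...)`, but the shape of the round trip is the same: vectors in, top-k scored IDs out.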
Index Structure
| Component | Description |
|---|---|
| Namespace | Logical partition; DocLD uses a single namespace per deployment |
| Vectors | Embedding vectors for chunks and queries |
| Metadata | Document ID, page, chunk index, section, content type |
| Filtering | Metadata filters narrow results by document type, date, etc. |
Knowledge bases do not create separate indexes; they scope queries by filtering on which documents belong to each knowledge base.
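That scoping rule can be sketched as a post-filter over query results. The knowledge-base mapping and field names here are illustrative assumptions, not DocLD's actual schema; the point is that one shared index serves every knowledge base, with each query restricted to the document IDs that belong to it.

```python
# Hypothetical mapping of knowledge base -> member document IDs.
knowledge_bases: dict[str, set[str]] = {
    "kb-finance": {"doc-a", "doc-c"},
}

# Simulated query results from the shared index (metadata fields assumed).
records = [
    {"id": "c1", "metadata": {"document_id": "doc-a", "page": 1}},
    {"id": "c2", "metadata": {"document_id": "doc-b", "page": 4}},
    {"id": "c3", "metadata": {"document_id": "doc-c", "page": 2}},
]

def scope(results: list[dict], kb: str) -> list[dict]:
    # Keep only chunks whose document belongs to the knowledge base.
    allowed = knowledge_bases[kb]
    return [r for r in results if r["metadata"]["document_id"] in allowed]
```

Against a real Pinecone index the same restriction would typically be pushed into the query itself as a metadata filter (e.g. a `$in` condition on the document-ID field), so non-member vectors never rank at all.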
Index Lifecycle
- Create — Index is created when you first set up DocLD
- Upsert — New or updated documents are chunked, embedded, and upserted
- Delete — Removing a document removes its chunks from the index
- Reindex — Re-processing a document replaces its vectors
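The upsert, delete, and reindex steps above can be sketched in a few lines. This is a minimal in-memory model with assumed ID and metadata conventions (e.g. `"{document_id}#{chunk_index}"` chunk IDs), not DocLD's wire format; it shows the key invariant that reindexing replaces a document's vectors rather than appending to them.

```python
# In-memory "index": chunk ID -> text plus metadata.
index: dict[str, dict] = {}

def upsert_chunks(document_id: str, chunks: list[str]) -> None:
    # Upsert step: each chunk becomes one record, keyed by an assumed
    # "{document_id}#{chunk_index}" convention.
    for i, text in enumerate(chunks):
        index[f"{document_id}#{i}"] = {
            "text": text,
            "metadata": {"document_id": document_id, "chunk_index": i},
        }

def delete_document(document_id: str) -> None:
    # Delete step: removing a document removes all of its chunks.
    stale = [cid for cid, rec in index.items()
             if rec["metadata"]["document_id"] == document_id]
    for cid in stale:
        del index[cid]

def reindex_document(document_id: str, chunks: list[str]) -> None:
    # Reindex step: delete first, then upsert, so a re-chunked document
    # never leaves stale vectors behind (e.g. when the new chunking
    # produces fewer chunks than the old one).
    delete_document(document_id)
    upsert_chunks(document_id, chunks)
```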
Best Practices
Reindex when you change chunking strategy or need to refresh content; re-processing replaces a document's vectors in the index. Use metadata filters to narrow search by document type, date, or custom fields so queries stay fast and results stay relevant.
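As a sketch of how such filters behave, here is a small evaluator for Pinecone-style operators (`$eq`, `$in`, `$gte`) over in-memory records. The field names and the evaluator itself are illustrative assumptions; in practice the filter dict is passed to the index query and evaluated server-side.

```python
def matches(metadata: dict, flt: dict) -> bool:
    # Evaluate a Pinecone-style filter: every field condition must hold.
    for field, cond in flt.items():
        value = metadata.get(field)
        for op, target in cond.items():
            if op == "$eq" and value != target:
                return False
            if op == "$in" and value not in target:
                return False
            if op == "$gte" and (value is None or value < target):
                return False
    return True

# Assumed example records; field names are illustrative.
records = [
    {"id": "c1", "metadata": {"content_type": "pdf", "page": 3}},
    {"id": "c2", "metadata": {"content_type": "html", "page": 1}},
]

# Narrow search to PDF chunks on page 2 or later.
flt = {"content_type": {"$eq": "pdf"}, "page": {"$gte": 2}}
hits = [r["id"] for r in records if matches(r["metadata"], flt)]
```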
Related Concepts
The index stores embeddings for vector search. Pinecone hosts the index. Chunking determines what gets indexed; metadata enables filtering.