Metadata
Metadata is structured information attached to documents and chunks. It describes the content and supports filtering, analytics, and retrieval. DocLD adds metadata during parsing and chunking; you can add custom metadata when uploading documents or via the API.
Document Metadata
| Field | Description |
|---|---|
| filename | Original file name |
| pageCount | Number of pages |
| documentType | Detected type (PDF, image, spreadsheet, etc.) |
| processedAt | When processing completed |
| tags | Custom tags for organization |
Document metadata is set during upload and parsing. Custom tags can be added when uploading or updated later for organization and vector search filtering.
Chunk Metadata
Each chunk has metadata used in vector search and the index:
| Field | Description |
|---|---|
| document_id | Source document |
| page or pageRange | Page location within the document |
| chunk_index | Order within document |
| section | Heading or section name (if available) |
| content_type | Text, table, figure |
Chunk metadata is derived from parsing and chunking. It is stored with embeddings in Pinecone and used for filtering at query time.
Custom Metadata
You can add custom metadata when uploading documents:
- Tags — User-defined tags for organization (e.g., department, project, year)
- Source — Origin of the document (e.g., upload, API, workflow)
- Custom fields — Key-value pairs for your use case
Custom metadata is passed through to chunks and can be used for vector search filtering. For example, filter by documentType: "invoice" or tags: ["Q3"] to narrow results.
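The pass-through from document to chunk can be pictured as a simple merge. The sketch below is only an illustration, not DocLD's implementation: the function name `attach_metadata` and the record shapes are hypothetical, and it assumes chunk-level fields take precedence when a key appears at both levels.

```python
import copy
from typing import Any


def attach_metadata(
    doc_meta: dict[str, Any], chunks: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Copy document-level metadata onto every chunk record (hypothetical helper)."""
    merged = []
    for chunk in chunks:
        # Deep-copy so chunks do not share mutable values such as the tags list.
        record = copy.deepcopy(doc_meta)
        # Assumption: chunk-specific fields win on key collisions.
        record.update(chunk)
        merged.append(record)
    return merged


# Document-level metadata, including custom tags and a custom source field.
doc_meta = {"documentType": "invoice", "tags": ["Q3", "finance"], "source": "upload"}

# Chunk-level metadata as produced by parsing and chunking.
chunks = [
    {"document_id": "doc-1", "chunk_index": 0, "page": 1, "content_type": "text"},
    {"document_id": "doc-1", "chunk_index": 1, "page": 2, "content_type": "table"},
]

chunk_records = attach_metadata(doc_meta, chunks)
```

After the merge, each chunk record carries both its own location fields and the document's type and tags, which is what makes a filter such as documentType: "invoice" usable at the chunk level.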
Filtering
Knowledge base search can filter by metadata to narrow results:
- Document type — Filter by PDF, image, spreadsheet, etc.
- Date range — Filter by processedAt or custom date fields
- Tags — Filter by custom tags
- Page — Filter by page or page range
Filtering reduces noise and improves relevance when your knowledge base contains diverse document types.
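The four filter kinds listed above can be sketched as one predicate over a metadata record. This is an in-memory illustration under stated assumptions, not the DocLD or Pinecone query API: the `matches` function and its parameter names are hypothetical, a tag filter is assumed to match when any requested tag is present, and processedAt is assumed to be stored as a `datetime`.

```python
from datetime import datetime
from typing import Any, Optional


def matches(
    meta: dict[str, Any],
    *,
    document_type: Optional[str] = None,
    tags: Optional[list[str]] = None,
    processed_from: Optional[datetime] = None,
    processed_to: Optional[datetime] = None,
    page: Optional[int] = None,
) -> bool:
    """Return True if a metadata record satisfies every filter that was supplied."""
    if document_type is not None and meta.get("documentType") != document_type:
        return False
    # Assumption: a record matches if it has at least one of the requested tags.
    if tags is not None and not set(tags) & set(meta.get("tags", [])):
        return False
    processed_at = meta.get("processedAt")
    # Records without a processedAt value fail any date-range filter.
    if processed_from is not None and (processed_at is None or processed_at < processed_from):
        return False
    if processed_to is not None and (processed_at is None or processed_at > processed_to):
        return False
    if page is not None and meta.get("page") != page:
        return False
    return True


records = [
    {"id": "a", "documentType": "invoice", "tags": ["Q3"], "processedAt": datetime(2024, 9, 15), "page": 1},
    {"id": "b", "documentType": "invoice", "tags": ["Q2"], "processedAt": datetime(2024, 6, 1), "page": 1},
    {"id": "c", "documentType": "report", "tags": ["Q3"], "processedAt": datetime(2024, 9, 20), "page": 4},
]

# Invoices tagged Q3 that were processed on or after 1 July 2024.
hits = [
    r["id"]
    for r in records
    if matches(r, document_type="invoice", tags=["Q3"], processed_from=datetime(2024, 7, 1))
]
```

Only record "a" passes all three conditions here; "b" fails the tag filter and "c" fails the document type filter.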
Indexing Behavior
Metadata is stored with each vector in the index. At query time, metadata filters are applied before or after vector search (depending on Pinecone configuration). Indexing custom metadata enables rich filtering without re-processing documents.
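Whether a filter runs before or after the similarity search changes what comes back. The toy example below uses a brute-force search over four hand-made vectors to show the difference; it is not how Pinecone or DocLD performs search, and every name in it (`pre_filter_search`, `post_filter_search`, the index contents) is made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, items, k):
    """Brute-force nearest neighbours by cosine similarity."""
    return sorted(items, key=lambda it: cosine(query, it["vector"]), reverse=True)[:k]


def pre_filter_search(query, items, predicate, k):
    # Filter first, then rank only the records that passed the filter.
    candidates = [it for it in items if predicate(it["metadata"])]
    return top_k(query, candidates, k)


def post_filter_search(query, items, predicate, k):
    # Rank everything first, then drop results that fail the filter.
    return [it for it in top_k(query, items, k) if predicate(it["metadata"])]


def is_invoice(meta):
    return meta["documentType"] == "invoice"


index = [
    {"id": "c1", "vector": [1.0, 0.0], "metadata": {"documentType": "report"}},
    {"id": "c2", "vector": [0.9, 0.1], "metadata": {"documentType": "report"}},
    {"id": "c3", "vector": [0.6, 0.4], "metadata": {"documentType": "invoice"}},
    {"id": "c4", "vector": [0.0, 1.0], "metadata": {"documentType": "invoice"}},
]
query = [1.0, 0.0]

pre = [it["id"] for it in pre_filter_search(query, index, is_invoice, k=2)]
post = [it["id"] for it in post_filter_search(query, index, is_invoice, k=2)]
```

In this example, filtering first returns the two invoice chunks, while filtering afterwards returns nothing because both of the top two matches are reports. Post-filtering can therefore yield fewer than k results when the filter is selective.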
Related Concepts
Metadata is attached to documents and chunks. Parsing and chunking produce metadata. Vector search uses metadata for filtering. The index stores metadata with embeddings.