RAG Setup Guide
Build a Retrieval-Augmented Generation (RAG) pipeline with DocLD to chat with your documents.
What is RAG?
RAG combines document retrieval with AI generation:
- Retrieve - Find relevant document chunks
- Augment - Add context to the AI prompt
- Generate - Create answers using the context
This produces accurate, citation-backed responses from your documents.
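The three steps above can be sketched in a few lines. This is a toy illustration, not DocLD's implementation: retrieval here is naive word overlap, whereas the real pipeline uses vector search, and the `retrieve`/`augment` names are invented for the sketch.

```python
# Toy RAG loop: retrieve relevant chunks, then augment the prompt with them.
# Real retrieval uses vector similarity; word overlap stands in for it here.
def retrieve(question, chunks, top_k=2):
    """Rank chunks by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(question, context_chunks):
    """Build a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Employees accrue 15 vacation days per year.",
    "Expense reports are due within 30 days.",
]
question = "What is the vacation policy?"
prompt = augment(question, retrieve(question, chunks))
# The generate step would send `prompt` to the model.
```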
Quick Start
1. Create a Knowledge Base
curl -X POST "/api/knowledge-bases" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Company Policies",
"description": "HR policies and procedures"
}'

2. Add Documents
Upload documents to your knowledge base:
curl -X POST "/api/upload" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@employee-handbook.pdf" \
-F "knowledge_base_id=kb-uuid"

Or add existing documents:
curl -X POST "/api/knowledge-bases/{kb_id}/documents" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"document_id": "doc-uuid"}'

3. Chat with Documents
The chat API is stream-only and uses the AI SDK UIMessage contract. Create a session first, then send messages with message (UIMessage with role and parts), session_id, and knowledge_base_id. See the Chat API for full details.
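Before wiring this into an HTTP client, the request body can be sketched as plain data. The field names follow the contract described above; the helper function name itself is invented for the sketch.

```python
# Build the /api/chat request body: a UIMessage (role + parts) plus the
# session and knowledge base identifiers.
def build_chat_payload(text, session_id, knowledge_base_id):
    return {
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": text}],
        },
        "session_id": session_id,
        "knowledge_base_id": knowledge_base_id,
    }

payload = build_chat_payload(
    "What is the vacation policy?", "session-uuid", "kb-uuid"
)
```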
Create a session:
curl -X POST "/api/chat/sessions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"knowledge_base_id": "kb-uuid"}'

Send a message (use the session_id from the response):
curl -X POST "/api/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": { "role": "user", "parts": [{ "type": "text", "text": "What is the vacation policy?" }] },
"session_id": "session-uuid",
"knowledge_base_id": "kb-uuid"
}'

How RAG Works in DocLD
Document Processing Pipeline
Upload → Parse → Chunk → Vectorize → Index

- Parse - Extract text, tables, figures from documents
- Chunk - Split content into semantic units
- Vectorize - Chunk text is sent to Pinecone; embeddings are generated server-side (llama-text-embed-v2)
- Index - Records stored in Pinecone for semantic search
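Because embedding happens server-side, the indexing step only needs to package each chunk with an id and its raw text. A sketch of that packaging (the record field names are illustrative, not DocLD's schema):

```python
# Package chunks as records for indexing. No embedding vectors are
# attached here: embeddings are generated server-side at index time.
def to_records(document_id, chunks):
    return [
        {"id": f"{document_id}#chunk-{i}", "text": chunk}
        for i, chunk in enumerate(chunks)
    ]

records = to_records(
    "doc-uuid",
    ["Vacation policy: 15 days per year.", "Sick leave: 10 days per year."],
)
```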
Query Pipeline
Question → Search (Pinecone embeds) → Retrieve → Generate → Respond

- Search - Query text is sent to Pinecone; embeddings and similarity search happen server-side
- Retrieve - Get top-k relevant chunks
- Generate - AI creates answer using context
- Respond - Return answer with citations
Configuring RAG
Retrieval Settings
Configure how many results to retrieve:
{
"settings": {
"retrieval": {
"top_k": 5,
"threshold": 0.7,
"reranking": true
}
}
}

| Setting | Default | Description |
|---|---|---|
| top_k | 5 | Number of chunks to retrieve |
| threshold | 0.7 | Minimum relevance score (0-1) |
| reranking | true | Re-rank results for accuracy |
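How top_k and threshold interact can be shown in a few lines: chunks below the threshold are dropped first, then the top_k highest-scoring survivors are kept. This is a sketch of the semantics, not DocLD's internals:

```python
# Apply retrieval settings: drop chunks under the relevance threshold,
# then keep only the top_k highest-scoring ones.
def apply_retrieval_settings(scored_chunks, top_k=5, threshold=0.7):
    kept = [(cid, score) for cid, score in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

results = apply_retrieval_settings(
    [("a", 0.91), ("b", 0.65), ("c", 0.74)],
    top_k=2,
    threshold=0.7,
)
```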
Chunking Strategy
Control how documents are split:
{
"settings": {
"chunking": {
"strategy": "semantic",
"max_size": 1000,
"overlap": 100
}
}
}

| Strategy | Description |
|---|---|
| semantic | Split by meaning (recommended) |
| fixed | Fixed character count |
| page | Split by page |
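The fixed strategy with overlap can be sketched directly: each chunk is a window of max_size characters, and each new window starts overlap characters before the previous one ended, so no sentence is lost at a boundary. A minimal illustration of that windowing (not DocLD's chunker):

```python
# Fixed-size chunking with overlap: consecutive windows of max_size
# characters, each overlapping the previous window by `overlap` characters.
def fixed_chunks(text, max_size=1000, overlap=100):
    step = max_size - overlap
    return [text[i:i + max_size] for i in range(0, len(text), step)]

# 2500 characters with max_size=1000 and overlap=100 yields three chunks.
text = "".join(str(i % 10) for i in range(2500))
chunks = fixed_chunks(text, max_size=1000, overlap=100)
```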
Best Practices
Document Preparation
- Quality over quantity - Curate relevant content
- Consistent formatting - Well-structured documents work better
- Remove noise - Exclude irrelevant sections
- Update regularly - Keep content current
Knowledge Base Organization
- Single topic - One domain per knowledge base
- Appropriate size - Enough documents for coverage, few enough to stay focused
- Related content - Documents should relate to each other
Query Optimization
- Be specific - Clear questions get better answers
- Context helps - Provide background when needed
- Follow up - Use conversation context
Advanced Configuration
Hybrid Search
Combine vector search with keyword matching:
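One common blend, sketched here under the assumption that keyword_weight linearly mixes a keyword-match score into the vector-similarity score (both normalized to 0-1); the actual scoring formula may differ:

```python
# Weighted hybrid score: keyword_weight controls how much the keyword
# match contributes relative to vector similarity.
def hybrid_score(vector_score, keyword_score, keyword_weight=0.3):
    return (1 - keyword_weight) * vector_score + keyword_weight * keyword_score

score = hybrid_score(vector_score=0.8, keyword_score=0.5, keyword_weight=0.3)
```

The corresponding configuration: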
{
"settings": {
"retrieval": {
"hybrid_search": true,
"keyword_weight": 0.3
}
}
}

Response Modes

| Mode | Description |
|---|---|
| fast | Quick response, fewer citations |
| balanced | Default balanced approach |
| thorough | Deep search, more citations |
curl -X POST "/api/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": { "role": "user", "parts": [{ "type": "text", "text": "Explain the refund policy in detail" }] },
"session_id": "session-uuid",
"knowledge_base_id": "kb-uuid",
"mode": "thorough"
}'

Evaluating RAG Quality
Metrics to Track
| Metric | Description |
|---|---|
| Confidence score | How confident the AI is |
| Citation accuracy | Do citations support answers |
| User feedback | Thumbs up/down ratings |
| Response time | How fast responses are |
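Aggregating these metrics from chat logs is straightforward; the log record fields below (confidence_score, feedback) follow the response example later in this guide, but the exact log schema is an assumption:

```python
# Summarize RAG quality metrics from a list of chat log records.
def summarize(logs):
    n = len(logs)
    return {
        "avg_confidence": sum(r["confidence_score"] for r in logs) / n,
        "thumbs_up_rate": sum(1 for r in logs if r.get("feedback") == "up") / n,
    }

stats = summarize([
    {"confidence_score": 0.9, "feedback": "up"},
    {"confidence_score": 0.7, "feedback": "down"},
])
```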
Improving Quality
- Add more documents - Better coverage
- Tune retrieval - Adjust top_k and threshold
- Enable reranking - Improve relevance
- Review feedback - Learn from user signals
Example: Legal Document RAG
Setup
# Create knowledge base for contracts
curl -X POST "/api/knowledge-bases" \
-d '{"name": "Contract Library", "description": "Client contracts"}'
# Upload contracts
for file in contracts/*.pdf; do
curl -X POST "/api/upload" \
-F "file=@$file" \
-F "knowledge_base_id=kb-uuid"
done

Query
curl -X POST "/api/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": { "role": "user", "parts": [{ "type": "text", "text": "What are the standard payment terms across our contracts?" }] },
"session_id": "session-uuid",
"knowledge_base_id": "kb-uuid",
"mode": "thorough"
}'

Response
{
"message": "Based on the contracts in your library, standard payment terms are Net 30 days. However, there are variations...",
"citations": [
{
"text": "Payment shall be due within thirty (30) days...",
"document_name": "Acme Contract.pdf",
"page": 5
}
],
"confidence_score": 0.92
}

Troubleshooting
Low Confidence Scores
- Add more relevant documents
- Check document quality
- Adjust retrieval threshold
Irrelevant Results
- Increase relevance threshold
- Enable reranking
- Review chunking settings
Missing Information
- Ensure documents are fully processed
- Check if content is in the knowledge base
- Verify document parsing succeeded