RAG Setup Guide
Build a Retrieval-Augmented Generation (RAG) pipeline with DocLD to chat with your documents.
What is RAG?
RAG combines document retrieval with AI generation:
- Retrieve - Find relevant document chunks
- Augment - Add context to the AI prompt
- Generate - Create answers using the context
This produces accurate, citation-backed responses from your documents.
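The three steps above can be sketched in a few lines. This is a toy illustration, not DocLD's implementation: retrieval here is naive word overlap, whereas the real pipeline uses vector search, and the `retrieve`/`augment` names are invented for the sketch.

```python
# Toy RAG loop: retrieve relevant chunks, then augment the prompt with them.
# Real retrieval uses vector similarity; word overlap stands in for it here.
def retrieve(question, chunks, top_k=2):
    """Rank chunks by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def augment(question, context_chunks):
    """Build a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Employees accrue 15 vacation days per year.",
    "Expense reports are due within 30 days.",
]
question = "What is the vacation policy?"
prompt = augment(question, retrieve(question, chunks))
# The generate step would send `prompt` to the model.
```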
Quick Start
1. Create a Knowledge Base
curl -X POST "/api/knowledge-bases" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Company Policies",
"description": "HR policies and procedures"
}'

2. Add Documents
Upload documents to your knowledge base:
curl -X POST "/api/upload" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@employee-handbook.pdf" \
-F "knowledge_base_id=kb-uuid"

Or add existing documents:
curl -X POST "/api/knowledge-bases/{kb_id}/documents" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"document_id": "doc-uuid"}'

3. Chat with Documents
The chat API is stream-only and uses the AI SDK UIMessage contract. Create a session first, then send messages with message (UIMessage with role and parts), session_id, and knowledge_base_id. See the Chat API for full details.
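Before wiring this into an HTTP client, the request body can be sketched as plain data. The field names follow the contract described above; the helper function name itself is invented for the sketch.

```python
# Build the /api/chat request body: a UIMessage (role + parts) plus the
# session and knowledge base identifiers.
def build_chat_payload(text, session_id, knowledge_base_id):
    return {
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": text}],
        },
        "session_id": session_id,
        "knowledge_base_id": knowledge_base_id,
    }

payload = build_chat_payload(
    "What is the vacation policy?", "session-uuid", "kb-uuid"
)
```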
Create a session:
curl -X POST "/api/chat/sessions" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"knowledge_base_id": "kb-uuid"}'

Send a message (use the session_id from the response):
curl -X POST "/api/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": { "role": "user", "parts": [{ "type": "text", "text": "What is the vacation policy?" }] },
"session_id": "session-uuid",
"knowledge_base_id": "kb-uuid"
}'

How RAG Works in DocLD
Document Processing Pipeline
Upload → Parse → Chunk → Vectorize → Index

- Parse - Extract text, tables, figures from documents
- Chunk - Split content into semantic units
- Vectorize - Chunk text is sent to Pinecone; embeddings are generated server-side (llama-text-embed-v2)
- Index - Records stored in Pinecone for semantic search
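Because embedding happens server-side, the indexing step only needs to package each chunk with an id and its raw text. A sketch of that packaging (the record field names are illustrative, not DocLD's schema):

```python
# Package chunks as records for indexing. No embedding vectors are
# attached here: embeddings are generated server-side at index time.
def to_records(document_id, chunks):
    return [
        {"id": f"{document_id}#chunk-{i}", "text": chunk}
        for i, chunk in enumerate(chunks)
    ]

records = to_records(
    "doc-uuid",
    ["Vacation policy: 15 days per year.", "Sick leave: 10 days per year."],
)
```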
Query Pipeline
Question → Search (Pinecone embeds) → Retrieve → Generate → Respond

- Search - Query text is sent to Pinecone; embeddings and similarity search happen server-side
- Retrieve - Get top-k relevant chunks
- Generate - AI creates answer using context
- Respond - Return answer with citations
Configuring RAG
Retrieval Settings
Configure how many results to retrieve:
{
"settings": {
"retrieval": {
"top_k": 5,
"threshold": 0.7,
"reranking": true
}
}
}

| Setting | Default | Description |
|---|---|---|
| top_k | 5 | Number of chunks to retrieve |
| threshold | 0.7 | Minimum relevance score (0-1) |
| reranking | true | Re-rank results for accuracy |
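How top_k and threshold interact can be shown in a few lines: chunks below the threshold are dropped first, then the top_k highest-scoring survivors are kept. This is a sketch of the semantics, not DocLD's internals:

```python
# Apply retrieval settings: drop chunks under the relevance threshold,
# then keep only the top_k highest-scoring ones.
def apply_retrieval_settings(scored_chunks, top_k=5, threshold=0.7):
    kept = [(cid, score) for cid, score in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

results = apply_retrieval_settings(
    [("a", 0.91), ("b", 0.65), ("c", 0.74)],
    top_k=2,
    threshold=0.7,
)
```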
Chunking Strategy
Control how documents are split:
{
"settings": {
"chunking": {
"strategy": "semantic",
"max_size": 1000,
"overlap": 100
}
}
}

| Strategy | Description |
|---|---|
| semantic | Split by meaning (recommended) |
| fixed | Fixed character count |
| page | Split by page |
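The fixed strategy with overlap can be sketched directly: each chunk is a window of max_size characters, and each new window starts overlap characters before the previous one ended, so no sentence is lost at a boundary. A minimal illustration of that windowing (not DocLD's chunker):

```python
# Fixed-size chunking with overlap: consecutive windows of max_size
# characters, each overlapping the previous window by `overlap` characters.
def fixed_chunks(text, max_size=1000, overlap=100):
    step = max_size - overlap
    return [text[i:i + max_size] for i in range(0, len(text), step)]

# 2500 characters with max_size=1000 and overlap=100 yields three chunks.
text = "".join(str(i % 10) for i in range(2500))
chunks = fixed_chunks(text, max_size=1000, overlap=100)
```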
Best Practices
Document Preparation
- Quality over quantity - Curate relevant content
- Consistent formatting - Well-structured documents work better
- Remove noise - Exclude irrelevant sections
- Update regularly - Keep content current
Knowledge Base Organization
- Single topic - One domain per knowledge base
- Appropriate size - Enough documents for coverage, few enough to stay focused
- Related content - Documents should relate to each other
Query Optimization
- Be specific - Clear questions get better answers
- Context helps - Provide background when needed
- Follow up - Use conversation context
Advanced Configuration
Hybrid Search
Combine vector search with keyword matching:
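One common blend, sketched here under the assumption that keyword_weight linearly mixes a keyword-match score into the vector-similarity score (both normalized to 0-1); the actual scoring formula may differ:

```python
# Weighted hybrid score: keyword_weight controls how much the keyword
# match contributes relative to vector similarity.
def hybrid_score(vector_score, keyword_score, keyword_weight=0.3):
    return (1 - keyword_weight) * vector_score + keyword_weight * keyword_score

score = hybrid_score(vector_score=0.8, keyword_score=0.5, keyword_weight=0.3)
```

The corresponding configuration: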
{
"settings": {
"retrieval": {
"hybrid_search": true,
"keyword_weight": 0.3
}
}
}

Response Modes

| Mode | Description |
|---|---|
| fast | Quick response, fewer citations |
| balanced | Default balanced approach |
| thorough | Deep search, more citations |
curl -X POST "/api/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": { "role": "user", "parts": [{ "type": "text", "text": "Explain the refund policy in detail" }] },
"session_id": "session-uuid",
"knowledge_base_id": "kb-uuid",
"mode": "thorough"
}'

Evaluating RAG Quality
Metrics to Track
| Metric | Description |
|---|---|
| Confidence score | How confident the AI is |
| Citation accuracy | Do citations support answers |
| User feedback | Thumbs up/down ratings |
| Response time | How fast responses are |
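Aggregating these metrics from chat logs is straightforward; the log record fields below (confidence_score, feedback) follow the response example later in this guide, but the exact log schema is an assumption:

```python
# Summarize RAG quality metrics from a list of chat log records.
def summarize(logs):
    n = len(logs)
    return {
        "avg_confidence": sum(r["confidence_score"] for r in logs) / n,
        "thumbs_up_rate": sum(1 for r in logs if r.get("feedback") == "up") / n,
    }

stats = summarize([
    {"confidence_score": 0.9, "feedback": "up"},
    {"confidence_score": 0.7, "feedback": "down"},
])
```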
Improving Quality
- Add more documents - Better coverage
- Tune retrieval - Adjust top_k and threshold
- Enable reranking - Improve relevance
- Review feedback - Learn from user signals
Example: Legal Document RAG
Setup
# Create knowledge base for contracts
curl -X POST "/api/knowledge-bases" \
-d '{"name": "Contract Library", "description": "Client contracts"}'
# Upload contracts
for file in contracts/*.pdf; do
curl -X POST "/api/upload" \
-F "file=@$file" \
-F "knowledge_base_id=kb-uuid"
done

Query
curl -X POST "/api/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"message": { "role": "user", "parts": [{ "type": "text", "text": "What are the standard payment terms across our contracts?" }] },
"session_id": "session-uuid",
"knowledge_base_id": "kb-uuid",
"mode": "thorough"
}'

Response
{
"message": "Based on the contracts in your library, standard payment terms are Net 30 days. However, there are variations...",
"citations": [
{
"text": "Payment shall be due within thirty (30) days...",
"document_name": "Acme Contract.pdf",
"page": 5
}
],
"confidence_score": 0.92
}

Troubleshooting
Low Confidence Scores
- Add more relevant documents
- Check document quality
- Adjust retrieval threshold
Irrelevant Results
- Increase relevance threshold
- Enable reranking
- Review chunking settings
Missing Information
- Ensure documents are fully processed
- Check if content is in the knowledge base
- Verify document parsing succeeded