Overview — What is DocLD?
Transform unstructured documents into structured data
DocLD is a document intelligence platform that converts PDFs, images, spreadsheets, presentations, and other supported formats into structured data.
What you can do:
- Parse documents — Extract text, tables, and layout with OCR and semantic chunking
- Extract data — Pull specific fields using AI-powered schemas
- Chat with documents — RAG-powered Q&A with citations
- Automate workflows — Build document processing pipelines
- Generate documents — Create new documents from templates and data
How it works
Upload → Parse → Chunk → Vectorize → Index → Query
- Upload: Send documents via the API, dashboard, or CLI
- Process: DocLD parses each document, runs OCR, splits the text into semantic chunks, embeds the chunks with the llama-text-embed-v2 model, and indexes them in a vector database
- Use: Extract data, chat with documents, or automate workflows
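The Chunk → Vectorize → Index → Query stages can be illustrated with a toy retrieval loop. This is a conceptual sketch, not DocLD's implementation: paragraph splitting stands in for semantic chunking, bag-of-words counts stand in for llama-text-embed-v2 embeddings, and a plain Python list stands in for the vector database. The helper names `chunk`, `vectorize`, `build_index`, and `query` are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stand-ins only: DocLD uses semantic chunking, llama-text-embed-v2
# embeddings, and a vector database. None of that is reproduced here.


def chunk(text: str) -> list[str]:
    """Split text into paragraph chunks (stand-in for semantic chunking)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def vectorize(text: str) -> Counter:
    """Lowercased word counts (stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(text: str) -> list[tuple[str, Counter]]:
    """Chunk -> Vectorize -> Index: store each chunk next to its vector."""
    return [(c, vectorize(c)) for c in chunk(text)]


def query(index: list[tuple[str, Counter]], question: str, k: int = 1) -> list[str]:
    """Query: return the k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]


document = (
    "Invoice 1042 was issued on 3 March.\n\n"
    "The total amount due is 980 EUR.\n\n"
    "Payment terms are net 30 days."
)
index = build_index(document)
print(query(index, "What is the total amount due?"))  # ['The total amount due is 980 EUR.']
```

Dense embeddings in the real pipeline let a question match chunks that share meaning but not exact wording, which this word-overlap toy cannot do; the retrieved chunks are what a RAG answer would cite.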
Features
| Feature | Description |
|---|---|
| Parsing | Text, tables, figures with OCR for 50+ languages |
| Extraction | Schema-based data extraction with confidence scores |
| Chat | RAG-powered Q&A with citations |
| Knowledge Bases | Organize documents for semantic search |
| Workflows | Automate processing with triggers and integrations |
| Analytics | Track usage, quality, and costs |
| Generation | Generate documents from data |
| Split | Split documents into sections |
| Edit | Fill forms with natural language |
| Comparison | Compare documents and extractions |
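Confidence scores are most useful for deciding which extracted values to accept automatically and which to send to a person. The sketch below assumes a hypothetical result shape, `{field: {"value": ..., "confidence": 0-1}}`, because the actual extraction response format is not described on this page; the `route_by_confidence` helper, the 0.85 threshold, and the sample values are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical result shape (not DocLD's documented format):
# {"field_name": {"value": ..., "confidence": float between 0 and 1}}


@dataclass
class Routed:
    accepted: dict
    needs_review: dict


def route_by_confidence(fields: dict, threshold: float = 0.85) -> Routed:
    """Accept fields at or above the threshold; flag the rest for human review."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["value"]
    return Routed(accepted, needs_review)


# Illustrative sample input, not real DocLD output.
result = {
    "invoice_number": {"value": "1042", "confidence": 0.98},
    "total": {"value": 980.0, "confidence": 0.91},
    "due_date": {"value": "2024-04-02", "confidence": 0.62},
}
routed = route_by_confidence(result)
print(routed.accepted)      # {'invoice_number': '1042', 'total': 980.0}
print(routed.needs_review)  # {'due_date': '2024-04-02'}
```

The right threshold depends on the cost of a wrong value in your workflow; a stricter threshold sends more fields to review in exchange for fewer silent errors.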
Getting Started
| Option | Description |
|---|---|
| Dashboard | Visual interface for document processing |
| API | REST API for automation; start with the API Quickstart |
| SDKs | JavaScript and Python SDKs |
| CLI | Command-line tool; see the CLI overview |
| MCP & IDE | Use DocLD docs inside Cursor, Claude, and other MCP-compatible tools |
| Claude Code Plugin | DocLD skill and docs search in Claude Code |
| Slack | Slash commands, @mention RAG, and file upload in Slack |
API Overview
| Category | Endpoints |
|---|---|
| Documents | POST/GET/DELETE /v1/documents, GET /v1/documents/:id/content |
| Collections | POST/GET /v1/collections, POST /v1/collections/:id/documents |
| Schemas | POST/GET /v1/schemas, GET/POST/DELETE /v1/schemas/:id |
| Extractions | POST /v1/extractions, GET /v1/extractions/:id |
| Chat | POST /v1/chat/completions |
| Search | POST /v1/search |
| Jobs | GET /v1/jobs/:id (processing status) |
| Full Reference | API Docs |
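A minimal Python sketch of calling these endpoints with only the standard library. It assumes a hypothetical base URL (`https://api.docld.example`, overridable via `DOCLD_BASE_URL`), Bearer-token authentication with the key in `DOCLD_API_KEY`, hypothetical request body fields, and the status values `queued` and `processing` for unfinished jobs. None of these details are confirmed on this page, so check the API reference and the official SDKs before relying on them. `wait_for_job` polls `GET /v1/jobs/:id` and accepts an injectable `fetch` function so the polling logic can be exercised offline.

```python
import json
import os
import time
import urllib.request
from typing import Callable, Optional

# Assumptions, not confirmed by this page: base URL, Bearer auth, status values.
BASE_URL = os.environ.get("DOCLD_BASE_URL", "https://api.docld.example")  # hypothetical
API_KEY = os.environ.get("DOCLD_API_KEY", "")
PENDING_STATUSES = ("queued", "processing")  # assumed job status values


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for a /v1 endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {API_KEY}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_job(job_id: str) -> dict:
    """GET /v1/jobs/:id and return the decoded job object."""
    return send(build_request("GET", f"/v1/jobs/{job_id}"))


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Poll the job until its status is no longer pending, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") not in PENDING_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)


# Hypothetical body fields; the real request schema may differ.
extraction_req = build_request("POST", "/v1/extractions", {"document_id": "doc_123", "schema_id": "sch_456"})
print(extraction_req.get_method(), extraction_req.full_url)

# Offline check with a fake fetcher: the job reports processing, then completed.
states = iter([{"status": "processing"}, {"status": "completed"}])
print(wait_for_job("job_1", fetch=lambda _id: next(states), interval=0))  # {'status': 'completed'}
```

Keeping request construction separate from sending puts authentication and URL handling in one place, and it lets the request shape be checked without touching the network.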
Use Cases
- Financial Services — Invoice processing, bank statements, contract analysis
- Insurance — Claims extraction, policy analysis, underwriting
- Healthcare — Lab reports, medical records, patient data
- Legal — Contract review, clause extraction, due diligence
- HR — Resume parsing, document routing
Security & Compliance
| Feature | Description |
|---|---|
| GDPR | Data export, deletion, consent management |
| HIPAA | PHI auditing, access controls |
| ISO 27001 | Security event logging, incident management |
| Encryption | AES-256 at rest, TLS 1.3 in transit |
| Access Control | API keys, roles, organizations |
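The encryption row describes DocLD's side of the connection. If you also want your own client to refuse anything older than TLS 1.3 when calling the API, Python's standard `ssl` module can enforce that, and the API key can be read from the environment instead of being hard-coded. The `DOCLD_API_KEY` variable name and the helper names below are assumptions for illustration.

```python
import os
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Client context that verifies certificates and rejects TLS versions below 1.3."""
    ctx = ssl.create_default_context()  # enables certificate and hostname verification
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def api_key_from_env(var: str = "DOCLD_API_KEY") -> str:
    """Read the API key from an environment variable (variable name is an assumption)."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before calling the API")
    return key


ctx = make_tls13_context()
print(ctx.minimum_version.name)  # TLSv1_3
```

Pass the context to `urllib.request.urlopen(req, context=ctx)`; a handshake with a server that offers only TLS 1.2 or lower then fails instead of negotiating an older protocol.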
Quick Links
- API Quickstart — Parse your first document
- SDKs — JavaScript and Python SDKs
- Use DocLD docs in your IDE (MCP) — MCP setup for Cursor, Claude, and more
- RAG Setup Guide — Build a chat pipeline
- Custom Extraction — Create extraction schemas
- Best Practices — Tips and recommendations