Overview — What is DocLD?
Transform unstructured documents into structured data
DocLD is a document intelligence platform that converts PDFs, images, spreadsheets, presentations, and other supported formats into structured data.
What you can do:
- Parse documents — Extract text, tables, and layout with OCR and semantic chunking
- Extract data — Pull specific fields using AI-powered schemas
- Chat with documents — RAG-powered Q&A with citations
- Automate workflows — Build document processing pipelines
- Generate documents — Create new documents from templates and data
How it works
Upload → Parse → Chunk → Vectorize → Index → Query
- Upload — Send documents via API, dashboard, or CLI
- Process — DocLD parses the document, runs OCR, chunks it semantically, and vectorizes the chunks (llama-text-embed-v2 via Pinecone)
- Use — Extract data, chat with documents, or automate workflows
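The stages above can be illustrated with a small in-memory sketch. This is not DocLD's implementation: whitespace cleanup stands in for parsing and OCR, blank-line splitting stands in for semantic chunking, and bag-of-words counts stand in for the embedding model. All function names here are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def parse(raw: str) -> str:
    """Parse stage: collapse runs of spaces/tabs (stand-in for OCR and layout extraction)."""
    return re.sub(r"[ \t]+", " ", raw).strip()


def chunk(text: str) -> list[str]:
    """Chunk stage: split on blank lines (stand-in for semantic chunking)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def vectorize(text: str) -> Counter:
    """Vectorize stage: bag-of-words counts (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(raw: str) -> list[tuple[str, Counter]]:
    """Index stage: keep each chunk next to its vector."""
    return [(c, vectorize(c)) for c in chunk(parse(raw))]


def query(index: list[tuple[str, Counter]], question: str) -> str:
    """Query stage: return the chunk most similar to the question."""
    q = vectorize(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]


doc = "Invoice 1042\nTotal due: 300 USD\n\nPayment terms: net 30 days"
index = build_index(doc)
print(query(index, "What are the payment terms?"))
```

In a real deployment the retrieved chunk would be passed to a language model together with the question, and the chunk's source location would back the citation.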
Features
| Feature | Description |
|---|---|
| Parsing | Text, tables, and figures, with OCR for 50+ languages |
| Extraction | Schema-based data extraction with confidence scores |
| Chat | RAG-powered Q&A with citations |
| Knowledge Bases | Organize documents for semantic search |
| Workflows | Automate processing with triggers and integrations |
| Analytics | Track usage, quality, and costs |
| Generation | Generate documents from data |
| Split | Split documents into sections |
| Edit | Fill forms with natural language |
| Comparison | Compare documents and extractions |
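Confidence scores on extracted fields are typically used to decide what can be accepted automatically and what needs human review. The sketch below shows one way to do that. The `ExtractedField` shape, the 0.0 to 1.0 score range, the 0.8 threshold, and the sample values are all assumptions for illustration, not DocLD's actual response format.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical shape of one extracted field."""
    name: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def split_by_confidence(fields, threshold=0.8):
    """Return (accepted, needs_review) using a confidence cutoff."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


# Made-up sample data, not real DocLD output.
fields = [
    ExtractedField("invoice_number", "INV-1042", 0.97),
    ExtractedField("total", "300.00", 0.91),
    ExtractedField("due_date", "2024-03-01", 0.55),
]

accepted, needs_review = split_by_confidence(fields)
print("accepted:", [f.name for f in accepted])
print("needs review:", [f.name for f in needs_review])
```

The right threshold depends on the document type and on how costly a wrong value is, so it is worth tuning against a labeled sample.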
Getting Started
| Option | Description |
|---|---|
| Dashboard | Visual interface for document processing |
| API | REST API for automation (see the API quickstart) |
| SDKs | JavaScript and Python SDKs |
| CLI | Command-line tool (see the CLI overview) |
| MCP & IDE | Use DocLD docs inside Cursor, Claude, and other IDEs |
| Claude Code Plugin | DocLD skill and docs search in Claude Code |
API Overview
| Category | Endpoints |
|---|---|
| Parse | /api/parse, /api/upload |
| Extract | /api/extract/run, /api/extract/schemas |
| Chat | /api/chat, /api/chat/sessions |
| Knowledge Bases | /api/knowledge-bases |
| Workflows | /api/workflows |
| Full Reference | API Docs |
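As a rough sketch of how a call to one of these endpoints might be assembled, the snippet below builds (but does not send) a request to `/api/parse` using only the Python standard library. Only the path comes from the table above. The host, the `POST` method, the Bearer-token header, and the `document_id` body field are assumptions; check the API reference for the real values.

```python
import json
import urllib.request

BASE_URL = "https://docld.example.com"  # placeholder host, not the real one
API_KEY = "YOUR_API_KEY"                # placeholder credential


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON request for the given endpoint path without sending it."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",  # assumed HTTP method
    )


# Hypothetical request body; the real field names may differ.
req = build_request("/api/parse", {"document_id": "doc_123"})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here because the host is a placeholder.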
Use Cases
- Financial Services — Invoice processing, bank statements, contract analysis
- Insurance — Claims extraction, policy analysis, underwriting
- Healthcare — Lab reports, medical records, patient data
- Legal — Contract review, clause extraction, due diligence
- HR — Resume parsing, document routing
Security & Compliance
| Feature | Description |
|---|---|
| GDPR | Data export, deletion, consent management |
| HIPAA | PHI auditing, access controls |
| ISO 27001 | Security event logging, incident management |
| Encryption | AES-256 at rest, TLS 1.3 in transit |
| Access Control | API keys, roles, organizations |
Quick Links
- API Quickstart — Parse your first document
- SDKs — JavaScript and Python SDKs
- Use DocLD docs in your IDE (MCP) — MCP setup for Cursor, Claude, and more
- RAG Setup Guide — Build a chat pipeline
- Custom Extraction — Create extraction schemas
- Best Practices — Tips and recommendations