Blog
How we're building DocLD, why we're here, and where we're headed.
DocLD on FUNSD: Form Understanding Performance Analysis
We ran DocLD's document parsing on the official FUNSD testing set — 50 noisy scanned form images with ground-truth annotations. Here's a performance analysis with visualizations and insights into OCR quality on real-world forms.
Build vs. Buy for Document Processing: Choosing the Right Approach
A practical framework for deciding when to build document processing in-house versus using an API or platform — volume, complexity, compliance, and total cost.
DocLD-FinTabNet: Leading Table Extraction on Financial Documents
We benchmarked DocLD's table extraction on FinTabNet — 500 financial tables from S&P 500 SEC filings — scoring 82.1% accuracy with zero failures and outperforming GTE (IBM) and TATR (Microsoft). Here's what we found.
DocLD-TableBench: How We Stack Up Against the Best in Table Extraction
We ran DocLD against Reducto's open RD-TableBench dataset — 1,000 PhD-annotated complex tables — and compared accuracy with Reducto, Azure, Textract, GPT-4o, and more. Here's what we found.
Extract Like a Pro — How DocLD Handles Your Messiest Documents
DocLD's intelligent extraction doesn't just read documents; it understands them. See how we turn complex layouts, tables, and multi-page forms into clean, structured data.
Structured Extraction in DocLD — Schemas, Jobs, and Corrections
How DocLD turns documents into structured data: defining schemas, running extraction jobs, and fixing results with the correction UI and API.
Everything You Need to Know About PDFs
A technical deep dive into the Portable Document Format: specification, file structure, object model, text vs. images, fonts and encoding, parsing strategies, security, linearization, accessibility, and the tooling ecosystem.
Building DocLD — How We're Building It, Why We're Here, and Where We Stand
Our approach to building DocLD as an end-to-end document intelligence platform, why we're solving this problem, and how we're positioning against the competition.
How RAG Works in DocLD — Retrieval, Reranking, and Citations Under the Hood
A technical deep dive into DocLD's RAG pipeline: how Pinecone integrated embeddings power search, how chunks are reranked, and how cited answers are generated from your knowledge bases.