Blog
How we're building DocLD, why we're here, and where we're headed.
DocLD-TableBench: How We Stack Up Against the Best in Table Extraction
We ran DocLD against Reducto's open RD-TableBench dataset — 1,000 PhD-annotated complex tables — and compared accuracy with Reducto, Azure, Textract, GPT-4o, and more. Here's what we found.
Extract Like a Pro — How DocLD Handles Your Messiest Documents
DocLD's intelligent extraction doesn't just read documents—it understands them. See how we turn complex layouts, tables, and multi-page forms into clean, structured data.
Structured Extraction in DocLD — Schemas, Jobs, and Corrections
How DocLD turns documents into structured data: defining schemas, running extraction jobs, and fixing results with the correction UI and API.
Everything You Need to Know About PDFs
A technical deep dive into the Portable Document Format: specification, file structure, object model, text vs. images, fonts and encoding, parsing strategies, security, linearization, accessibility, and the tooling ecosystem.
Building DocLD — How We're Building It, Why We're Here, and Where We Stand
Our approach to building DocLD as an end-to-end document intelligence platform, why we're solving this problem, and how we're positioning ourselves against the competition.
How RAG Works in DocLD — Retrieval, Reranking, and Citations Under the Hood
A technical deep dive into DocLD's RAG pipeline: how Pinecone integrated embeddings power search, how chunks are reranked, and how cited answers are generated from your knowledge bases.