Extraction | Glossary | DocLD

Extraction pulls structured data from documents using AI and a schema. Define the fields you need (e.g., invoice number, total amount, line items), and DocLD extracts values with confidence scores and citations. DocLD uses zero-shot extraction: no training on your documents is required.
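As a rough illustration of what "define the fields you need" can look like, here is a minimal sketch of a schema for the invoice example above. The dictionary layout, the key names (`name`, `type`, `instructions`, `items`), and the helper `field_names` are assumptions made for illustration; DocLD's actual schema format may differ.

```python
# Hypothetical schema layout, for illustration only.
# DocLD's real schema format may use different keys and types.
invoice_schema = {
    "name": "invoice",
    "fields": [
        {
            "name": "invoice_number",
            "type": "string",
            "instructions": "The identifier printed near the top of the invoice.",
        },
        {
            "name": "total_amount",
            "type": "number",
            "instructions": "Grand total, including tax.",
        },
        {
            "name": "line_items",
            "type": "array",
            "items": {
                "description": "string",
                "quantity": "number",
                "unit_price": "number",
            },
        },
    ],
}


def field_names(schema: dict) -> list:
    """Return the top-level field names declared in a schema."""
    return [field["name"] for field in schema["fields"]]


print(field_names(invoice_schema))
```

Keeping per-field instructions next to the type gives the model the context it needs in a zero-shot setting, where there are no training examples to learn from.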

How Extraction Works

Document + Schema → AI Extraction → Field Results + Confidence + Citations

Schema — Define fields, types, and instructions in a schema
Document — Send document (or document ID) to the extraction API
Extract — LLM identifies and extracts values matching the schema
Validate — Confidence scores indicate reliability per field
Return — Field values, confidence, and citations are returned
Correct — Manually fix low-confidence or incorrect values; optionally promote to ground truth

Extraction runs asynchronously for large documents; use jobs and webhooks to track completion.
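Where a webhook is not available, job completion can be tracked by polling. The sketch below is a generic polling loop, not DocLD's client: the status names (`completed`, `failed`) and the `get_status` callable are assumptions, and the callable stands in for whatever request returns the job's current state.

```python
import time
from typing import Callable

# Assumed names for terminal job states; check the real API for actual values.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Usage with a fake status source standing in for a real job lookup.
fake_statuses = iter(["queued", "processing", "completed"])
final = wait_for_job(lambda: next(fake_statuses), interval_s=0, sleep=lambda _: None)
print(final)
```

Passing `sleep` in as a parameter keeps the loop testable without real delays. A webhook avoids the polling traffic entirely, so polling is best treated as a fallback.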

Pipeline Diagram

Stage	Input	Output
Parse	Document file	Text, tables, layout
Extract	Text + Schema	Field values
Score	Extracted values	Confidence scores
Cite	Source passages	Citations per field

Parsing runs first (or uses existing parsed content). Extraction consumes the parsed text and applies the schema.
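The four stages in the table can be pictured as a chain of small functions. The sketch below is a toy model only: a regular expression stands in for the LLM in the Extract stage, the confidence is a crude matched/unmatched value, and every name (`FieldResult`, `parse`, `extract`, `score`, `cite`, `run_pipeline`) is hypothetical rather than part of DocLD.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldResult:
    value: Optional[str]
    confidence: float
    citation: Optional[str]  # the source line the value was found on


def parse(raw: bytes) -> str:
    """Parse stage: document file to text (here, a plain UTF-8 decode)."""
    return raw.decode("utf-8")


def extract(text: str, patterns: dict) -> dict:
    """Extract stage: one regex per field stands in for the LLM."""
    return {name: re.search(pattern, text) for name, pattern in patterns.items()}


def score(match) -> float:
    """Score stage: toy confidence, 1.0 if a value was found, else 0.0."""
    return 1.0 if match else 0.0


def cite(text: str, match) -> Optional[str]:
    """Cite stage: return the full line that contains the matched value."""
    if match is None:
        return None
    start = text.rfind("\n", 0, match.start()) + 1
    end = text.find("\n", match.end())
    return text[start:] if end == -1 else text[start:end]


def run_pipeline(raw: bytes, patterns: dict) -> dict:
    text = parse(raw)
    matches = extract(text, patterns)
    return {
        name: FieldResult(m.group(1) if m else None, score(m), cite(text, m))
        for name, m in matches.items()
    }


document = b"Invoice No: INV-001\nTotal: 42.50\n"
patterns = {
    "invoice_number": r"Invoice No:\s*(\S+)",
    "total_amount": r"Total:\s*([\d.]+)",
    "po_number": r"PO:\s*(\S+)",
}
for name, result in run_pipeline(document, patterns).items():
    print(name, result)
```

The point of the sketch is the data flow: extraction never sees the raw file, only the parsed text, and scoring and citation are derived from what extraction found.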

Output

Component	Description
Field values	Extracted data in your schema structure
Confidence	Per-field and overall scores (0–1)
Citations	Where each value was found (text, page, coordinates)

Use confidence scores to prioritize review: verify each low-confidence field against its citation before accepting the value.
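Prioritizing review by confidence can be as simple as sorting fields and applying a cutoff. The sketch below assumes a result shaped as a dictionary of fields, each with `value`, `confidence`, and `citation` keys; that shape, the 0.8 threshold, and the function name are illustrative assumptions, not DocLD's documented response format.

```python
def fields_needing_review(result: dict, threshold: float = 0.8) -> list:
    """Return names of fields below the threshold, least confident first."""
    low = [
        (field["confidence"], name)
        for name, field in result["fields"].items()
        if field["confidence"] < threshold
    ]
    return [name for _, name in sorted(low)]


# Hypothetical extraction result, for illustration only.
sample_result = {
    "fields": {
        "invoice_number": {
            "value": "INV-001",
            "confidence": 0.97,
            "citation": {"page": 1, "text": "Invoice No: INV-001"},
        },
        "total_amount": {
            "value": 42.5,
            "confidence": 0.62,
            "citation": {"page": 1, "text": "Total: 42.50"},
        },
        "due_date": {
            "value": "2024-03-01",
            "confidence": 0.35,
            "citation": {"page": 2, "text": "Due 1 March"},
        },
    }
}

for name in fields_needing_review(sample_result):
    field = sample_result["fields"][name]
    print(name, field["confidence"], field["citation"]["text"])
```

The right threshold depends on the cost of a wrong value in your workflow, so it is worth tuning against reviewed documents rather than fixing it up front.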

Retries and Error Handling

Retries — Failed extractions can be retried; transient errors (rate limits, timeouts) may succeed on retry
Partial results — Some fields may extract successfully while others fail; results include per-field status
Error details — API returns error messages for failed extractions; inspect schema and document for issues
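Retrying transient failures is commonly done with exponential backoff and jitter. The following is a generic sketch of that pattern, not DocLD-specific code: `TransientError` is a hypothetical exception standing in for whatever your client raises on rate limits or timeouts, and the attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical stand-in for rate-limit and timeout errors."""


def with_retries(
    call: Callable,
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Run call(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay_s * 2 ** (attempt - 1)
            # Jitter spreads out retries from many clients hitting the same limit.
            sleep(delay + random.uniform(0, delay / 2))


# Usage: a fake call that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_extract():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("rate limited")
    return {"status": "completed"}


print(with_retries(flaky_extract, sleep=lambda _: None))
```

Only errors classed as transient are retried here. A schema or document problem would fail the same way every time, so those errors are left to propagate for inspection.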

For batch processing, handle partial success: some documents may extract fully while others fail or return low confidence.
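One way to handle partial success in a batch is to sort documents into buckets after the run. The sketch below assumes each result either carries an `error` entry or a `fields` dictionary whose entries have a `status` and a `confidence`; those key names, the `"ok"` status value, and the 0.8 threshold are assumptions for illustration.

```python
def partition_batch(results: dict, threshold: float = 0.8) -> dict:
    """Group document IDs into 'ok', 'review', and 'failed' buckets."""
    buckets = {"ok": [], "review": [], "failed": []}
    for doc_id, result in results.items():
        if result.get("error"):
            buckets["failed"].append(doc_id)
            continue
        fields = result["fields"].values()
        needs_review = any(
            field["status"] != "ok" or field["confidence"] < threshold
            for field in fields
        )
        buckets["review" if needs_review else "ok"].append(doc_id)
    return buckets


# Hypothetical batch results, for illustration only.
batch = {
    "doc-1": {"fields": {"total": {"status": "ok", "confidence": 0.95}}},
    "doc-2": {"fields": {"total": {"status": "ok", "confidence": 0.41}}},
    "doc-3": {"fields": {"total": {"status": "failed", "confidence": 0.0}}},
    "doc-4": {"error": "unsupported file type"},
}
print(partition_batch(batch))
```

Documents in the `failed` bucket are candidates for a retry or a schema check, while those in `review` already have values that a person can verify against the citations.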

Prebuilt Schemas

DocLD offers prebuilt schemas for common document types: Invoice, Receipt, Contract, NDA, Resume, Bank Statement, and more. Use them directly or as templates for custom schemas. Form detection can suggest a prebuilt schema for mixed document batches.
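Using a prebuilt schema as a template usually means copying it and adding the fields your documents need. The sketch below shows that idea on a plain dictionary. The contents of `prebuilt_invoice` are invented for illustration and do not reproduce DocLD's actual Invoice schema, and `extend_schema` is a hypothetical helper.

```python
import copy


def extend_schema(base: dict, extra_fields: list, name: str) -> dict:
    """Return a copy of base with extra fields appended; base is left unchanged."""
    schema = copy.deepcopy(base)
    existing = {field["name"] for field in schema["fields"]}
    for field in extra_fields:
        if field["name"] in existing:
            raise ValueError(f"field {field['name']!r} already exists in the schema")
        schema["fields"].append(copy.deepcopy(field))
        existing.add(field["name"])
    schema["name"] = name
    return schema


# Invented stand-in for a prebuilt schema; the real one will differ.
prebuilt_invoice = {
    "name": "invoice",
    "fields": [
        {"name": "invoice_number", "type": "string"},
        {"name": "total_amount", "type": "number"},
    ],
}

custom = extend_schema(
    prebuilt_invoice,
    [{"name": "cost_center", "type": "string",
      "instructions": "Internal cost center code, if printed on the invoice."}],
    name="invoice_with_cost_center",
)
print([field["name"] for field in custom["fields"]])
```

Working on a deep copy keeps the template reusable for other document types in the same batch.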

Ground Truth

Set verified values to measure extraction accuracy and compare runs. Ground truth enables quality analytics and schema tuning. Corrections made during review can be promoted to ground truth for future accuracy measurement.
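Measuring accuracy against ground truth comes down to comparing extracted values with verified ones, field by field. The sketch below uses exact-match comparison over plain dictionaries keyed by document ID. The data shapes and the function name are assumptions, and a real comparison may need normalization (dates, currency formats) that is not shown here.

```python
def field_accuracy(extracted: dict, ground_truth: dict) -> dict:
    """Per field, the share of documents whose extracted value equals the verified one."""
    correct, total = {}, {}
    for doc_id, truth in ground_truth.items():
        values = extracted.get(doc_id, {})
        for field, expected in truth.items():
            total[field] = total.get(field, 0) + 1
            if values.get(field) == expected:
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / total[field] for field in total}


# Hypothetical verified values and one extraction run, for illustration only.
ground_truth = {
    "doc-1": {"invoice_number": "INV-001", "total_amount": 42.5},
    "doc-2": {"invoice_number": "INV-002", "total_amount": 100.0},
}
run_a = {
    "doc-1": {"invoice_number": "INV-001", "total_amount": 42.5},
    "doc-2": {"invoice_number": "INV-002", "total_amount": 10.0},
}
print(field_accuracy(run_a, ground_truth))
```

Running the same function over two extraction runs gives a direct per-field comparison, which shows whether a schema change helped the field it was meant to fix.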

Related Concepts

Extraction uses schemas and produces confidence scores and citations. Prebuilt schemas accelerate setup. Zero-shot extraction requires no training. Jobs and webhooks track async extraction.
