Zero-Shot Extraction

Zero-shot extraction means extracting structured data from documents without training a model on your specific documents or document types. DocLD works this way: you define a schema with field names, types, and instructions, and the LLM extracts values directly. No fine-tuning or labeled data is required.

How Zero-Shot Extraction Works

Schema — Define fields, types, and instructions in a schema
Document — Send the document to the extraction API
Extract — The LLM reads the document and extracts values matching the schema
Return — Values, confidence scores, and citations are returned
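
The Schema and Document steps above can be sketched in a few lines of Python. This is only an illustration of the general zero-shot pattern, not DocLD's actual schema format or API: the `Field` class, the `build_prompt` helper, and the field names are all hypothetical, and the call to the model itself is left out.

```python
from dataclasses import dataclass
import json


@dataclass
class Field:
    """One schema field (hypothetical structure, not DocLD's real format)."""
    name: str
    type: str
    instructions: str


def build_prompt(fields, document_text):
    """Combine the schema and the document into a single zero-shot prompt."""
    schema = [
        {"name": f.name, "type": f.type, "instructions": f.instructions}
        for f in fields
    ]
    return (
        "Extract the following fields from the document. "
        "Return JSON with one key per field.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Document:\n{document_text}"
    )


# Hypothetical invoice schema: names, types, and instructions only, no examples.
invoice_schema = [
    Field("invoice_number", "string", "Identifier printed near the top; keep any prefix."),
    Field("total_amount", "number", "Grand total including tax, without currency symbol."),
]

prompt = build_prompt(invoice_schema, "Invoice INV-001 ... Total: $120.00")
```

Nothing in the prompt is a labeled example from your own documents, which is what makes the approach zero-shot: the schema and its instructions carry all the task-specific guidance.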

The model uses its general understanding of documents plus your schema instructions. Prebuilt schemas for common types (Invoice, Contract, Resume) provide a head start; customize instructions for your use case.

Benefits of Zero-Shot

Benefit	Description
No training	Start extracting immediately without labeled data
Flexible	Change schemas or instructions without retraining
Broad coverage	Works across document types with appropriate schemas
Fast iteration	Adjust instructions and re-run extraction quickly

Ground truth can still improve quality over time: comparing extractions against known-correct values reveals systematic errors, which you fix by refining instructions rather than by training a model.
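
One way to use ground truth without any training is to compute per-field accuracy and look for fields that fail consistently. The sketch below assumes extractions and ground truth are available as plain dictionaries keyed by field name; the `field_accuracy` helper and the sample data are hypothetical, and exact equality is used as the match rule for simplicity.

```python
from collections import defaultdict


def field_accuracy(extracted, ground_truth):
    """Return per-field accuracy over paired documents.

    Both arguments are lists of dicts (one per document) keyed by field name.
    A field counts as correct only on an exact match with the ground truth.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for got, want in zip(extracted, ground_truth):
        for name, expected in want.items():
            total[name] += 1
            if got.get(name) == expected:
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical results for two invoices.
extracted = [
    {"invoice_number": "INV-001", "total_amount": 120.0},
    {"invoice_number": "INV-002", "total_amount": 80.0},
]
ground_truth = [
    {"invoice_number": "INV-001", "total_amount": 120.0},
    {"invoice_number": "INV-002", "total_amount": 88.0},
]

accuracy = field_accuracy(extracted, ground_truth)
```

A field with noticeably lower accuracy than the rest (here `total_amount`) is a candidate for more specific schema instructions.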

Best Practices

Clear instructions — Schema instructions guide the LLM; be specific about edge cases
Use prebuilt schemas — Start with prebuilt schemas for common types
Validate results — Use confidence scores and citations to flag uncertain extractions
Iterate — Refine schema instructions based on extraction quality
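
The "Validate results" practice can be sketched as a simple review filter. The result shape used here (a dict per field with `value`, `confidence`, and `citations` keys), the `flag_for_review` helper, and the 0.8 threshold are assumptions for illustration; the actual response format may differ.

```python
def flag_for_review(results, threshold=0.8):
    """Return the names of fields that should get a human check.

    A field is flagged when its confidence is below the threshold
    or when it comes back without any citation.
    """
    flagged = []
    for name, result in results.items():
        low_confidence = result["confidence"] < threshold
        no_citation = not result.get("citations")
        if low_confidence or no_citation:
            flagged.append(name)
    return flagged


# Hypothetical extraction output for one document.
results = {
    "invoice_number": {"value": "INV-001", "confidence": 0.97, "citations": ["page 1"]},
    "total_amount": {"value": 120.0, "confidence": 0.55, "citations": ["page 2"]},
    "due_date": {"value": "2024-05-01", "confidence": 0.91, "citations": []},
}

needs_review = flag_for_review(results)
```

Fields that are flagged repeatedly across documents are good places to start when refining schema instructions.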

Zero-shot extraction is the default extraction mode in DocLD. Schema and instructions drive behavior. Prebuilt schemas accelerate setup. Ground truth measures accuracy without training.
