Zero-Shot Extraction
Zero-shot extraction is the extraction of structured data from documents without training a model on your specific documents or document types. DocLD uses zero-shot extraction: you define a schema with field names, types, and instructions, and the LLM extracts values directly. No fine-tuning or labeled data is required.
How Zero-Shot Extraction Works
- Schema — Define fields, types, and instructions in a schema
- Document — Send the document to the extraction API
- Extract — The LLM reads the document and extracts values matching the schema
- Return — Values, confidence scores, and citations are returned
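The four steps above can be sketched in Python. This is a minimal illustration, not DocLD's API: the `Field` class, the `build_prompt` and `parse_response` helpers, the type names, and the JSON reply format are all hypothetical. The LLM call (step 3) is replaced by a canned reply so the example runs offline.

```python
import json
from dataclasses import dataclass


@dataclass
class Field:
    """One schema field: a name, an expected type, and guidance for the LLM."""
    name: str
    type: str  # hypothetical type names, e.g. "string" or "number"
    instructions: str


def build_prompt(fields: list[Field], document_text: str) -> str:
    """Steps 1-2: render the schema and the document into a single prompt."""
    lines = [
        "Extract the following fields from the document.",
        'Reply with JSON mapping each field name to an object with "value",',
        '"confidence" (0 to 1), and "citation" (the quoted source text).',
        "",
    ]
    for f in fields:
        lines.append(f"- {f.name} ({f.type}): {f.instructions}")
    lines += ["", "Document:", document_text]
    return "\n".join(lines)


def parse_response(raw: str, fields: list[Field]) -> dict:
    """Step 4: keep only schema fields and coerce "number" values to float."""
    data = json.loads(raw)
    result = {}
    for f in fields:
        item = data.get(f.name, {})
        value = item.get("value")
        if f.type == "number" and value is not None:
            value = float(value)
        result[f.name] = {
            "value": value,
            "confidence": float(item.get("confidence", 0.0)),
            "citation": item.get("citation"),
        }
    return result


schema = [
    Field("invoice_number", "string", "The invoice identifier, usually labeled 'Invoice #'."),
    Field("total", "number", "The final amount due, including tax."),
]
doc = "Invoice #INV-001\nSubtotal: 90.00\nTotal due: 99.00"
prompt = build_prompt(schema, doc)

# Step 3 (the actual LLM call) is stubbed with a canned reply for illustration.
fake_reply = json.dumps({
    "invoice_number": {"value": "INV-001", "confidence": 0.97, "citation": "Invoice #INV-001"},
    "total": {"value": "99.00", "confidence": 0.91, "citation": "Total due: 99.00"},
})
result = parse_response(fake_reply, schema)
print(result)
```

Keeping parsing separate from prompting means the same validation runs no matter which model produced the reply.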
The model relies on its general understanding of documents, guided by your schema instructions. Prebuilt schemas for common document types (Invoice, Contract, Resume) provide a starting point; customize their instructions for your use case.
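Customizing a prebuilt schema can be sketched as copying it and overriding instructions. This assumes a schema is a plain dictionary of field specs; `PREBUILT_INVOICE` and `customize` are hypothetical stand-ins, and DocLD's real prebuilt fields and representation may differ.

```python
import copy
from typing import Optional

# Hypothetical stand-in for a prebuilt Invoice schema (not DocLD's actual definition).
PREBUILT_INVOICE = {
    "invoice_number": {"type": "string", "instructions": "The invoice identifier."},
    "invoice_date": {"type": "date", "instructions": "The date the invoice was issued."},
    "total": {"type": "number", "instructions": "The total amount due."},
}


def customize(prebuilt: dict, overrides: dict, extra: Optional[dict] = None) -> dict:
    """Return a copy of a prebuilt schema with instructions overridden and fields added."""
    schema = copy.deepcopy(prebuilt)  # never mutate the shared prebuilt definition
    for name, instructions in overrides.items():
        if name not in schema:
            raise KeyError(f"unknown field: {name}")
        schema[name]["instructions"] = instructions
    for name, spec in (extra or {}).items():
        schema[name] = spec
    return schema


my_schema = customize(
    PREBUILT_INVOICE,
    overrides={"total": "The amount due after discounts, including tax; ignore subtotals."},
    extra={"po_number": {"type": "string", "instructions": "Purchase order number, if present."}},
)
```

Raising on unknown override names catches typos that would otherwise silently leave the default instructions in place.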
Benefits of Zero-Shot
| Benefit | Description |
|---|---|
| No training | Start extracting immediately without labeled data |
| Flexible | Change schemas or instructions without retraining |
| Broad coverage | Works across document types with appropriate schemas |
| Fast iteration | Adjust instructions and re-run extraction quickly |
Ground truth can improve quality over time, still without model training: comparing extraction results against known-correct values reveals systematic errors, which you then fix by refining schema instructions.
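Finding systematic errors against ground truth can be sketched as per-field accuracy over a set of labeled documents. This is an illustrative assumption, not DocLD's evaluation method: it uses exact-match comparison and a hypothetical 0.9 threshold, and the sample data is invented for the example.

```python
from collections import defaultdict


def field_accuracy(predictions: list[dict], ground_truth: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy across paired predictions and labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must be the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, truth in zip(predictions, ground_truth):
        for name, expected in truth.items():
            total[name] += 1
            if pred.get(name) == expected:
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


def systematic_errors(accuracy: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Fields below the threshold: candidates for clearer schema instructions."""
    return sorted(name for name, acc in accuracy.items() if acc < threshold)


# Illustrative data: "total" keeps picking up the subtotal instead of the amount due.
predictions = [
    {"invoice_number": "INV-001", "total": 90.0},
    {"invoice_number": "INV-002", "total": 45.0},
    {"invoice_number": "INV-003", "total": 120.0},
]
truth = [
    {"invoice_number": "INV-001", "total": 99.0},
    {"invoice_number": "INV-002", "total": 49.5},
    {"invoice_number": "INV-003", "total": 120.0},
]
acc = field_accuracy(predictions, truth)
flagged = systematic_errors(acc)
print(acc, flagged)
```

A field that is wrong in the same way across many documents points to an instruction problem (here, something like "use the amount due including tax, not the subtotal"). Exact match is deliberately strict; in practice you may want to normalize dates or allow a small numeric tolerance.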
Best Practices
- Clear instructions — Schema instructions guide the LLM; be specific about edge cases
- Use prebuilt schemas — Start with prebuilt schemas for common types
- Validate results — Use confidence scores and citations to flag uncertain extractions
- Iterate — Refine schema instructions based on extraction quality
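The "Validate results" practice can be sketched as a review filter over extraction output. This assumes each field comes back as a dictionary with `value`, `confidence` (0 to 1), and `citation` keys; that shape, the helper names, and the 0.8 threshold are hypothetical choices for illustration.

```python
def review_reasons(item: dict, min_confidence: float) -> list[str]:
    """List the reasons a single extracted field needs human review."""
    reasons = []
    if item.get("value") is None:
        reasons.append("missing value")
    if item.get("confidence", 0.0) < min_confidence:
        reasons.append("low confidence")
    if not item.get("citation"):
        reasons.append("no citation")
    return reasons


def flag_uncertain(result: dict, min_confidence: float = 0.8) -> dict[str, list[str]]:
    """Map each field that needs review to its reasons; confident, cited fields are omitted."""
    flags = {}
    for name, item in result.items():
        reasons = review_reasons(item, min_confidence)
        if reasons:
            flags[name] = reasons
    return flags


result = {
    "invoice_number": {"value": "INV-001", "confidence": 0.97, "citation": "Invoice #INV-001"},
    "due_date": {"value": "2024-07-01", "confidence": 0.55, "citation": "Due: July 1"},
    "po_number": {"value": None, "confidence": 0.0, "citation": None},
}
flags = flag_uncertain(result)
print(flags)
```

Whether a missing value should be flagged depends on the field: an optional field such as a purchase order number may legitimately be absent, so you might exempt such fields from review.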
Related Concepts
Zero-shot extraction is the default extraction mode in DocLD. Schemas and their instructions drive extraction behavior, prebuilt schemas accelerate setup, and ground truth measures accuracy without requiring any training.