Ground Truth
Ground truth is verified, human-validated data that represents the correct values for a document or set of documents. In DocLD, ground truth is used to measure extraction accuracy: you set verified field values for sample documents, then compare AI extraction results against them to track quality and tune schemas.
Why Ground Truth Matters
Without ground truth, you cannot objectively measure extraction accuracy. With it, you can:
- Benchmark — Compare different schemas or model versions
- Monitor — Detect regressions when extraction quality drops
- Improve — Use feedback to refine field definitions and instructions
- Report — Share accuracy metrics with stakeholders
How Ground Truth Works in DocLD
- Create ground truth — For sample documents, manually set the correct values for each schema field
- Run extraction — Execute extraction on the same documents
- Compare — DocLD computes accuracy (e.g., exact match, fuzzy match) per field and overall
- Iterate — Adjust schema instructions or field types based on errors
Ground truth can be set via the dashboard or API. Corrections made during review can optionally be promoted to ground truth for future runs.
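The compare step can be illustrated outside of DocLD. The sketch below is not DocLD's implementation or API. It assumes ground truth and extraction results are available as plain dictionaries keyed by document ID, and it uses Python's standard `difflib` as a stand-in for fuzzy matching. The names `field_matches`, `score`, `fuzzy_fields`, and `fuzzy_threshold`, along with the sample documents, are hypothetical.

```python
from difflib import SequenceMatcher


def field_matches(expected, actual, fuzzy_threshold=None):
    """Return True if an extracted value matches the ground-truth value."""
    # A null only matches another null.
    if expected is None or actual is None:
        return expected is None and actual is None
    expected_s, actual_s = str(expected).strip(), str(actual).strip()
    if fuzzy_threshold is None:
        return expected_s == actual_s  # exact match
    # Fuzzy match: case-insensitive similarity ratio in [0, 1].
    ratio = SequenceMatcher(None, expected_s.lower(), actual_s.lower()).ratio()
    return ratio >= fuzzy_threshold


def score(ground_truth, extracted, fuzzy_fields=(), fuzzy_threshold=0.9):
    """Compute per-field and overall accuracy.

    Both arguments map doc_id -> {field_name: value}. Only fields present
    in the ground truth are scored.
    """
    hits, totals = {}, {}
    for doc_id, truth in ground_truth.items():
        result = extracted.get(doc_id, {})
        for field, expected in truth.items():
            threshold = fuzzy_threshold if field in fuzzy_fields else None
            ok = field_matches(expected, result.get(field), threshold)
            totals[field] = totals.get(field, 0) + 1
            hits[field] = hits.get(field, 0) + int(ok)
    per_field = {f: hits[f] / totals[f] for f in totals}
    overall = sum(hits.values()) / sum(totals.values()) if totals else 0.0
    return per_field, overall


# Hypothetical sample data for two invoices.
ground_truth = {
    "doc-1": {"invoice_number": "INV-001", "vendor": "Acme Corporation", "total": "120.50"},
    "doc-2": {"invoice_number": "INV-002", "vendor": "Globex Inc.", "total": None},
}
extracted = {
    "doc-1": {"invoice_number": "INV-001", "vendor": "ACME Corporation ", "total": "120.50"},
    "doc-2": {"invoice_number": "INV-0O2", "vendor": "Globex Inc", "total": None},
}

per_field, overall = score(ground_truth, extracted, fuzzy_fields={"vendor"})
print(per_field)
print(f"overall accuracy: {overall:.2%}")
```

In this sample, identifiers such as `invoice_number` are compared exactly, so the OCR-style confusion of `0` and `O` counts as an error, while `vendor` tolerates small differences in case and punctuation. Which fields deserve fuzzy matching, and at what threshold, is a per-schema decision.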
Best Practices
- Representative samples — Include documents that cover edge cases (missing fields, alternate formats)
- Consistent rules — Define how to handle nulls, rounding, and formatting before labeling
- Update regularly — Add new ground truth as document types or schemas evolve
- Use with confidence scores — Low-confidence extractions are good candidates for manual review and ground truth creation
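Two of these practices lend themselves to a short sketch: writing down null, rounding, and formatting rules as code so that labelers and comparisons apply them identically, and using confidence scores to pick fields for manual review. Everything here is an assumption for illustration, not DocLD behavior: the null tokens, the two-decimal rounding rule, the 0.8 threshold, and the shape of the extraction result (a field name mapped to a `value` and a `confidence`) are all hypothetical.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Hypothetical labeling rule: these strings all mean "no value".
NULL_TOKENS = {"", "n/a", "na", "none", "null", "-"}


def normalize_text(value):
    """Collapse whitespace and map null-like tokens to None."""
    if value is None:
        return None
    text = " ".join(str(value).split())
    return None if text.lower() in NULL_TOKENS else text


def normalize_amount(value, places="0.01"):
    """Strip currency formatting and round half up to a fixed precision."""
    text = normalize_text(value)
    if text is None:
        return None
    try:
        number = Decimal(text.replace(",", "").replace("$", ""))
    except InvalidOperation:
        # Leave unparseable values untouched so the mismatch stays visible.
        return text
    return str(number.quantize(Decimal(places), rounding=ROUND_HALF_UP))


def review_candidates(extraction, threshold=0.8):
    """Return field names whose confidence falls below the threshold.

    `extraction` maps field name -> {"value": ..., "confidence": float};
    this shape is assumed for the example.
    """
    return sorted(
        field for field, result in extraction.items()
        if result["confidence"] < threshold
    )


# Hypothetical extraction result for one document.
extraction = {
    "vendor": {"value": "Acme Corp", "confidence": 0.42},
    "total": {"value": "$1,200.5", "confidence": 0.95},
    "due_date": {"value": "N/A", "confidence": 0.61},
}

print(normalize_amount(extraction["total"]["value"]))
print(review_candidates(extraction))
```

Applying the same normalization to both the ground truth and the extracted values before comparing them keeps formatting noise out of the accuracy numbers.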
Related Concepts
Ground truth feeds into extraction quality measurement. Confidence scores indicate per-field reliability. Prebuilt schemas can be validated against ground truth before you customize them.