Ground Truth | Glossary | DocLD

Ground truth is verified, human-validated data that represents the correct values for a document or set of documents. In DocLD, ground truth is used to measure extraction accuracy: you set verified field values for sample documents, then compare AI extraction results against them to track quality and tune schemas.

Why Ground Truth Matters

Without ground truth, you cannot objectively measure extraction accuracy. With it, you can:

Benchmark — Compare different schemas or model versions
Monitor — Detect regressions when extraction quality drops
Improve — Use feedback to refine field definitions and instructions
Report — Share accuracy metrics with stakeholders

How Ground Truth Works in DocLD

Create ground truth — For sample documents, manually set the correct values for each schema field
Run extraction — Execute extraction on the same documents
Compare — DocLD computes accuracy (e.g., exact match, fuzzy match) per field and overall
Iterate — Adjust schema instructions or field types based on errors
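
The compare step above can be sketched in a few lines of Python. This is a minimal illustration, not DocLD's scoring implementation: the function names, the `{document ID: {field: value}}` data shape, the sample invoices, and the 0.9 fuzzy threshold are all assumptions, and fuzzy matching here uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever similarity measure DocLD applies.

```python
from difflib import SequenceMatcher


def field_matches(expected, actual, fuzzy_threshold=None):
    """Return True when an extracted value agrees with its ground-truth value."""
    if expected is None or actual is None:
        # A null only matches another null.
        return expected is None and actual is None
    if fuzzy_threshold is None:
        return expected == actual  # exact match
    ratio = SequenceMatcher(None, str(expected), str(actual)).ratio()
    return ratio >= fuzzy_threshold  # fuzzy match


def accuracy_report(ground_truth, extracted, fuzzy_fields=(), fuzzy_threshold=0.9):
    """Compare documents field by field.

    Both arguments map a document ID to a {field: value} dict (assumed shape).
    Returns (per_field_accuracy, overall_accuracy).
    """
    hits, totals = {}, {}
    for doc_id, truth in ground_truth.items():
        result = extracted.get(doc_id, {})
        for field, expected in truth.items():
            threshold = fuzzy_threshold if field in fuzzy_fields else None
            ok = field_matches(expected, result.get(field), threshold)
            totals[field] = totals.get(field, 0) + 1
            hits[field] = hits.get(field, 0) + int(ok)
    per_field = {f: hits[f] / totals[f] for f in totals}
    overall = sum(hits.values()) / sum(totals.values()) if totals else 0.0
    return per_field, overall


# Hypothetical sample data: two invoices with verified values.
ground_truth = {
    "inv-001": {"vendor": "Acme Corporation", "total": "1250.00"},
    "inv-002": {"vendor": "Globex Inc", "total": "89.90"},
}
extracted = {
    "inv-001": {"vendor": "Acme Corporation.", "total": "1250.00"},
    "inv-002": {"vendor": "Globex Inc", "total": "89.00"},
}

per_field, overall = accuracy_report(ground_truth, extracted, fuzzy_fields={"vendor"})
print(per_field)  # {'vendor': 1.0, 'total': 0.5}
print(overall)    # 0.75
```

In this sketch the stray period in the vendor name is tolerated because `vendor` is compared fuzzily, while the wrong total fails its exact match. Choosing the comparison method per field is one way to keep free-text fields from dragging down accuracy on harmless differences.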

Ground truth can be set via the dashboard or API. Corrections made during review can optionally be promoted to ground truth for future runs.

Best Practices

Representative samples — Include documents that cover edge cases (missing fields, alternate formats)
Consistent rules — Define how to handle nulls, rounding, and formatting before labeling
Update regularly — Add new ground truth as document types or schemas evolve
Use with confidence scores — Low-confidence extractions are good candidates for manual review and ground truth creation
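
One way to make the "consistent rules" practice concrete is to write the labeling rules down as a normalization function and apply it to both ground truth and extracted values before comparing. The sketch below is an assumption about what such rules might look like: the null tokens, the two-decimal rounding, and the function names are illustrative and not part of DocLD.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Strings treated as "no value" (an assumed labeling rule).
NULL_TOKENS = {"", "n/a", "na", "none", "null", "-"}


def normalize_text(value):
    """Collapse whitespace and map empty-like strings to None."""
    if value is None:
        return None
    text = " ".join(str(value).split())
    return None if text.lower() in NULL_TOKENS else text


def normalize_amount(value, places=2):
    """Parse a monetary string and round it to a fixed number of decimals."""
    text = normalize_text(value)
    if text is None:
        return None
    cleaned = text.replace(",", "").replace("$", "")
    quantum = Decimal(10) ** -places  # Decimal("0.01") when places=2
    try:
        amount = Decimal(cleaned).quantize(quantum, rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return text  # leave unparseable values untouched for a human to inspect
    return str(amount)


print(normalize_text("  Acme   Corp "))  # Acme Corp
print(normalize_amount("$1,250"))        # 1250.00
print(normalize_amount(" 89.905 "))      # 89.91
print(normalize_amount("N/A"))           # None
```

Running the same function over both sides of the comparison means a difference such as `1,250` versus `1250.00` is not counted as an extraction error.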

Related Concepts

Ground truth feeds into extraction quality measurement. Confidence scores indicate per-field reliability. Prebuilt schemas can be validated against ground truth before customizing.
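
As a rough illustration of how per-field confidence scores can point to ground truth candidates, the sketch below picks the fields that fall under a review threshold. The result shape (a `value` and a `confidence` between 0 and 1 per field), the field names, and the 0.8 threshold are hypothetical and may not match what DocLD returns.

```python
def review_candidates(extraction, threshold=0.8):
    """Return (field, value, confidence) tuples below the threshold, lowest first."""
    flagged = [
        (name, item["value"], item["confidence"])
        for name, item in extraction.items()
        if item["confidence"] < threshold
    ]
    return sorted(flagged, key=lambda row: row[2])


# Hypothetical extraction result for one document.
extraction = {
    "vendor": {"value": "Acme Corporation", "confidence": 0.97},
    "total": {"value": "1250.00", "confidence": 0.62},
    "due_date": {"value": "2024-03-01", "confidence": 0.74},
}

queue = review_candidates(extraction)
for field, value, confidence in queue:
    print(f"{field}: {value!r} (confidence {confidence:.2f})")
```

Fields surfaced this way can be checked by a reviewer, and the corrected values are natural additions to the ground truth set.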
