Document Comparison
Compare documents side-by-side to find differences in extracted data.
What is Comparison?
Document comparison helps you:
- Find differences between document versions
- Compare extracted data across documents
- Identify discrepancies in batch extractions
- Validate data consistency
Use cases:
- Compare contract versions
- Verify invoice consistency
- Review document changes
- Quality control for extractions
Comparing Documents
From the Dashboard
1. Go to Compare in the sidebar
2. Select two documents to compare
3. Choose an extraction schema (optional)
4. Click Compare
5. Review differences
Visual Comparison
The comparison view shows:
- Side-by-side document preview
- Field-by-field differences
- Highlighted changes
- Confidence indicators
Comparison Modes
Full Document Comparison
Compare all content between documents:
- Text differences
- Layout changes
- Missing/added sections
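To illustrate what a text difference between two versions looks like, here is a minimal sketch using Python's standard `difflib` module. It does not show how the product computes differences internally. The sample strings and the `text_differences` helper are hypothetical.

```python
import difflib


def text_differences(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="doc1",
            tofile="doc2",
            lineterm="",
        )
    )


# Hypothetical document text, for illustration only.
doc1 = "Payment terms: Net 30\nLiability cap: $100,000"
doc2 = "Payment terms: Net 45\nLiability cap: $100,000"

diff_lines = text_differences(doc1, doc2)
for line in diff_lines:
    print(line)
```

Lines prefixed with `-` appear only in the first document, and lines prefixed with `+` appear only in the second.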
Extraction Comparison
Compare extracted field values:
{
"document_ids": ["doc1", "doc2"],
"schema_id": "invoice-schema"
}
Returns field-by-field comparison:
{
"fields": [
{
"field": "invoice_number",
"doc1_value": "INV-001",
"doc2_value": "INV-001",
"match": true
},
{
"field": "total_amount",
"doc1_value": 1250.00,
"doc2_value": 1300.00,
"match": false,
"difference": 50.00
}
],
"summary": {
"total_fields": 10,
"matching": 8,
"different": 2
}
}
Batch Comparison
Compare multiple documents at once.
Running Batch Comparison
1. Create a batch extraction
2. View batch results
3. Click Compare Results
Batch Comparison View
The batch comparison view shows:
- Field values across all documents
- Outliers and anomalies
- Statistical summaries
- Export options
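The "field values across all documents" view can be approximated client-side by pivoting per-document results into a per-field table. This is a sketch under an assumed input shape: a list of records with `document_id` and `fields` keys. That shape is hypothetical and is not the documented batch result format.

```python
from collections import defaultdict


def values_by_field(extractions):
    """Group extracted values by field name, keyed by document ID."""
    table = defaultdict(dict)
    for doc in extractions:
        for name, value in doc["fields"].items():
            table[name][doc["document_id"]] = value
    return dict(table)


def discrepant_fields(table):
    """Return the names of fields whose values differ across documents."""
    return [name for name, per_doc in table.items() if len(set(per_doc.values())) > 1]


# Hypothetical batch results, for illustration only.
batch = [
    {"document_id": "doc1", "fields": {"vendor_name": "Acme", "total_amount": 1250.0}},
    {"document_id": "doc2", "fields": {"vendor_name": "Acme", "total_amount": 1300.0}},
    {"document_id": "doc3", "fields": {"vendor_name": "Acme", "total_amount": 1250.0}},
]

table = values_by_field(batch)
print(table["total_amount"])
print(discrepant_fields(table))
```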
API Access
curl -X GET "/api/extract/batch/{id}/comparison" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"only_discrepancies": true,
"min_variance": 0.1
}'
Comparison Statistics
Field Statistics
For each compared field:
| Metric | Description |
|---|---|
| Match rate | Percentage of matching values |
| Variance | Standard deviation of the values (numeric fields only) |
| Outliers | Values outside normal range |
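The metrics in this table could be computed roughly as follows for one field's values. The exact definitions used by the service are not stated here, so this sketch makes three assumptions: match rate is the share of values equal to the most common value, the standard deviation is the population standard deviation, and an outlier is a value more than two standard deviations from the mean.

```python
from collections import Counter
from statistics import mean, pstdev


def field_statistics(values, z_threshold=2.0):
    """Summarize one field's values across documents (assumed definitions)."""
    # Match rate: fraction of values equal to the most common value.
    most_common_count = Counter(values).most_common(1)[0][1]
    stats = {"match_rate": most_common_count / len(values)}

    # Spread and outliers only make sense when every value is numeric.
    numeric = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numeric) == len(values) and len(numeric) > 1:
        mu = mean(numeric)
        sigma = pstdev(numeric)
        stats["std_dev"] = sigma
        stats["outliers"] = [
            v for v in numeric if sigma > 0 and abs(v - mu) / sigma > z_threshold
        ]
    return stats


# Hypothetical total_amount values from ten documents.
totals = [1250.0] * 9 + [5000.0]
stats = field_statistics(totals)
print(stats)
```

With this sample, nine of the ten values agree, and the single 5000.0 value is the only one flagged.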
Summary Statistics
{
"statistics": {
"documents_compared": 50,
"fields_compared": 10,
"overall_match_rate": 0.94,
"discrepancy_count": 30,
"numeric_variance": {
"total_amount": 125.50,
"quantity": 2.3
}
}
}
Filtering Comparison Results
By Discrepancy
Show only fields with differences:
{
"only_discrepancies": true
}
By Variance Threshold
Show fields with variance above threshold:
{
"min_variance": 0.05
}
By Specific Fields
Compare only selected fields:
{
"fields": ["invoice_number", "total_amount", "vendor_name"]
}
Export Comparison
Export comparison results as JSON or CSV.
Formats
| Format | Description |
|---|---|
| JSON | Raw comparison data |
| CSV | Spreadsheet-compatible |
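To show how the two formats relate, the sketch below flattens the `fields` array of a field-by-field comparison result into CSV text with Python's standard `csv` module. The column layout of the service's own CSV export is not documented here, so the chosen columns and the `comparison_to_csv` helper are assumptions.

```python
import csv
import io


def comparison_to_csv(comparison: dict) -> str:
    """Flatten the "fields" array of a comparison result into CSV text."""
    columns = ["field", "doc1_value", "doc2_value", "match", "difference"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # leave "difference" empty when it is absent
        extrasaction="ignore",  # skip keys that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    for row in comparison["fields"]:
        writer.writerow(row)
    return buffer.getvalue()


# Sample data taken from the field-by-field response format.
comparison = {
    "fields": [
        {"field": "invoice_number", "doc1_value": "INV-001",
         "doc2_value": "INV-001", "match": True},
        {"field": "total_amount", "doc1_value": 1250.00,
         "doc2_value": 1300.00, "match": False, "difference": 50.00},
    ]
}

csv_text = comparison_to_csv(comparison)
print(csv_text)
```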
Export via API
curl -X GET "/api/extract/batch/{id}/comparison?format=csv" \
-H "Authorization: Bearer YOUR_API_KEY" \
-o comparison.csv
Comparison Summary
Generate AI-powered comparison summaries:
curl -X POST "/api/comparison/summary" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"comparison_data": {...},
"document_names": ["Contract v1", "Contract v2"]
}'
Returns natural language summary:
The two contracts differ in 3 key areas:
1. Payment terms changed from Net 30 to Net 45
2. Liability cap increased from $100,000 to $150,000
3. Termination notice period extended to 60 days
Best Practices
- Use schemas - Compare structured data for accuracy
- Check outliers - Review statistical outliers
- Verify differences - Confirm important discrepancies
- Export for review - Share comparison reports
- Batch for efficiency - Compare multiple documents at once
API Reference
See the Documents API for comparison endpoints.
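For scripted access, the batch comparison request shown under API Access can be built in Python with the standard library. This is a sketch only: `BASE_URL` is a placeholder host, `build_batch_comparison_request` is a hypothetical helper, and the request mirrors the curl example (a GET with a JSON body) without confirming how the server handles it.

```python
import json
import urllib.request

BASE_URL = "https://example.com"  # placeholder host; replace with your deployment


def build_batch_comparison_request(batch_id, api_key,
                                   only_discrepancies=True, min_variance=0.1):
    """Build (but do not send) the batch comparison request."""
    body = json.dumps(
        {"only_discrepancies": only_discrepancies, "min_variance": min_variance}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/extract/batch/{batch_id}/comparison",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


request = build_batch_comparison_request("BATCH_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```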