Structured Data
Structured data is data that follows a fixed schema: named fields, types (string, number, date, array, object), and often validation rules. Extraction in DocLD turns unstructured document content into structured data (e.g., JSON) that matches a schema. Structured data can then be stored in databases, sent to APIs, or exported (for example, as CSV).
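To make "follows a fixed schema" concrete, here is a minimal Python sketch of checking a record against named, typed fields and then exporting it as CSV. The `INVOICE_SCHEMA` mapping and the `validate` and `to_csv` helpers are hypothetical names chosen for illustration; they are not part of DocLD's API, and a real schema would usually carry richer validation rules than a type check.

```python
import csv
import io
from datetime import date

# Hypothetical schema for illustration: field name -> expected Python type.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "total": float,
    "date": date,
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record matches."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


def to_csv(records, schema):
    """Serialize records to CSV text, using the schema's field order as columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(schema))
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


record = {
    "invoice_number": "INV-001",
    "total": 99.99,
    "date": date(2024, 1, 15),
}

problems = validate(record, INVOICE_SCHEMA)
csv_text = to_csv([record], INVOICE_SCHEMA)
```

Because every record shares the same fields, the schema's keys can double as the CSV header, which is what makes this kind of export straightforward for structured data.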
vs Unstructured Data
| Type | Example |
|---|---|
| Unstructured | Raw PDF text, images, free-form paragraphs |
| Structured | { "invoice_number": "INV-001", "total": 99.99, "date": "2024-01-15" } |
A JSON schema defines the shape of structured output for extraction. Field mapping and instructions control how document content is turned into structured fields. Ground truth is used to measure extraction accuracy on structured data.
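As a rough illustration of these ideas, the Python sketch below writes a JSON Schema for the invoice example as a plain dictionary and scores an extraction result against a ground-truth record. The schema contents, the `field_accuracy` helper, and the exact-match scoring rule are all assumptions made for this example; the metric DocLD actually reports may be defined differently.

```python
# Hypothetical JSON Schema for the invoice example, written as a Python dict.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number", "total", "date"],
}


def field_accuracy(extracted, ground_truth):
    """Fraction of ground-truth fields whose extracted value matches exactly.

    Exact equality is an assumed scoring rule, chosen for simplicity.
    """
    if not ground_truth:
        raise ValueError("ground truth must contain at least one field")
    matches = sum(
        1
        for field, expected in ground_truth.items()
        if field in extracted and extracted[field] == expected
    )
    return matches / len(ground_truth)


ground_truth = {"invoice_number": "INV-001", "total": 99.99, "date": "2024-01-15"}
extracted = {"invoice_number": "INV-001", "total": 99.90, "date": "2024-01-15"}

# Two of the three fields match, so the score is 2/3.
score = field_accuracy(extracted, ground_truth)

# Fields the schema requires but the extraction did not produce.
missing = [f for f in invoice_schema["required"] if f not in extracted]
```

Scoring per field rather than per document keeps one wrong value, such as the mismatched `total` here, from hiding the fields that were extracted correctly.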
Related Concepts
Structured data is the output of extraction against a schema. Unstructured data (the documents themselves) is the input. JSON schema and field mapping define the structure.