Prebuilt Schema
Prebuilt schemas are ready-to-use schema definitions for common document types. They define fields and instructions so you can start extraction without building a schema from scratch. Use them as-is or customize (add fields, modify instructions) for your use case.
Available Prebuilt Schemas
| Category | Schemas |
|---|---|
| Invoice | Invoice, Receipt, Purchase Order |
| Contract | Contract, NDA, Agreement |
| Resume | Resume, CV |
| Financial | Bank Statement, Tax Form |
| Form | W-9, W-4, 1099 |
Each prebuilt schema includes field definitions (names, types, required flags) and instructions that guide the LLM for zero-shot extraction.
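To make the description above concrete, here is a minimal sketch of how such a schema could be modelled in Python. The class names, field names, and type strings are assumptions chosen for illustration; they are not DocLD's actual schema format.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaField:
    # Hypothetical field definition: a name, a type, and a required flag.
    name: str
    type: str
    required: bool = False


@dataclass
class Schema:
    # Hypothetical schema: an id, field definitions, and LLM instructions.
    schema_id: str
    fields: list[SchemaField] = field(default_factory=list)
    instructions: str = ""

    def required_names(self) -> list[str]:
        """Return the names of the fields marked as required."""
        return [f.name for f in self.fields if f.required]


# Illustrative invoice schema; the field list is invented for this example.
invoice = Schema(
    schema_id="prebuilt:invoice",
    fields=[
        SchemaField("invoice_number", "string", required=True),
        SchemaField("total", "number", required=True),
        SchemaField("due_date", "date"),
    ],
    instructions="Extract amounts as plain numbers without currency symbols.",
)
```

Keeping the instructions next to the field definitions mirrors the idea that both travel together when a schema is copied or customized.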
Using Prebuilt Schemas
Fetch and apply via API:

    # List prebuilt schemas by category
    curl -X GET "/api/extract/schemas/prebuilt?category=invoice"

    # Use a prebuilt schema for extraction
    POST /api/extract
    { "document_id": "...", "schema_id": "prebuilt:invoice" }

Use a prebuilt schema as-is for common document types, or copy it and customize it for your needs by adding fields, modifying instructions, or changing field types.
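The two calls above could be prepared from Python roughly as follows. This sketch only builds the requests and does not send them. The host in `BASE_URL` is a placeholder, and authentication is omitted because the source does not describe it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder host; replace with your actual DocLD base URL.
BASE_URL = "https://docld.example.com"


def list_prebuilt_request(category: str) -> Request:
    """Build the GET request that lists prebuilt schemas in a category."""
    query = urlencode({"category": category})
    return Request(f"{BASE_URL}/api/extract/schemas/prebuilt?{query}", method="GET")


def extract_request(document_id: str, schema_id: str) -> Request:
    """Build the POST request that runs extraction with a given schema."""
    body = json.dumps({"document_id": document_id, "schema_id": schema_id})
    return Request(
        f"{BASE_URL}/api/extract",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "doc_123" is a made-up document id for illustration.
list_req = list_prebuilt_request("invoice")
post_req = extract_request("doc_123", "prebuilt:invoice")
```

Either request can then be passed to `urllib.request.urlopen` once the base URL and any required credentials are in place.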
Form Detection
DocLD can auto-detect document type and suggest a prebuilt schema:
- Process document — Document is classified during pipeline processing, or on first request to the suggested-schema endpoint
- Store — Classification is stored in document metadata
- Recommend — GET /api/extract/suggested-schema?document_id=... returns the suggested schema (e.g., "Invoice", "Contract")
- Apply — Use the recommended schema for extraction
The suggested schema speeds up setup when you work with mixed document types. In batch processing, use it to route each document to the right schema.
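Batch routing by suggested schema might look like the sketch below. The `suggest` callable stands in for a call to the suggested-schema endpoint, whose response shape is not documented here. The mapping from a suggested name such as "Invoice" to an id such as "prebuilt:invoice" is also an assumption, as is the "unclassified" bucket.

```python
from collections import defaultdict
from typing import Callable, Optional


def to_schema_id(suggested: str) -> str:
    # Assumption: ids follow the "prebuilt:<lowercase name>" pattern.
    return "prebuilt:" + suggested.strip().lower().replace(" ", "_")


def route_batch(
    document_ids: list[str],
    suggest: Callable[[str], Optional[str]],
) -> dict[str, list[str]]:
    """Group document ids by the schema id suggested for each one."""
    routes: dict[str, list[str]] = defaultdict(list)
    for doc_id in document_ids:
        suggested = suggest(doc_id)
        # Documents without a suggestion go to a hypothetical fallback bucket.
        key = to_schema_id(suggested) if suggested else "unclassified"
        routes[key].append(doc_id)
    return dict(routes)


# Stand-in for the API: a lookup table of made-up classification results.
fake_suggestions = {"a": "Invoice", "b": "Contract", "c": "Invoice", "d": None}
routes = route_batch(["a", "b", "c", "d"], fake_suggestions.get)
```

Passing the lookup in as a function keeps the routing logic testable without network access; in practice it would wrap the real endpoint call.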
Customization Flow
| Step | Description |
|---|---|
| Fetch | Retrieve a prebuilt schema via API |
| Copy | Create a copy for customization |
| Modify | Add fields, change instructions, adjust types |
| Save | Save as a custom schema |
| Use | Use the custom schema for extraction |
Instructions help the AI handle edge cases (e.g., "If tax is shown separately, extract as its own field"). Refine instructions based on extraction quality and ground truth feedback.
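The copy and modify steps can be sketched on a plain dictionary. The schema layout (`id`, `fields`, `instructions`) and the `custom:` id prefix are hypothetical, and saving the result through the API is left out because the source does not show that endpoint.

```python
import copy


def customize(
    prebuilt: dict,
    new_id: str,
    extra_fields: list[dict],
    extra_instruction: str,
) -> dict:
    """Return a customized copy of a schema, leaving the original untouched."""
    custom = copy.deepcopy(prebuilt)  # Copy step: never mutate the prebuilt schema
    custom["id"] = new_id
    custom["fields"].extend(extra_fields)  # Modify step: add fields
    # Modify step: append the new instruction to the existing ones.
    combined = custom.get("instructions", "") + " " + extra_instruction
    custom["instructions"] = combined.strip()
    return custom


# Hypothetical prebuilt schema as it might be returned by the fetch step.
prebuilt = {
    "id": "prebuilt:invoice",
    "fields": [{"name": "total", "type": "number", "required": True}],
    "instructions": "Extract totals as numbers.",
}

custom = customize(
    prebuilt,
    new_id="custom:invoice-with-tax",
    extra_fields=[{"name": "tax", "type": "number", "required": False}],
    extra_instruction="If tax is shown separately, extract as its own field.",
)
```

The deep copy matters here: without it, extending the field list would also change the prebuilt schema held in memory.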
Best Practices
- Start with prebuilt — Use prebuilt schemas for common types before building custom
- Customize for edge cases — Add instructions for document variants (e.g., different invoice formats)
- Validate with ground truth — Use ground truth to measure accuracy of prebuilt vs custom schemas
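One simple way to compare a prebuilt schema against a customized one is exact-match field accuracy against ground truth. The metric and the sample values below are assumptions for illustration; the source does not specify how DocLD scores extractions.

```python
def field_accuracy(extracted: dict, truth: dict) -> float:
    """Fraction of ground-truth fields whose extracted value matches exactly."""
    if not truth:
        return 0.0
    correct = sum(1 for key, value in truth.items() if extracted.get(key) == value)
    return correct / len(truth)


# Made-up ground truth and extraction results for a single invoice.
truth = {"invoice_number": "INV-7", "total": 120.0, "tax": 20.0, "currency": "EUR"}
prebuilt_output = {"invoice_number": "INV-7", "total": 140.0, "currency": "EUR"}
custom_output = {"invoice_number": "INV-7", "total": 120.0, "tax": 20.0, "currency": "eur"}

prebuilt_score = field_accuracy(prebuilt_output, truth)
custom_score = field_accuracy(custom_output, truth)
```

Exact matching is strict: the custom result loses a point for "eur" versus "EUR", so you may want to normalize values before comparing.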
Related Concepts
Prebuilt schemas are ready-made schemas for extraction. Zero-shot extraction works with prebuilt schemas without training. Ground truth measures extraction accuracy.