Parse API
Parse documents to extract text, tables, and structured content. Supports synchronous and asynchronous processing. For upload-then-parse workflows, see the Upload API and Documents API. For supported formats and OCR, see Document Parsing. For a single request/response without streaming, use the Embed API with action: 'parse'.
Parse Document (Synchronous)
POST /api/parse
Parse a document and return structured content immediately. Best for smaller documents or when you need results right away.
Input Options
The parse endpoint accepts multiple input formats:
| Input Type | Format | Description |
|---|---|---|
| File upload | multipart/form-data | Upload file directly |
| URL | {"input": "https://..."} | Fetch from URL |
| DocLD reference | {"input": "docld://..."} | Parse previously uploaded document |
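To make the two JSON input modes concrete, here is a minimal Python sketch that builds (but does not send) a request for a URL or `docld://` input. The helper name `build_parse_request` and its `base_url` argument are illustrative and not part of the API; the host, path, and headers mirror the curl examples in this section. Accepting plain `http://` URLs is an assumption.

```python
import json
import urllib.request


def build_parse_request(base_url, api_key, input_ref, config=None):
    """Build a POST request for /api/parse with a JSON body (nothing is sent here)."""
    # Documented string inputs: a fetchable URL or a docld:// reference.
    # Assumption: plain http:// is accepted as well as https://.
    if not input_ref.startswith(("http://", "https://", "docld://")):
        raise ValueError("input must be a URL or a docld:// reference")

    body = {"input": input_ref}
    if config is not None:
        body["config"] = config

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_parse_request("https://your-domain.com", "YOUR_API_KEY", "docld://abc123-def456")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; file uploads use multipart/form-data instead and are not covered by this sketch.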
Request (File Upload)
Content-Type: multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | Document file (max 100MB) |
| config | JSON string | Parsing configuration |
Request (JSON)
Content-Type: application/json
{
  "input": "https://example.com/document.pdf",
  "config": {
    "formatting": {
      "table_output_format": "markdown"
    }
  }
}
Or with a DocLD reference:
{
  "input": "docld://abc123-def456"
}
Configuration Options
{
  "formatting": {
    "table_output_format": "markdown"
  },
  "chunking": {
    "strategy": "semantic",
    "max_chunk_size": 1000,
    "overlap": 100
  }
}
| Option | Type | Default | Description |
|---|---|---|---|
| formatting.table_output_format | string | markdown | Table format: markdown, html, json |
| chunking.strategy | string | semantic | Chunking: semantic, fixed, page |
| chunking.max_chunk_size | number | 1000 | Maximum chunk size in characters |
| chunking.overlap | number | 100 | Overlap between chunks |
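As an illustration of how the defaults in the table combine with caller overrides, the following Python sketch merges a partial config over the documented defaults and checks the enumerated values client-side. `build_config` is a hypothetical helper; rejecting options that are not in the table is an assumption, since the server may accept options this section does not list.

```python
import copy

# Defaults and allowed values as listed in the options table.
DEFAULT_CONFIG = {
    "formatting": {"table_output_format": "markdown"},
    "chunking": {"strategy": "semantic", "max_chunk_size": 1000, "overlap": 100},
}
ALLOWED = {
    ("formatting", "table_output_format"): {"markdown", "html", "json"},
    ("chunking", "strategy"): {"semantic", "fixed", "page"},
}


def build_config(overrides=None):
    """Merge caller overrides over the documented defaults and check enum values."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    for section, options in (overrides or {}).items():
        # Assumption: only the sections and options shown in the table are valid.
        if section not in config:
            raise ValueError(f"unknown config section: {section}")
        for key, value in options.items():
            if key not in config[section]:
                raise ValueError(f"unknown option: {section}.{key}")
            allowed = ALLOWED.get((section, key))
            if allowed is not None and value not in allowed:
                raise ValueError(f"{section}.{key} must be one of {sorted(allowed)}")
            config[section][key] = value
    return config


config = build_config({"formatting": {"table_output_format": "html"}})
print(config)
```

The result can be sent as the `config` object of a JSON request, or serialized with `json.dumps` for the `config` field of a multipart upload.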
Response
{
  "job_id": "uuid",
  "duration": 2.5,
  "usage": {
    "num_pages": 5,
    "credits": 7.5
  },
  "result": {
    "type": "full",
    "chunks": [
      {
        "content": "# Invoice\n\nInvoice Number: INV-001...",
        "page": 1,
        "metadata": {
          "type": "text",
          "confidence": 0.98
        }
      },
      {
        "content": "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |",
        "page": 2,
        "metadata": {
          "type": "table",
          "confidence": 0.95
        }
      }
    ]
  },
  "studio_link": "https://your-domain.com/documents/abc123"
}
Example
Parse file:
curl -X POST "https://your-domain.com/api/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf"
Parse URL:
curl -X POST "https://your-domain.com/api/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "https://example.com/document.pdf"}'
Parse Document (Asynchronous)
POST /api/parse/async
Queue a document for background parsing. Returns immediately with a job ID. Best for large documents or batch processing.
Request (JSON)
{
  "input": "https://example.com/large-document.pdf",
  "config": {},
  "webhook_url": "https://your-server.com/webhook"
}
| Field | Type | Description |
|---|---|---|
| input | string | URL or docld:// reference |
| config | object | Parsing configuration |
| webhook_url | string | URL to receive completion callback |
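The request fields above can be assembled with a small helper such as the Python sketch below. `build_async_payload` is a hypothetical name, and treating `webhook_url` as optional is an assumption: the field table does not say whether it is required.

```python
import json
from urllib.parse import urlparse


def build_async_payload(input_ref, webhook_url=None, config=None):
    """Assemble the JSON body for POST /api/parse/async."""
    payload = {"input": input_ref, "config": config or {}}
    if webhook_url is not None:
        # Check that the callback target is an absolute http(s) URL before queuing.
        parsed = urlparse(webhook_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("webhook_url must be an absolute http(s) URL")
        payload["webhook_url"] = webhook_url
    return payload


payload = build_async_payload(
    "https://example.com/large-document.pdf",
    webhook_url="https://your-server.com/webhook",
)
print(json.dumps(payload, indent=2))
```

The printed body matches the request example above and can be posted with the same headers as the synchronous JSON request.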
Response
{
  "job_id": "uuid",
  "status": "pending",
  "message": "Document queued for parsing",
  "status_url": "https://your-domain.com/api/jobs/uuid",
  "webhook_url": "https://your-server.com/webhook"
}
Checking Job Status
curl -X GET "https://your-domain.com/api/jobs/{job_id}" \
  -H "Authorization: Bearer YOUR_API_KEY"
Webhook Callback
When processing completes, a POST request is sent to your webhook URL. See Webhooks for payload details and other webhook-emitting endpoints.
{
  "job_id": "uuid",
  "status": "completed",
  "result": {
    "type": "full",
    "chunks": [...]
  },
  "usage": {
    "num_pages": 10,
    "credits": 15
  }
}
Parse Presets
Save and reuse parsing configurations.
List Presets
GET /api/parse/presets
Query Parameters:
| Parameter | Default | Description |
|---|---|---|
| scope | all | user, organization, or all |
| organization_id | - | Filter by organization (when scope includes organization) |
Response: { "presets": [ { "id", "name", "description", "config", "scope", "organization_id", "created_at", "updated_at" } ] }.
Create Preset
POST /api/parse/presets
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Preset name. |
| description | string | No | Optional description. |
| config | object | No | Parsing config (merged with defaults, validated). |
| scope | string | No | user or organization. Default user. |
| organization_id | string | No | Required when scope is organization. |
Response: { "preset": { "id", "name", "description", "config", "scope", "organization_id", "created_at", "updated_at" } }.
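The field rules for creating a preset (required `name`, default `scope` of `user`, and `organization_id` required for organization scope) can be checked before the request is sent. The Python sketch below is one way to do that; `build_preset_body` and the sample values `"Invoices"` and `"org_123"` are hypothetical.

```python
def build_preset_body(name, config=None, description=None,
                      scope="user", organization_id=None):
    """Build the JSON body for POST /api/parse/presets from the documented fields."""
    if not name:
        raise ValueError("name is required")
    if scope not in ("user", "organization"):
        raise ValueError("scope must be 'user' or 'organization'")
    if scope == "organization" and not organization_id:
        raise ValueError("organization_id is required when scope is 'organization'")

    body = {"name": name, "scope": scope}
    # Optional fields are included only when the caller supplies them.
    if description is not None:
        body["description"] = description
    if config is not None:
        body["config"] = config
    if organization_id is not None:
        body["organization_id"] = organization_id
    return body


body = build_preset_body(
    "Invoices",
    config={"formatting": {"table_output_format": "json"}},
    scope="organization",
    organization_id="org_123",
)
print(body)
```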
Get Preset
GET /api/parse/presets/{id}
Response: { "preset": { ... } }.
Update Preset
PATCH /api/parse/presets/{id}
Request body: name, description, and/or config (partial update). Config is merged with defaults and validated.
Response: { "preset": { ... } }.
Delete Preset
DELETE /api/parse/presets/{id}
Response: { "success": true }.
PDF to Text (Public)
POST /api/pdf-to-text
Simple PDF-to-text conversion. Rate-limited; anonymous access is supported with restrictions.
Request
Content-Type: multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | PDF file only |
Response
{
  "text": "Extracted text content...",
  "pageCount": 10,
  "pagesExtracted": 2,
  "truncated": true,
  "fileName": "document.pdf"
}
Limits
| Tier | Max Pages |
|---|---|
| Anonymous | 2 pages |
| Authenticated | Full document |
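Because anonymous requests are cut off after two pages, callers will usually want to check `truncated` before relying on the text. A minimal Python sketch, using only the response fields shown above (`summarize_extraction` is a hypothetical helper):

```python
def summarize_extraction(response):
    """Describe how much of the PDF was extracted, based on the response fields."""
    extracted = response["pagesExtracted"]
    total = response["pageCount"]
    if response["truncated"]:
        return f"{response['fileName']}: extracted {extracted} of {total} pages (truncated)"
    return f"{response['fileName']}: extracted all {total} pages"


sample = {
    "text": "Extracted text content...",
    "pageCount": 10,
    "pagesExtracted": 2,
    "truncated": True,
    "fileName": "document.pdf",
}
print(summarize_extraction(sample))
```

A truncated result is a signal to retry the upload with an API key, since authenticated requests return the full document.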
PDF to Markdown (Public)
POST /api/pdf-to-markdown
Convert PDF to structured Markdown (headings, lists, key-values). Same pipeline as PDF to text; output uses Markdown syntax. Ideal for docs, CMSs, and API pipelines.
Request
Content-Type: multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | PDF file only |
Response
{
  "markdown": "# Title\n\n## Section\n\n- List item...",
  "pageCount": 10,
  "pagesExtracted": 2,
  "truncated": true,
  "fileName": "document.pdf"
}
Limits
| Tier | Max Pages |
|---|---|
| Anonymous | 2 pages |
| Authenticated | Full document |
PDF to JSON (Public)
POST /api/pdf-to-json
Same parsing as PDF to text; response is structured JSON (pages, blocks, tables). Built for developers and automation.
Request
Content-Type: multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | PDF file only |
Response
{
  "metadata": {
    "numPages": 10,
    "title": "Document Title",
    "author": "Author",
    "creationDate": "D:20240101120000",
    "fileName": "document.pdf",
    "pagesExtracted": 2,
    "truncated": true
  },
  "pages": [
    {
      "pageNumber": 1,
      "blocks": [
        {
          "type": "Title",
          "content": "Chapter One",
          "bbox": { "page": 1, "left": 0.1, "top": 0.05, "width": 0.8, "height": 0.04 },
          "confidence": "high",
          "metadata": {}
        },
        {
          "type": "Text",
          "content": "Paragraph text...",
          "bbox": { "page": 1, "left": 0.1, "top": 0.1, "width": 0.8, "height": 0.1 },
          "confidence": "high",
          "metadata": {}
        }
      ]
    }
  ],
  "tables": [
    {
      "type": "Table",
      "content": "| A | B |\n|---|---|\n| 1 | 2 |",
      "bbox": { "page": 1, "left": 0.1, "top": 0.2, "width": 0.5, "height": 0.15 },
      "confidence": "high",
      "metadata": {
        "rows": 2,
        "columns": 2,
        "formats": { "markdown": "...", "html": "...", "json": [] }
      }
    }
  ]
}
Limits
| Tier | Max Pages |
|---|---|
| Anonymous | 2 pages |
| Authenticated | Full document |
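To show how the PDF to JSON structure might be consumed, the Python sketch below counts block types across pages and lists each table's page and dimensions. It relies only on keys that appear in the response example above; `summarize_pdf_json` is a hypothetical helper and the sample document is a trimmed copy of that example.

```python
from collections import Counter


def summarize_pdf_json(doc):
    """Count block types per document and list (page, rows, columns) for each table."""
    block_types = Counter(
        block["type"] for page in doc["pages"] for block in page["blocks"]
    )
    tables = [
        (table["bbox"]["page"], table["metadata"]["rows"], table["metadata"]["columns"])
        for table in doc["tables"]
    ]
    return {
        "block_types": dict(block_types),
        "tables": tables,
        "truncated": doc["metadata"]["truncated"],
    }


# Trimmed copy of the documented response, keeping only the keys used above.
sample = {
    "metadata": {"numPages": 10, "pagesExtracted": 2, "truncated": True},
    "pages": [
        {
            "pageNumber": 1,
            "blocks": [
                {"type": "Title", "content": "Chapter One"},
                {"type": "Text", "content": "Paragraph text..."},
            ],
        }
    ],
    "tables": [
        {"type": "Table", "bbox": {"page": 1}, "metadata": {"rows": 2, "columns": 2}}
    ],
}
print(summarize_pdf_json(sample))
```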
Credit Usage
Parsing costs credits based on page count:
| Operation | Credits per Page |
|---|---|
| Standard parse | 1.5 |
| Agentic OCR | 3.0 |
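The per-page rates make cost estimation a single multiplication, as in this Python sketch. The dictionary keys `standard` and `agentic_ocr` are labels chosen for this example, not API values.

```python
# Credits per page, taken from the table above.
CREDITS_PER_PAGE = {"standard": 1.5, "agentic_ocr": 3.0}


def estimate_credits(num_pages, operation="standard"):
    """Estimate the credit cost of parsing num_pages pages."""
    if num_pages < 0:
        raise ValueError("num_pages must not be negative")
    return num_pages * CREDITS_PER_PAGE[operation]


print(estimate_credits(5))                   # 5 pages, standard parse
print(estimate_credits(10, "agentic_ocr"))   # 10 pages, Agentic OCR
```

The standard rate agrees with the response examples in this section: 5 pages are billed 7.5 credits and 10 pages are billed 15.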
Error Handling
| Status | Error | Description |
|---|---|---|
| 400 | VALIDATION_ERROR | Invalid input or configuration |
| 404 | NOT_FOUND | Document not found (docld:// reference) |
| 413 | FILE_TOO_LARGE | File exceeds 100MB limit |
| 422 | UNSUPPORTED_FORMAT | File format not supported |
| 500 | PROCESSING_ERROR | Parse failed |
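Client code can turn the status codes above into a lookup, as in the Python sketch below. The error body format is not specified in this section, so the sketch keys on the HTTP status alone. Treating only 500 as worth retrying is an assumption made for this example, not documented behaviour.

```python
# Status codes and error names from the table above.
ERRORS = {
    400: ("VALIDATION_ERROR", "Invalid input or configuration"),
    404: ("NOT_FOUND", "Document not found (docld:// reference)"),
    413: ("FILE_TOO_LARGE", "File exceeds 100MB limit"),
    422: ("UNSUPPORTED_FORMAT", "File format not supported"),
    500: ("PROCESSING_ERROR", "Parse failed"),
}

# Assumption: a failed parse may succeed on retry; the 4xx errors need a changed request.
RETRYABLE_STATUSES = {500}


def describe_error(status):
    """Map an HTTP status to the documented error name plus a retry hint."""
    code, description = ERRORS.get(
        status, ("UNKNOWN", "Status not listed in the error table")
    )
    return {
        "status": status,
        "code": code,
        "description": description,
        "retryable": status in RETRYABLE_STATUSES,
    }


print(describe_error(413))
print(describe_error(500))
```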
See also: Upload API, Documents, Parsing, Embed API, Webhooks.