Parse API

Parse documents to extract text, tables, and structured content. Supports synchronous and asynchronous processing. For upload-then-parse workflows, see the Upload API and Documents API. For supported formats and OCR, see Document Parsing. For a single request/response without streaming, use the Embed API with action: 'parse'.

Parse Document (Synchronous)


POST /api/parse

Parse a document and return structured content immediately. Best for smaller documents or when you need results right away.

Input Options

The parse endpoint accepts three input types:

Input Type	Format	Description
File upload	`multipart/form-data`	Upload file directly
URL	`{"input": "https://..."}`	Fetch from URL
DocLD reference	`{"input": "docld://..."}`	Parse previously uploaded document

Request (File Upload)

Content-Type: multipart/form-data

Field	Type	Description
`file`	File	Document file (max 100MB)
`config`	JSON string	Parsing configuration

Request (JSON)

Content-Type: application/json


{
  "input": "https://example.com/document.pdf",
  "config": {
    "formatting": {
      "table_output_format": "markdown"
    }
  }
}

Or with a DocLD reference:


{
  "input": "docld://abc123-def456"
}
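As a minimal sketch, the JSON request above could be assembled with the Python standard library as follows. The helper name `build_parse_request` and the `BASE_URL` constant are illustrative assumptions, not part of the API, and the prefix check is a client-side convenience only. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://your-domain.com"  # placeholder host used in the examples


def build_parse_request(input_ref, api_key, config=None):
    """Build (but do not send) a JSON request for POST /api/parse."""
    # Client-side sanity check (an assumption): a URL or a docld:// reference.
    if not input_ref.startswith(("https://", "http://", "docld://")):
        raise ValueError("input must be a URL or a docld:// reference")
    body = {"input": input_ref}
    if config is not None:
        body["config"] = config
    return urllib.request.Request(
        f"{BASE_URL}/api/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_parse_request("docld://abc123-def456", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.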

Configuration Options


{
  "formatting": {
    "table_output_format": "markdown"
  },
  "chunking": {
    "strategy": "semantic",
    "max_chunk_size": 1000,
    "overlap": 100
  }
}

Option	Type	Default	Description
`formatting.table_output_format`	string	`markdown`	Table format: `markdown`, `html`, `json`
`chunking.strategy`	string	`semantic`	Chunking: `semantic`, `fixed`, `page`
`chunking.max_chunk_size`	number	1000	Maximum chunk size in characters
`chunking.overlap`	number	100	Overlap between chunks
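The options table can be mirrored on the client to catch typos before a request is sent. The sketch below is one possible approach, not the server's validation logic: `make_config` is a hypothetical helper, and the rule that `overlap` must be smaller than `max_chunk_size` is an assumption added for illustration.

```python
import copy

# Defaults and allowed values copied from the options table above.
DEFAULT_CONFIG = {
    "formatting": {"table_output_format": "markdown"},
    "chunking": {"strategy": "semantic", "max_chunk_size": 1000, "overlap": 100},
}
ALLOWED_VALUES = {
    ("formatting", "table_output_format"): {"markdown", "html", "json"},
    ("chunking", "strategy"): {"semantic", "fixed", "page"},
}


def make_config(overrides=None):
    """Return the defaults merged with overrides, rejecting unknown values."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    for section, values in (overrides or {}).items():
        if section not in config:
            raise ValueError(f"unknown config section: {section}")
        for key, value in values.items():
            if key not in config[section]:
                raise ValueError(f"unknown option: {section}.{key}")
            allowed = ALLOWED_VALUES.get((section, key))
            if allowed is not None and value not in allowed:
                raise ValueError(f"{section}.{key} must be one of {sorted(allowed)}")
            config[section][key] = value
    chunking = config["chunking"]
    # Assumed sanity rule, not stated in the table above.
    if chunking["overlap"] >= chunking["max_chunk_size"]:
        raise ValueError("overlap must be smaller than max_chunk_size")
    return config


print(make_config({"chunking": {"strategy": "page"}}))
```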

Response


{
  "job_id": "uuid",
  "duration": 2.5,
  "usage": {
    "num_pages": 5,
    "credits": 7.5
  },
  "result": {
    "type": "full",
    "chunks": [
      {
        "content": "# Invoice\n\nInvoice Number: INV-001...",
        "page": 1,
        "metadata": {
          "type": "text",
          "confidence": 0.98
        }
      },
      {
        "content": "| Item | Qty | Price |\n|------|-----|-------|\n| Widget | 10 | $5.00 |",
        "page": 2,
        "metadata": {
          "type": "table",
          "confidence": 0.95
        }
      }
    ]
  },
  "studio_link": "https://your-domain.com/documents/abc123"
}
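A response shaped like the one above can be filtered by chunk type. The following sketch assumes the response has already been decoded into a dictionary; `chunks_of_type` and its `min_confidence` parameter are hypothetical names, and the sample data is abbreviated from the example response.

```python
def chunks_of_type(response, chunk_type, min_confidence=0.0):
    """Return chunks whose metadata.type matches and whose confidence is high enough."""
    chunks = (response.get("result") or {}).get("chunks", [])
    selected = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        if meta.get("type") == chunk_type and meta.get("confidence", 0.0) >= min_confidence:
            selected.append(chunk)
    return selected


# Abbreviated copy of the example response.
sample = {
    "job_id": "uuid",
    "usage": {"num_pages": 5, "credits": 7.5},
    "result": {
        "type": "full",
        "chunks": [
            {"content": "# Invoice", "page": 1,
             "metadata": {"type": "text", "confidence": 0.98}},
            {"content": "| Item | Qty | Price |", "page": 2,
             "metadata": {"type": "table", "confidence": 0.95}},
        ],
    },
}

for table in chunks_of_type(sample, "table"):
    print(table["page"], table["content"])
```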

Example

Parse file:


curl -X POST "https://your-domain.com/api/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf"

Parse URL:


curl -X POST "https://your-domain.com/api/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "https://example.com/document.pdf"}'

Parse Document (Asynchronous)


POST /api/parse/async

Queue a document for background parsing. Returns immediately with a job ID. Best for large documents or batch processing.

Request (JSON)


{
  "input": "https://example.com/large-document.pdf",
  "config": {},
  "webhook_url": "https://your-server.com/webhook"
}

Field	Type	Description
`input`	string	URL or `docld://` reference
`config`	object	Parsing configuration
`webhook_url`	string	URL to receive completion callback

Response


{
  "job_id": "uuid",
  "status": "pending",
  "message": "Document queued for parsing",
  "status_url": "https://your-domain.com/api/jobs/uuid",
  "webhook_url": "https://your-server.com/webhook"
}

Checking Job Status


curl -X GET "https://your-domain.com/api/jobs/{job_id}" \
  -H "Authorization: Bearer YOUR_API_KEY"
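A polling loop around the status endpoint might look like the sketch below. Only the `pending` and `completed` status values appear in this reference; the `failed` value, the function name `wait_for_job`, and the timing parameters are assumptions. The HTTP call is injected as `fetch_status` so the loop can be exercised without a network.

```python
import time


def wait_for_job(fetch_status, job_id, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(job_id) until the job completes or the timeout passes."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":  # assumed terminal status, not documented above
            raise RuntimeError(f"job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)


# Simulated status responses standing in for GET /api/jobs/{job_id}.
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "completed", "result": {"type": "full", "chunks": []}},
])
job = wait_for_job(lambda job_id: next(_responses), "uuid", sleep=lambda seconds: None)
print(job["status"])
```

When a `webhook_url` is supplied, polling is unnecessary; the loop is mainly useful where the caller cannot receive inbound requests.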

Webhook Callback

When processing completes, a POST request is sent to your webhook URL. See Webhooks for payload details and other webhook-emitting endpoints.


{
  "job_id": "uuid",
  "status": "completed",
  "result": {
    "type": "full",
    "chunks": [...]
  },
  "usage": {
    "num_pages": 10,
    "credits": 15
  }
}
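On the receiving side, a handler would typically decode the callback body and pull out the fields it needs. The sketch below covers only that decoding step, under the assumption that the payload follows the shape shown above; `summarize_callback` is a hypothetical helper and the sample chunk is illustrative data.

```python
import json


def summarize_callback(raw_body):
    """Decode a webhook body and return the fields a handler usually needs."""
    payload = json.loads(raw_body)
    missing = [key for key in ("job_id", "status") if key not in payload]
    if missing:
        raise ValueError(f"callback is missing fields: {missing}")
    chunks = (payload.get("result") or {}).get("chunks", [])
    usage = payload.get("usage") or {}
    return {
        "job_id": payload["job_id"],
        "status": payload["status"],
        "chunk_count": len(chunks),
        "num_pages": usage.get("num_pages"),
        "credits": usage.get("credits"),
    }


# Illustrative body; the chunk content is made up for the example.
body = json.dumps({
    "job_id": "uuid",
    "status": "completed",
    "result": {"type": "full", "chunks": [{"content": "example", "page": 1}]},
    "usage": {"num_pages": 10, "credits": 15},
}).encode("utf-8")

print(summarize_callback(body))
```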

Parse Presets

Save and reuse parsing configurations.

List Presets


GET /api/parse/presets

Query Parameters:

Parameter	Default	Description
`scope`	`all`	`user`, `organization`, or `all`
`organization_id`	-	Filter by organization (when scope includes organization)

Response: { "presets": [ { "id", "name", "description", "config", "scope", "organization_id", "created_at", "updated_at" } ] }.
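The query string for this endpoint can be built with `urllib.parse.urlencode`. In the sketch below, `presets_url` is a hypothetical helper and `org_123` is a made-up identifier.

```python
from urllib.parse import urlencode


def presets_url(base_url, scope="all", organization_id=None):
    """Build the List Presets URL with its optional query parameters."""
    if scope not in ("user", "organization", "all"):
        raise ValueError("scope must be 'user', 'organization', or 'all'")
    params = {"scope": scope}
    if organization_id is not None:
        params["organization_id"] = organization_id
    return f"{base_url}/api/parse/presets?{urlencode(params)}"


print(presets_url("https://your-domain.com", "organization", "org_123"))
```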

Create Preset


POST /api/parse/presets

Request body:

Field	Type	Required	Description
`name`	string	Yes	Preset name.
`description`	string	No	Optional description.
`config`	object	No	Parsing config (merged with defaults, validated).
`scope`	string	No	`user` or `organization`. Default `user`.
`organization_id`	string	No	Required when `scope` is `organization`.

Response: { "preset": { "id", "name", "description", "config", "scope", "organization_id", "created_at", "updated_at" } }.
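The field rules in the table above, in particular that `organization_id` is required when `scope` is `organization`, can be checked before the request is made. This is a client-side sketch only; `build_preset_body`, the preset name, and `org_123` are illustrative.

```python
def build_preset_body(name, description=None, config=None,
                      scope="user", organization_id=None):
    """Assemble a Create Preset request body, enforcing the documented field rules."""
    if not name:
        raise ValueError("name is required")
    if scope not in ("user", "organization"):
        raise ValueError("scope must be 'user' or 'organization'")
    if scope == "organization" and not organization_id:
        raise ValueError("organization_id is required when scope is 'organization'")
    body = {"name": name, "scope": scope}
    if description is not None:
        body["description"] = description
    if config is not None:
        body["config"] = config
    if organization_id is not None:
        body["organization_id"] = organization_id
    return body


print(build_preset_body(
    "Invoices (HTML tables)",
    config={"formatting": {"table_output_format": "html"}},
    scope="organization",
    organization_id="org_123",
))
```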

Get Preset


GET /api/parse/presets/{id}

Response: { "preset": { ... } }.

Update Preset


PATCH /api/parse/presets/{id}

Request body: name, description, and/or config (partial update). Config is merged with defaults and validated.

Response: { "preset": { ... } }.

Delete Preset


DELETE /api/parse/presets/{id}

Response: { "success": true }.

PDF to Text (Public)


POST /api/pdf-to-text

Simple PDF-to-text conversion. The endpoint is rate-limited and allows anonymous access, subject to the page limits below.

Request

Content-Type: multipart/form-data

Field	Type	Description
`file`	File	PDF file only

Response


{
  "text": "Extracted text content...",
  "pageCount": 10,
  "pagesExtracted": 2,
  "truncated": true,
  "fileName": "document.pdf"
}

Limits

Tier	Max Pages
Anonymous	2 pages
Authenticated	Full document
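Because anonymous requests are capped, callers may want to check `truncated` before treating the text as complete. A minimal sketch, assuming the response fields shown above; `describe_extraction` is a hypothetical helper.

```python
def describe_extraction(response):
    """Summarize how much of the PDF was actually extracted."""
    name = response["fileName"]
    total = response["pageCount"]
    extracted = response["pagesExtracted"]
    if response.get("truncated"):
        return f"{name}: extracted {extracted} of {total} pages (truncated)"
    return f"{name}: extracted all {total} pages"


# The example response above, as a dictionary.
sample_text_response = {
    "text": "Extracted text content...",
    "pageCount": 10,
    "pagesExtracted": 2,
    "truncated": True,
    "fileName": "document.pdf",
}
print(describe_extraction(sample_text_response))
```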

PDF to Markdown (Public)


POST /api/pdf-to-markdown

Convert PDF to structured Markdown (headings, lists, key-values). Same pipeline as PDF to text; output uses Markdown syntax. Ideal for docs, CMSs, and API pipelines.

Request

Content-Type: multipart/form-data

Field	Type	Description
`file`	File	PDF file only

Response


{
  "markdown": "# Title\n\n## Section\n\n- List item...",
  "pageCount": 10,
  "pagesExtracted": 2,
  "truncated": true,
  "fileName": "document.pdf"
}

Limits

Tier	Max Pages
Anonymous	2 pages
Authenticated	Full document

PDF to JSON (Public)


POST /api/pdf-to-json

Same parsing as PDF to text; response is structured JSON (pages, blocks, tables). Built for developers and automation.

Request

Content-Type: multipart/form-data

Field	Type	Description
`file`	File	PDF file only

Response


{
  "metadata": {
    "numPages": 10,
    "title": "Document Title",
    "author": "Author",
    "creationDate": "D:20240101120000",
    "fileName": "document.pdf",
    "pagesExtracted": 2,
    "truncated": true
  },
  "pages": [
    {
      "pageNumber": 1,
      "blocks": [
        {
          "type": "Title",
          "content": "Chapter One",
          "bbox": { "page": 1, "left": 0.1, "top": 0.05, "width": 0.8, "height": 0.04 },
          "confidence": "high",
          "metadata": {}
        },
        {
          "type": "Text",
          "content": "Paragraph text...",
          "bbox": { "page": 1, "left": 0.1, "top": 0.1, "width": 0.8, "height": 0.1 },
          "confidence": "high",
          "metadata": {}
        }
      ]
    }
  ],
  "tables": [
    {
      "type": "Table",
      "content": "| A | B |\n|---|---|\n| 1 | 2 |",
      "bbox": { "page": 1, "left": 0.1, "top": 0.2, "width": 0.5, "height": 0.15 },
      "confidence": "high",
      "metadata": {
        "rows": 2,
        "columns": 2,
        "formats": { "markdown": "...", "html": "...", "json": [] }
      }
    }
  ]
}
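The structured response can be walked page by page. The sketch below assumes the layout shown above (a `pages` list with `blocks`, plus a top-level `tables` list); the helper names are hypothetical and the sample is abbreviated from the example response.

```python
def blocks_by_page(doc):
    """Map each page number to its (type, content) block pairs, in order."""
    return {
        page["pageNumber"]: [(b["type"], b["content"]) for b in page.get("blocks", [])]
        for page in doc.get("pages", [])
    }


def table_summaries(doc):
    """Return (page, rows, columns) for each entry in the top-level tables list."""
    return [
        (t["bbox"]["page"], t["metadata"]["rows"], t["metadata"]["columns"])
        for t in doc.get("tables", [])
    ]


# Abbreviated copy of the example response.
sample_doc = {
    "pages": [
        {"pageNumber": 1, "blocks": [
            {"type": "Title", "content": "Chapter One"},
            {"type": "Text", "content": "Paragraph text..."},
        ]},
    ],
    "tables": [
        {"type": "Table",
         "bbox": {"page": 1, "left": 0.1, "top": 0.2, "width": 0.5, "height": 0.15},
         "metadata": {"rows": 2, "columns": 2}},
    ],
}

print(blocks_by_page(sample_doc))
print(table_summaries(sample_doc))
```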

Limits

Tier	Max Pages
Anonymous	2 pages
Authenticated	Full document

Credit Usage

Parsing costs credits based on page count:

Operation	Credits per Page
Standard parse	1.5
Agentic OCR	3.0
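The per-page rates make cost estimates a single multiplication. In the sketch below, the operation keys `standard` and `agentic_ocr` are labels chosen for illustration and are not API values.

```python
# Rates from the table above; the dictionary keys are illustrative labels.
CREDITS_PER_PAGE = {"standard": 1.5, "agentic_ocr": 3.0}


def estimate_credits(num_pages, operation="standard"):
    """Estimate the credit cost of parsing num_pages pages."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return num_pages * CREDITS_PER_PAGE[operation]


print(estimate_credits(5))                  # 5 pages at 1.5 credits per page
print(estimate_credits(10, "agentic_ocr"))  # 10 pages at 3.0 credits per page
```

The standard rate matches the example responses in this reference: 5 pages cost 7.5 credits and 10 pages cost 15.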

Error Handling

Status	Error	Description
400	`VALIDATION_ERROR`	Invalid input or configuration
404	`NOT_FOUND`	Document not found (docld:// reference)
413	`FILE_TOO_LARGE`	File exceeds 100MB limit
422	`UNSUPPORTED_FORMAT`	File format not supported
500	`PROCESSING_ERROR`	Parse failed
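A client can use the status codes above to decide how to react to a failure. The sketch below maps status to error code and flags which failures might be worth retrying; treating only 500 as retryable is an assumption, and the shape of the error response body is not specified here, so the lookup relies on the HTTP status alone.

```python
# Status-to-code mapping copied from the table above.
ERROR_CODES = {
    400: "VALIDATION_ERROR",
    404: "NOT_FOUND",
    413: "FILE_TOO_LARGE",
    422: "UNSUPPORTED_FORMAT",
    500: "PROCESSING_ERROR",
}
RETRYABLE_STATUSES = {500}  # assumption: only server-side failures are retried


def classify_error(status):
    """Return (error_code, retryable) for an HTTP status from the Parse API."""
    code = ERROR_CODES.get(status, "UNKNOWN")
    return code, status in RETRYABLE_STATUSES


for status in (413, 500):
    print(status, classify_error(status))
```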

See also: Upload API, Documents API, Document Parsing, Embed API, Webhooks.
