Split API
Split documents into sections based on page ranges or AI-detected boundaries. See the Split feature for concepts and dashboard usage. For async completion callbacks, use webhook_url; see Webhooks for payload details.
Run Split
POST /api/split/run
Split a document into sections.
Request
{
"document_id": "doc-uuid",
"input": "docld://doc-uuid",
"config_id": "config-uuid",
"config": {
"method": "ai",
"instructions": "Split by chapter headings",
"sections": [],
"settings": { "minSectionPages": 2 }
},
"sections": [
{ "name": "Introduction", "page_start": 1, "page_end": 5 },
{ "name": "Chapter 1", "page_start": 6, "page_end": 20 }
],
"skip_analysis": false,
"webhook_url": "https://your-app.com/webhooks/split"
}
| Field | Type | Description |
|---|---|---|
| document_id | string | Document to split (or use input). |
| input | string | URL, docld:// reference, or jobid:// reference (reuses parsed content from a parse job). |
| config_id | string | Optional. Saved split config ID; when provided, that config is used (merged with inline config). |
| config | object | Inline split configuration: instructions, sections, settings (e.g. minSectionPages, splitMode). |
| sections | array | Manual section definitions (name, page_start, page_end). |
| skip_analysis | boolean | Skip the AI analysis step. |
| webhook_url | string | Optional. URL to POST to when the job completes or fails (async only). |
Split Methods
| Method | Description |
|---|---|
| manual | Use the provided sections array |
| ai | AI detects section boundaries |
| page | Split by page count |
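As a minimal sketch of how the fields and methods above fit together, the helper below assembles a request body for POST /api/split/run. The function name build_run_payload and its client-side checks are assumptions made for illustration; they are not documented server behavior, and sending the request (base URL, authentication) is left out because this page does not specify it.

```python
from typing import Any, Dict, List, Optional

# Method names taken from the Split Methods table.
VALID_METHODS = {"manual", "ai", "page"}


def build_run_payload(
    *,
    document_id: Optional[str] = None,
    input_ref: Optional[str] = None,
    method: str = "ai",
    instructions: Optional[str] = None,
    sections: Optional[List[Dict[str, Any]]] = None,
    webhook_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /api/split/run (hypothetical helper)."""
    # The field table says to pass document_id "or use input".
    if not document_id and not input_ref:
        raise ValueError("provide document_id or input")
    if method not in VALID_METHODS:
        raise ValueError(f"unknown split method: {method!r}")
    # Assumption: a manual split is only meaningful with explicit sections.
    if method == "manual" and not sections:
        raise ValueError("manual method needs at least one section")

    payload: Dict[str, Any] = {"config": {"method": method}}
    if document_id:
        payload["document_id"] = document_id
    if input_ref:
        payload["input"] = input_ref
    if instructions:
        payload["config"]["instructions"] = instructions
    if sections:
        payload["sections"] = sections
    if webhook_url:
        payload["webhook_url"] = webhook_url
    return payload
```

The returned dictionary can be passed to json.dumps and posted with any HTTP client.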
Response
{
"success": true,
"job_id": "split-job-uuid",
"sections": [
{
"id": "section-uuid",
"name": "Introduction",
"section_type": "detected",
"page_start": 1,
"page_end": 5,
"confidence": 0.95,
"document_id": "new-doc-uuid"
},
{
"id": "section-uuid",
"name": "Chapter 1: Getting Started",
"section_type": "detected",
"page_start": 6,
"page_end": 20,
"confidence": 0.92,
"document_id": "new-doc-uuid"
}
]
}
Analyze Document
POST /api/split/analyze
Analyze a document to detect potential split points without actually splitting.
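One possible use of the analysis output, sketched under assumptions: review the suggested_sections array from the response documented in this section, keep the suggestions above a confidence threshold, and reuse them as a manual sections array for POST /api/split/run. The helper name suggestions_to_sections and the 0.9 default threshold are illustrative choices, not part of the API.

```python
from typing import Any, Dict, List


def suggestions_to_sections(
    analysis: Dict[str, Any], min_confidence: float = 0.9
) -> List[Dict[str, Any]]:
    """Convert an analyze response into manual section definitions (hypothetical helper)."""
    sections = []
    for suggestion in analysis.get("suggested_sections", []):
        # Drop low-confidence suggestions; a missing confidence counts as 0.
        if suggestion.get("confidence", 0.0) < min_confidence:
            continue
        # Keep only the fields that manual sections use: name, page_start, page_end.
        sections.append(
            {
                "name": suggestion["name"],
                "page_start": suggestion["page_start"],
                "page_end": suggestion["page_end"],
            }
        )
    return sections
```

The threshold is a trade-off: a higher value yields fewer, more reliable sections, and anything filtered out can still be added by hand.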
Request
{
"document_id": "doc-uuid",
"config": {
"method": "ai",
"instructions": "Look for chapter headings and major sections"
}
}
Response
{
"suggested_sections": [
{
"name": "Table of Contents",
"page_start": 1,
"page_end": 2,
"confidence": 0.98,
"reason": "Detected TOC structure"
},
{
"name": "Introduction",
"page_start": 3,
"page_end": 10,
"confidence": 0.94,
"reason": "Section header: Introduction"
}
],
"total_pages": 50,
"recommended_splits": 5
}
Split Configurations
List Configurations
GET /api/split/configs
Get saved split configurations.
Create Configuration
POST /api/split/configs
Save a reusable split configuration.
{
"name": "Legal Document Splitter",
"description": "Split legal documents by sections",
"instructions": "Split by Article and Section headings",
"sections": [],
"settings": {
"method": "ai",
"min_section_pages": 2,
"preserve_headers": true
}
}
Get Configuration
GET /api/split/configs/{id}
Update Configuration
PATCH /api/split/configs/{id}
Delete Configuration
DELETE /api/split/configs/{id}
Batch Split
POST /api/split/batch
Start a batch split for multiple documents. Jobs are processed asynchronously via the queue. Use this for “split this folder” or upstream ingestion.
Request
{
"document_ids": ["doc-uuid-1", "doc-uuid-2"],
"input": ["https://example.com/doc.pdf", "docld://doc-uuid-3"],
"config_id": "config-uuid",
"webhook_url": "https://your-app.com/webhooks/split"
}
| Field | Type | Description |
|---|---|---|
| document_ids | string[] | Document IDs to split (must be accessible to the user). |
| input | array | Optional. URLs or docld:// references; each is resolved to a document (created if URL). |
| config_id | string | Optional. Saved split config ID. |
| webhook_url | string | Optional. Called per job on completion or failure. |
At least one of document_ids or input is required. Combined size is capped (e.g. 50 documents per batch).
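Because the combined size of document_ids and input is capped, a client with a large folder may need to send several batch requests. The sketch below chunks the work into request bodies; build_batch_payloads is a hypothetical helper, and the default of 50 is only the example figure quoted above, so treat the real cap as something to confirm.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_batch_payloads(
    document_ids: Sequence[str],
    inputs: Sequence[str],
    *,
    config_id: Optional[str] = None,
    webhook_url: Optional[str] = None,
    max_per_batch: int = 50,  # assumed cap, taken from the "e.g. 50" above
) -> List[Dict[str, Any]]:
    """Split IDs and input references into bodies for POST /api/split/batch."""
    if not document_ids and not inputs:
        raise ValueError("at least one of document_ids or input is required")
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be positive")

    # Tag each item with its destination field so the cap applies to the combined size.
    tagged = [("document_ids", d) for d in document_ids]
    tagged += [("input", i) for i in inputs]

    payloads: List[Dict[str, Any]] = []
    for start in range(0, len(tagged), max_per_batch):
        payload: Dict[str, Any] = {}
        for field, value in tagged[start:start + max_per_batch]:
            payload.setdefault(field, []).append(value)
        if config_id:
            payload["config_id"] = config_id
        if webhook_url:
            payload["webhook_url"] = webhook_url
        payloads.append(payload)
    return payloads
```

Each returned body would be posted separately, and each request would yield its own batch_id.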
Response
{
"batch_id": "batch-uuid",
"total_count": 3,
"message": "Batch split started for 3 document(s)"
}
Get Batch Status
GET /api/split/batch/{id}
Returns the batch record and the list of split jobs (id, document_id, status, error, created_at).
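When polling batch status, a client usually wants a compact summary of the job list. The sketch below works on the list of split jobs only, since the exact envelope of the response is not shown here. summarize_jobs is a hypothetical helper, and it assumes completed and failed are the only terminal statuses, as suggested by the webhook events.

```python
from collections import Counter
from typing import Any, Dict, List

# Assumption: these are the only terminal job statuses.
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_jobs(jobs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize split jobs that carry id, document_id, status, error, created_at."""
    counts = Counter(job.get("status", "unknown") for job in jobs)
    # Map each failed document to its error message for easy reporting.
    failed = {
        job["document_id"]: job.get("error")
        for job in jobs
        if job.get("status") == "failed"
    }
    # The batch is done only when there are jobs and none are still in progress.
    done = bool(jobs) and all(job.get("status") in TERMINAL_STATUSES for job in jobs)
    return {"counts": dict(counts), "failed": failed, "done": done}
```

A polling loop could call this after each GET and stop once done is true, retrying or reporting the documents listed under failed.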
Split Webhooks
When you provide webhook_url on POST /api/split/run (async) or POST /api/split/batch, DocLD POSTs to that URL when each job completes or fails.
Payload:
| Field | Type | Description |
|---|---|---|
| event | string | split.completed or split.failed |
| job_id | string | Split job ID |
| document_id | string | Document that was split |
| status | string | completed or failed |
| sections_summary | array | { section_id, name, page_start, page_end, type }[] |
| results_url | string | Dashboard URL to view results (e.g. /split?job=...) |
| error | string | Present when status is failed |
| timestamp | string | ISO 8601 |
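A receiver for the payload fields listed above might look like the following sketch. It only parses and classifies the request body; the web framework, routing, and any signature verification are out of scope here. parse_split_webhook and the shape of its return value are assumptions for illustration.

```python
import json
from typing import Any, Dict, Union

# Fields this sketch treats as mandatory; the others are read defensively.
REQUIRED_FIELDS = ("event", "job_id", "document_id", "status")


def parse_split_webhook(body: Union[str, bytes]) -> Dict[str, Any]:
    """Parse a split webhook body and reduce it to what a handler needs."""
    payload = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"webhook payload missing fields: {missing}")

    event = payload["event"]
    if event == "split.completed":
        sections = payload.get("sections_summary", [])
        return {
            "ok": True,
            "job_id": payload["job_id"],
            "section_names": [section["name"] for section in sections],
            "results_url": payload.get("results_url"),
        }
    if event == "split.failed":
        # error is present when status is failed.
        return {"ok": False, "job_id": payload["job_id"], "error": payload.get("error", "")}
    raise ValueError(f"unexpected event: {event!r}")
```

Rejecting unknown events explicitly makes it easier to notice if new event types are added later.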
Example:
{
"event": "split.completed",
"job_id": "split-job-uuid",
"document_id": "doc-uuid",
"status": "completed",
"sections_summary": [
{ "section_id": "sec-1", "name": "Introduction", "page_start": 1, "page_end": 5, "type": "section" }
],
"results_url": "https://app.docld.com/split?job=split-job-uuid",
"timestamp": "2024-01-15T10:30:00Z"
}
Get Split Results
GET /api/split/results/{id}
Get the results of a split job.
Response
{
"job_id": "split-job-uuid",
"status": "completed",
"source_document_id": "original-doc-uuid",
"sections": [
{
"id": "section-uuid",
"name": "Chapter 1",
"document_id": "section-doc-uuid",
"page_start": 1,
"page_end": 10
}
],
"created_at": "2024-01-15T10:00:00Z",
"completed_at": "2024-01-15T10:00:30Z"
}
Section Types
| Type | Description |
|---|---|
| detected | AI-detected section |
| custom | User-defined section |
| page | Page-based split |
| document | Entire document |
Update Section
PATCH /api/split/sections/{id}
Update a document section’s label or name. Verifies access via the parent document.
Request body:
| Field | Type | Description |
|---|---|---|
| label | string | Optional. Display label (max 64 chars). |
| section_name | string | Optional. Section name (non-empty). |
{
"label": "Executive Summary",
"section_name": "Introduction"
}
Response: { "section": { "id", "document_id", "section_name", "section_type", "page_start", "page_end", "confidence", "label", "metadata" } }.
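The constraints on the Update Section body (label up to 64 characters, section_name non-empty) can be checked before the request is sent. The sketch below is one way to do that. build_section_update is a hypothetical helper, and two of its rules are assumptions rather than documented behavior: whitespace-only names are treated as empty, and a body with neither field is rejected.

```python
from typing import Dict, Optional

MAX_LABEL_CHARS = 64  # from the request body table


def build_section_update(
    label: Optional[str] = None, section_name: Optional[str] = None
) -> Dict[str, str]:
    """Build a body for PATCH /api/split/sections/{id} with client-side checks."""
    body: Dict[str, str] = {}
    if label is not None:
        if len(label) > MAX_LABEL_CHARS:
            raise ValueError(f"label exceeds {MAX_LABEL_CHARS} characters")
        body["label"] = label
    if section_name is not None:
        # Assumption: a whitespace-only name counts as empty.
        if not section_name.strip():
            raise ValueError("section_name must be non-empty")
        body["section_name"] = section_name
    # Assumption: an update with no fields is a caller mistake.
    if not body:
        raise ValueError("provide label, section_name, or both")
    return body
```

Failing early on the client avoids a round trip for input the server would presumably reject anyway.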
Get Document Sections
GET /api/documents/{id}/sections
Get all sections for a document that has been split. The {id} is the parent document ID.
Response
{
"documentId": "original-doc-uuid",
"documentName": "Report.pdf",
"sections": [
{
"id": "section-uuid",
"sectionName": "Chapter 1",
"sectionType": "detected",
"pageStart": 1,
"pageEnd": 10,
"pageCount": 10,
"confidence": 0.95,
"label": "Chapter 1",
"metadata": {},
"childDocumentId": "section-doc-uuid",
"childDocumentName": "Chapter 1.pdf",
"childDocumentStatus": "completed"
}
],
"totalSections": 5,
"hasSections": true
}
See also: Split feature, Webhooks.