Documents API

Manage uploaded documents, retrieve details, and perform bulk operations. Documents are created via Upload and processed with Parse or Extract. For document comparison, see Comparison.

List Documents


GET /api/documents

Returns a paginated list of documents.

Query Parameters

Parameter	Type	Default	Description
`search`	string	-	Search in document names and content
`status`	string[]	-	Filter by status: `pending`, `processing`, `completed`, `failed`
`file_type`	string[]	-	Filter by type: `pdf`, `image`, `spreadsheet`, `presentation`, `text`
`file_format`	string[]	-	Filter by format: `pdf`, `png`, `jpg`, `xlsx`, `docx`, etc.
`knowledge_base_id`	string	-	Filter by knowledge base membership
`tags`	string[]	-	Filter by tags in metadata
`date_from`	string	-	Start of the creation date range (ISO 8601)
`date_to`	string	-	End of the creation date range (ISO 8601)
`sort_by`	string	`created_at`	Sort field: `name`, `created_at`, `file_size`, `status`, `updated_at`
`sort_order`	string	`desc`	Sort order: `asc`, `desc`
`limit`	number	50	Results per page (max 100)
`offset`	number	0	Pagination offset

Response


{
  "documents": [
    {
      "id": "uuid",
      "name": "invoice.pdf",
      "file_type": "pdf",
      "file_format": "pdf",
      "file_size": 125000,
      "status": "completed",
      "metadata": {
        "numPages": 3,
        "tags": ["invoice", "2024"]
      },
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T10:31:00Z"
    }
  ],
  "total": 150,
  "limit": 50,
  "offset": 0,
  "has_more": true
}

Example


curl -X GET "https://your-domain.com/api/documents?status=completed&limit=10" \
  -H "Authorization: Bearer YOUR_API_KEY"
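The `limit`, `offset`, and `has_more` fields can be combined to walk the full result set. The following is a minimal sketch, not an official client. It assumes array filters such as `status` are sent as repeated query keys, which this reference does not confirm. `buildListUrl`, `listAllDocuments`, and the `fetchPage` callback are hypothetical names introduced for illustration.

```javascript
// Build a List Documents URL from a filter object.
// Assumption: array filters are encoded as repeated keys (status=a&status=b).
function buildListUrl(baseUrl, params = {}) {
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined || value === null) continue;
    const values = Array.isArray(value) ? value : [value];
    for (const item of values) query.append(key, String(item));
  }
  const qs = query.toString();
  return qs ? `${baseUrl}/api/documents?${qs}` : `${baseUrl}/api/documents`;
}

// Collect every document by advancing `offset` until `has_more` is false.
// `fetchPage` is a hypothetical callback: it takes a URL and resolves to the parsed JSON body.
async function listAllDocuments(baseUrl, filters, fetchPage, limit = 100) {
  const all = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(buildListUrl(baseUrl, { ...filters, limit, offset }));
    all.push(...page.documents);
    // Stop on the last page, or on an empty page to avoid looping forever.
    if (!page.has_more || page.documents.length === 0) break;
    offset += page.documents.length;
  }
  return all;
}
```

In practice `fetchPage` could be `(url) => fetch(url, { headers: { Authorization: 'Bearer YOUR_API_KEY' } }).then((r) => r.json())`. Keeping the transport as a callback makes the paging logic easy to test without a server.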

Get Document


GET /api/documents/{id}

Get details for a specific document, including a signed URL for file access.

Path Parameters

Parameter	Type	Description
`id`	string	Document ID (UUID)

Response


{
  "id": "uuid",
  "name": "invoice.pdf",
  "file_type": "pdf",
  "file_format": "pdf",
  "file_size": 125000,
  "file_url": "https://signed-url...",
  "status": "completed",
  "metadata": {
    "numPages": 3
  },
  "parsing_config": {
    "chunking": { "strategy": "semantic" }
  },
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:31:00Z",
  "created_by": "user-uuid"
}

Example


curl -X GET "https://your-domain.com/api/documents/abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Update Document


PATCH /api/documents/{id}

Update document status or metadata.

Request Body


{
  "status": "completed",
  "metadata": {
    "tags": ["reviewed", "approved"]
  }
}

Response

Returns the updated document object.
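A PATCH request for this endpoint might be assembled as in the sketch below. `buildUpdateRequest` is a hypothetical helper, not part of the API. The `Content-Type: application/json` header is an assumption, and this reference does not state whether `metadata` is merged with or replaces the stored metadata.

```javascript
// Assemble the URL and fetch() options for Update Document.
// Only `status` and `metadata` are forwarded, matching the request body shown above.
function buildUpdateRequest(baseUrl, id, changes, apiKey) {
  const body = {};
  if (changes.status !== undefined) body.status = changes.status;
  if (changes.metadata !== undefined) body.metadata = changes.metadata;
  if (Object.keys(body).length === 0) {
    throw new Error('Nothing to update: provide status or metadata');
  }
  return {
    url: `${baseUrl}/api/documents/${encodeURIComponent(id)}`,
    init: {
      method: 'PATCH',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json', // assumed; not stated in this reference
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)`.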

Delete Document


DELETE /api/documents/{id}

Permanently delete a document and its associated data (chunks, vectors, extractions).

Headers

Header	Required	Description
`X-Access-Reason`	HIPAA only	Reason for deletion (required for HIPAA compliance)

Response


{
  "success": true
}
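Because `X-Access-Reason` is only required in HIPAA deployments, a client may want to attach it conditionally. A minimal sketch, with `buildDeleteRequest` as a hypothetical helper name:

```javascript
// Assemble the URL and fetch() options for Delete Document.
// `accessReason` is optional; when given it is sent as the X-Access-Reason header.
function buildDeleteRequest(baseUrl, id, apiKey, accessReason) {
  const headers = { Authorization: `Bearer ${apiKey}` };
  if (accessReason) headers['X-Access-Reason'] = accessReason;
  return {
    url: `${baseUrl}/api/documents/${encodeURIComponent(id)}`,
    init: { method: 'DELETE', headers },
  };
}
```

Deletion is permanent, so callers will usually want to confirm the document ID before sending the request.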

Bulk Operations


POST /api/documents/bulk

Perform bulk operations on multiple documents.

Request Body


{
  "action": "delete",
  "document_ids": ["uuid1", "uuid2", "uuid3"],
  "options": {}
}

Actions

Action	Description	Required Options
`delete`	Delete multiple documents	-
`add_to_kb`	Add to knowledge base	`knowledge_base_id`
`reprocess`	Reprocess documents	-
`add_tags`	Add tags to metadata	`tags`
`remove_tags`	Remove tags from metadata	`tags`
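The required options in the table above can be checked on the client before the request is sent. The sketch below mirrors that table. `REQUIRED_BULK_OPTIONS` and `buildBulkRequest` are hypothetical names, and the server may apply additional validation not shown here.

```javascript
// Required option names per action, copied from the Actions table.
const REQUIRED_BULK_OPTIONS = {
  delete: [],
  add_to_kb: ['knowledge_base_id'],
  reprocess: [],
  add_tags: ['tags'],
  remove_tags: ['tags'],
};

// Build a bulk request body, rejecting unknown actions and missing options.
function buildBulkRequest(action, documentIds, options = {}) {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_BULK_OPTIONS, action)) {
    throw new Error(`Unknown bulk action: ${action}`);
  }
  if (!Array.isArray(documentIds) || documentIds.length === 0) {
    throw new Error('document_ids must be a non-empty array');
  }
  const missing = REQUIRED_BULK_OPTIONS[action].filter((name) => options[name] === undefined);
  if (missing.length > 0) {
    throw new Error(`Action "${action}" requires options: ${missing.join(', ')}`);
  }
  return { action, document_ids: documentIds, options };
}
```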

Examples

Add to Knowledge Base:


{
  "action": "add_to_kb",
  "document_ids": ["uuid1", "uuid2"],
  "options": {
    "knowledge_base_id": "kb-uuid"
  }
}

Add Tags:


{
  "action": "add_tags",
  "document_ids": ["uuid1", "uuid2"],
  "options": {
    "tags": ["reviewed", "q1-2024"]
  }
}

Response


{
  "action": "add_to_kb",
  "results": [
    { "document_id": "uuid1", "success": true },
    { "document_id": "uuid2", "success": true },
    { "document_id": "uuid3", "success": false, "error": "Document not found" }
  ],
  "success_count": 2,
  "failure_count": 1
}
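A bulk call can partially succeed, so clients typically need to inspect `results` rather than rely on the HTTP status alone. The following sketch collects the failed IDs and their error messages. `summarizeBulkResponse` is a hypothetical helper name.

```javascript
// Reduce a bulk response to the information needed for retry or reporting.
function summarizeBulkResponse(response) {
  const failed = response.results.filter((result) => !result.success);
  return {
    action: response.action,
    allSucceeded: response.failure_count === 0,
    failedIds: failed.map((result) => result.document_id),
    errorsById: Object.fromEntries(failed.map((result) => [result.document_id, result.error])),
  };
}
```

The `failedIds` array can be fed back into a new bulk request if the failures look transient.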

Get Document Status (SSE)


GET /api/documents/{id}/status

Stream real-time processing status updates using Server-Sent Events.

Response (SSE Stream)


event: status
data: {"status": "processing", "progress": 50, "stage": "parsing"}

event: status
data: {"status": "processing", "progress": 75, "stage": "chunking"}

event: status
data: {"status": "completed", "progress": 100}

Example (JavaScript)


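// The browser EventSource API cannot send custom headers, so this example
// relies on same-origin session credentials (an assumption). Use a
// fetch-based SSE client if a Bearer token must be sent.
const eventSource = new EventSource('/api/documents/abc123/status');

// The stream sends named "status" events, which onmessage does not receive.
eventSource.addEventListener('status', (event) => {
  const status = JSON.parse(event.data);
  console.log('Status:', status.status, 'Progress:', status.progress);
  // Close on a terminal status so the browser does not reconnect.
  if (status.status === 'completed' || status.status === 'failed') eventSource.close();
});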
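Outside the browser, or when the stream is read with `fetch`, the raw event text has to be parsed by hand. The sketch below parses text in the format shown under Response (SSE Stream). It handles only the `event` and `data` fields and ignores `id` and `retry`. `parseSseEvents` is a hypothetical helper name.

```javascript
// Parse Server-Sent Events text into { event, data } objects.
// An event is emitted at each blank line; a trailing event without one is dropped.
function parseSseEvents(text) {
  const events = [];
  let eventType = 'message'; // default type when no `event:` field is present
  let dataLines = [];
  for (const line of text.split(/\r\n|\n|\r/)) {
    if (line === '') {
      if (dataLines.length > 0) {
        events.push({ event: eventType, data: dataLines.join('\n') });
      }
      eventType = 'message';
      dataLines = [];
      continue;
    }
    if (line.startsWith(':')) continue; // comment line
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') eventType = value;
    else if (field === 'data') dataLines.push(value);
  }
  return events;
}
```

Each `data` string can then be passed to `JSON.parse` to obtain the `status`, `progress`, and `stage` fields.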

Reprocess Document


POST /api/documents/{id}/reprocess

Re-run the processing pipeline on an existing document. Clears existing chunks and vectors.

Response


{
  "success": true,
  "message": "Document queued for reprocessing",
  "document_id": "uuid"
}

Get Parsed Content


GET /api/documents/{id}/parse

Get the parsed text content and metadata for a document.

Response


{
  "text": "Full extracted text content...",
  "pages": [
    { "page": 1, "text": "Page 1 content..." },
    { "page": 2, "text": "Page 2 content..." }
  ],
  "metadata": {
    "numPages": 2,
    "title": "Document Title",
    "author": "Author Name"
  },
  "tables": [
    { "page": 1, "content": "| Header | Value |\n|--------|-------|" }
  ],
  "figures": [
    { "page": 1, "caption": "Figure 1", "description": "Chart showing..." }
  ]
}
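Since `pages`, `tables`, and `figures` all carry a `page` number, they can be joined into a per-page view. A minimal sketch based on the response shape above follows. `indexParsedContentByPage` is a hypothetical helper, and it assumes `tables` and `figures` may be absent.

```javascript
// Group text, tables, and figures by page number.
// Returns a Map keyed by page number; items that reference an unknown page are skipped.
function indexParsedContentByPage(parsed) {
  const byPage = new Map();
  for (const { page, text } of parsed.pages) {
    byPage.set(page, { text, tables: [], figures: [] });
  }
  for (const table of parsed.tables ?? []) {
    byPage.get(table.page)?.tables.push(table.content);
  }
  for (const figure of parsed.figures ?? []) {
    byPage.get(figure.page)?.figures.push(figure);
  }
  return byPage;
}
```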

Get Document Preview


GET /api/documents/{id}/preview

Get document preview data including file URL, extractions, and chunks summary.

Response


{
  "document": {
    "id": "uuid",
    "name": "invoice.pdf",
    "status": "completed"
  },
  "file_url": "https://signed-url...",
  "extractions": [
    {
      "id": "extraction-uuid",
      "schema_name": "Invoice",
      "confidence": 0.95,
      "created_at": "2024-01-15T10:31:00Z"
    }
  ],
  "edits": [],
  "chunks_summary": {
    "total": 15,
    "pages": 3
  }
}

Get Audit Trail


GET /api/documents/{id}/audit

Get the audit trail for a document (GDPR/HIPAA compliance).

Response


{
  "entries": [
    {
      "id": "uuid",
      "action": "view",
      "user_id": "user-uuid",
      "timestamp": "2024-01-15T10:30:00Z",
      "ip_address": "192.168.1.1",
      "user_agent": "Mozilla/5.0..."
    },
    {
      "id": "uuid",
      "action": "extract",
      "user_id": "user-uuid",
      "timestamp": "2024-01-15T10:31:00Z",
      "metadata": {
        "schema_id": "schema-uuid"
      }
    }
  ]
}
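For compliance reporting, audit entries are often summarized by action or narrowed to a time window. The helpers below are a sketch over the `entries` array shown above. `countAuditActions` and `auditEntriesBetween` are hypothetical names, and the time range is treated as inclusive.

```javascript
// Count audit entries per action, e.g. { view: 3, extract: 1 }.
function countAuditActions(entries) {
  const counts = {};
  for (const entry of entries) {
    counts[entry.action] = (counts[entry.action] ?? 0) + 1;
  }
  return counts;
}

// Keep entries whose timestamp falls within [from, to], both ISO 8601 strings.
function auditEntriesBetween(entries, from, to) {
  const start = Date.parse(from);
  const end = Date.parse(to);
  return entries.filter((entry) => {
    const time = Date.parse(entry.timestamp);
    return time >= start && time <= end;
  });
}
```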

Stream Document File


GET /api/documents/{id}/file

Stream the document file directly. Same-origin requests only.

Response

Returns the file with appropriate Content-Type header.

See also: Upload, Parse, Extract, Comparison.
