Documents API
Manage uploaded documents, retrieve details, and perform bulk operations. Documents are created via Upload and processed with Parse or Extract. For document comparison, see Comparison.
List Documents
GET /api/documents
Returns a paginated list of documents.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| search | string | - | Search in document names and content |
| status | string[] | - | Filter by status: pending, processing, completed, failed |
| file_type | string[] | - | Filter by type: pdf, image, spreadsheet, presentation, text |
| file_format | string[] | - | Filter by format: pdf, png, jpg, xlsx, docx, etc. |
| knowledge_base_id | string | - | Filter by knowledge base membership |
| tags | string[] | - | Filter by tags in metadata |
| date_from | string | - | Start of creation date range (ISO 8601) |
| date_to | string | - | End of creation date range (ISO 8601) |
| sort_by | string | created_at | Sort field: name, created_at, file_size, status, updated_at |
| sort_order | string | desc | Sort order: asc, desc |
| limit | number | 50 | Results per page (max 100) |
| offset | number | 0 | Pagination offset |
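As a rough illustration of how these parameters combine, the Python sketch below builds a query string and pages through results using `limit`, `offset`, and the `has_more` field of the response. It assumes array parameters such as `status` are sent as repeated keys (`status=a&status=b`), which this reference does not specify. `fetch_page` is a hypothetical callable standing in for whatever HTTP client performs the request.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

MAX_LIMIT = 100  # documented maximum for `limit`


def build_list_query(limit: int = 50, offset: int = 0, **filters: Any) -> str:
    """Encode list filters as a query string.

    Assumption: list values (status, file_type, tags, ...) are sent as
    repeated keys; the API may expect a different encoding.
    """
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {key: value for key, value in filters.items() if value is not None}
    params.update(limit=limit, offset=offset)
    return urlencode(params, doseq=True)


def iter_documents(
    fetch_page: Callable[[str], Dict[str, Any]], limit: int = 50, **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Yield documents page by page until `has_more` is false.

    `fetch_page` is a hypothetical helper: it receives a query string and
    returns the decoded JSON body of GET /api/documents.
    """
    offset = 0
    while True:
        page = fetch_page(build_list_query(limit=limit, offset=offset, **filters))
        yield from page["documents"]
        if not page.get("has_more"):
            break
        offset += limit
```

Passing the fetch function in as an argument keeps the paging logic independent of any particular HTTP library.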
Response
{
"documents": [
{
"id": "uuid",
"name": "invoice.pdf",
"file_type": "pdf",
"file_format": "pdf",
"file_size": 125000,
"status": "completed",
"metadata": {
"numPages": 3,
"tags": ["invoice", "2024"]
},
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T10:31:00Z"
}
],
"total": 150,
"limit": 50,
"offset": 0,
"has_more": true
}
Example
curl -X GET "https://your-domain.com/api/documents?status=completed&limit=10" \
-H "Authorization: Bearer YOUR_API_KEY"
Get Document
GET /api/documents/{id}
Get details for a specific document, including a signed URL for file access.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| id | string | Document ID (UUID) |
Response
{
"id": "uuid",
"name": "invoice.pdf",
"file_type": "pdf",
"file_format": "pdf",
"file_size": 125000,
"file_url": "https://signed-url...",
"status": "completed",
"metadata": {
"numPages": 3
},
"parsing_config": {
"chunking": { "strategy": "semantic" }
},
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T10:31:00Z",
"created_by": "user-uuid"
}
Example
curl -X GET "https://your-domain.com/api/documents/abc123" \
-H "Authorization: Bearer YOUR_API_KEY"
Update Document
PATCH /api/documents/{id}
Update document status or metadata.
Request Body
{
"status": "completed",
"metadata": {
"tags": ["reviewed", "approved"]
}
}
Response
Returns the updated document object.
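A minimal Python sketch of assembling this request with the standard library follows. It assumes the same Bearer authentication shown in the curl examples and a JSON `Content-Type`. The function name and arguments are illustrative, and whether `metadata` is merged into or replaces the stored metadata is not stated in this reference.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request


def build_update_request(
    base_url: str,
    document_id: str,
    api_key: str,
    status: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Request:
    """Build (but do not send) a PATCH /api/documents/{id} request."""
    body: Dict[str, Any] = {}
    if status is not None:
        body["status"] = status
    if metadata is not None:
        body["metadata"] = metadata
    if not body:
        raise ValueError("provide status, metadata, or both")
    return Request(
        f"{base_url.rstrip('/')}/api/documents/{document_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; not stated in this reference
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it.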
Delete Document
DELETE /api/documents/{id}
Permanently delete a document and its associated data (chunks, vectors, extractions).
Headers
| Header | Required | Description |
|---|---|---|
| X-Access-Reason | HIPAA only | Reason for deletion (required for HIPAA compliance) |
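The conditional header can be handled on the client side roughly as follows. This is a sketch under assumptions: `hipaa_mode` is a hypothetical client-side flag, since this reference does not say how a deployment signals that HIPAA rules apply, and Bearer authentication is taken from the curl examples.

```python
from typing import Optional
from urllib.request import Request


def build_delete_request(
    base_url: str,
    document_id: str,
    api_key: str,
    access_reason: Optional[str] = None,
    hipaa_mode: bool = False,
) -> Request:
    """Build (but do not send) a DELETE /api/documents/{id} request.

    `hipaa_mode` is a hypothetical flag used only to fail early on the
    client when the X-Access-Reason header would be required.
    """
    if hipaa_mode and not access_reason:
        raise ValueError("X-Access-Reason is required under HIPAA")
    headers = {"Authorization": f"Bearer {api_key}"}
    if access_reason:
        headers["X-Access-Reason"] = access_reason
    return Request(
        f"{base_url.rstrip('/')}/api/documents/{document_id}",
        method="DELETE",
        headers=headers,
    )
```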
Response
{
"success": true
}
Bulk Operations
POST /api/documents/bulk
Perform bulk operations on multiple documents.
Request Body
{
"action": "delete",
"document_ids": ["uuid1", "uuid2", "uuid3"],
"options": {}
}
Actions
| Action | Description | Required Options |
|---|---|---|
| delete | Delete multiple documents | - |
| add_to_kb | Add to knowledge base | knowledge_base_id |
| reprocess | Reprocess documents | - |
| add_tags | Add tags to metadata | tags |
| remove_tags | Remove tags from metadata | tags |
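The required options in the table lend themselves to a local check before the request is sent. The Python sketch below is one possible way to do that, and also shows how the per-document `results` array in the bulk response might be reduced to the failures. Function names are illustrative and not part of the API.

```python
from typing import Any, Dict, List

# Required option keys per action, taken from the table above.
REQUIRED_OPTIONS: Dict[str, List[str]] = {
    "delete": [],
    "add_to_kb": ["knowledge_base_id"],
    "reprocess": [],
    "add_tags": ["tags"],
    "remove_tags": ["tags"],
}


def build_bulk_payload(
    action: str, document_ids: List[str], **options: Any
) -> Dict[str, Any]:
    """Return the JSON body for POST /api/documents/bulk, checked locally."""
    if action not in REQUIRED_OPTIONS:
        raise ValueError(f"unknown action: {action}")
    missing = [key for key in REQUIRED_OPTIONS[action] if key not in options]
    if missing:
        raise ValueError(f"{action} requires options: {', '.join(missing)}")
    return {"action": action, "document_ids": list(document_ids), "options": options}


def failed_documents(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each failed document_id in a bulk response to its error message."""
    return {
        result["document_id"]: result.get("error", "unknown error")
        for result in response["results"]
        if not result["success"]
    }
```

Because a bulk call can partially succeed, checking `failed_documents` (or `failure_count`) after each call is safer than relying on the HTTP status alone.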
Examples
Add to Knowledge Base:
{
"action": "add_to_kb",
"document_ids": ["uuid1", "uuid2"],
"options": {
"knowledge_base_id": "kb-uuid"
}
}
Add Tags:
{
"action": "add_tags",
"document_ids": ["uuid1", "uuid2"],
"options": {
"tags": ["reviewed", "q1-2024"]
}
}
Response
{
"action": "add_to_kb",
"results": [
{ "document_id": "uuid1", "success": true },
{ "document_id": "uuid2", "success": true },
{ "document_id": "uuid3", "success": false, "error": "Document not found" }
],
"success_count": 2,
"failure_count": 1
}
Get Document Status (SSE)
GET /api/documents/{id}/status
Stream real-time processing status updates using Server-Sent Events.
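Outside the browser, the stream can be consumed by reading the response line by line. The Python sketch below parses `event:` and `data:` lines into events and returns the first terminal status. It assumes events are separated by blank lines, as the Server-Sent Events format requires, and that a `failed` status is also delivered as a `status` event. Both function names are illustrative.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def _field_value(line: str, name: str) -> str:
    """Return the value after `name:`, dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (event_name, decoded_data) pairs from SSE text lines.

    Only the `event:` and `data:` fields are handled; comments, `id:` and
    `retry:` lines are ignored in this sketch.
    """
    event, data = "message", []  # type: Tuple[str, List[str]]
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


def final_status(lines: Iterable[str]) -> Optional[Dict[str, Any]]:
    """Return the first `completed` or `failed` status payload, if any."""
    for event, payload in parse_sse(lines):
        if event == "status" and payload.get("status") in ("completed", "failed"):
            return payload
    return None
```

`lines` can be any iterable of decoded text lines, for example a streaming HTTP response wrapped in a text reader.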
Response (SSE Stream)
event: status
data: {"status": "processing", "progress": 50, "stage": "parsing"}
event: status
data: {"status": "processing", "progress": 75, "stage": "chunking"}
event: status
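data: {"status": "completed", "progress": 100}
Example (JavaScript)
// The browser's native EventSource cannot send custom headers such as
// Authorization, so this example assumes same-origin session (cookie) auth.
// To authenticate with an API key, read the stream with fetch() instead.
const eventSource = new EventSource('/api/documents/abc123/status');
// The stream sends named "status" events, which onmessage does not receive.
eventSource.addEventListener('status', (event) => {
  const status = JSON.parse(event.data);
  console.log('Status:', status.status, 'Progress:', status.progress);
  // Close the connection once processing has finished.
  if (status.status === 'completed' || status.status === 'failed') {
    eventSource.close();
  }
});
Reprocess Document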
POST /api/documents/{id}/reprocess
Re-run the processing pipeline on an existing document. Clears existing chunks and vectors.
Response
{
"success": true,
"message": "Document queued for reprocessing",
"document_id": "uuid"
}
Get Parsed Content
GET /api/documents/{id}/parse
Get the parsed text content and metadata for a document.
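Because text, tables, and figures are returned as separate arrays keyed by page number, a client will often want to regroup them per page. The Python sketch below shows one way to do so. It relies only on the field names in this section's response example, and the function name is illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_page(parsed: Dict[str, Any]) -> Dict[int, Dict[str, List[Any]]]:
    """Collect text, table content, and figure captions under each page number."""
    pages: Dict[int, Dict[str, List[Any]]] = defaultdict(
        lambda: {"text": [], "tables": [], "figures": []}
    )
    for page in parsed.get("pages", []):
        pages[page["page"]]["text"].append(page["text"])
    for table in parsed.get("tables", []):
        pages[table["page"]]["tables"].append(table["content"])
    for figure in parsed.get("figures", []):
        pages[figure["page"]]["figures"].append(figure.get("caption", ""))
    return dict(pages)
```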
Response
{
"text": "Full extracted text content...",
"pages": [
{ "page": 1, "text": "Page 1 content..." },
{ "page": 2, "text": "Page 2 content..." }
],
"metadata": {
"numPages": 2,
"title": "Document Title",
"author": "Author Name"
},
"tables": [
{ "page": 1, "content": "| Header | Value |\n|--------|-------|" }
],
"figures": [
{ "page": 1, "caption": "Figure 1", "description": "Chart showing..." }
]
}
Get Document Preview
GET /api/documents/{id}/preview
Get document preview data including file URL, extractions, and chunks summary.
Response
{
"document": {
"id": "uuid",
"name": "invoice.pdf",
"status": "completed"
},
"file_url": "https://signed-url...",
"extractions": [
{
"id": "extraction-uuid",
"schema_name": "Invoice",
"confidence": 0.95,
"created_at": "2024-01-15T10:31:00Z"
}
],
"edits": [],
"chunks_summary": {
"total": 15,
"pages": 3
}
}
Get Audit Trail
GET /api/documents/{id}/audit
Get the audit trail for a document (GDPR/HIPAA compliance).
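For compliance reviews it is common to narrow the trail to one action or time window. The Python sketch below filters the `entries` array on the client side. It assumes timestamps are ISO 8601 in UTC with a trailing `Z`, as in this section's response example, and the helper names are illustrative.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:30:00Z."""
    # Older Python versions reject a trailing "Z", so map it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_entries(
    entries: Iterable[Dict[str, Any]],
    action: Optional[str] = None,
    since: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Keep entries matching `action` and recorded at or after `since`.

    `since` must be timezone-aware so it can be compared with the parsed
    timestamps.
    """
    selected = []
    for entry in entries:
        if action is not None and entry["action"] != action:
            continue
        if since is not None and parse_timestamp(entry["timestamp"]) < since:
            continue
        selected.append(entry)
    return selected
```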
Response
{
"entries": [
{
"id": "uuid",
"action": "view",
"user_id": "user-uuid",
"timestamp": "2024-01-15T10:30:00Z",
"ip_address": "192.168.1.1",
"user_agent": "Mozilla/5.0..."
},
{
"id": "uuid",
"action": "extract",
"user_id": "user-uuid",
"timestamp": "2024-01-15T10:31:00Z",
"metadata": {
"schema_id": "schema-uuid"
}
}
]
}
Stream Document File
GET /api/documents/{id}/file
Stream the document file directly. Same-origin requests only.
Response
Returns the file with the appropriate Content-Type header.
See also: Upload, Parse, Extract, Comparison.