Latency
Latency is the time from sending a request to receiving a response. For document intelligence, latency affects upload-to-ready time, RAG answer generation, extraction results, and vector search retrieval. Lower latency improves user experience and, when operations run sequentially, overall throughput.
Sources of Latency
- Parsing — document parsing and OCR, which scale with document size
- Embedding — embedding generation for each chunk
- Vector search — the Pinecone lookup itself
- Inference — LLM inference for RAG answers and extraction
- Reranking — optional reranking, which adds an extra LLM call
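When the stages above run sequentially, end-to-end latency is roughly their sum, so the slowest stage dominates. A minimal sketch, using illustrative per-stage numbers (assumptions, not measurements):

```python
# Hypothetical per-stage latencies (milliseconds) for one RAG query;
# the figures are illustrative assumptions, not benchmarks.
STAGE_LATENCY_MS = {
    "embedding": 40,      # embed the query
    "vector_search": 25,  # ANN lookup (e.g. Pinecone)
    "reranking": 300,     # optional extra LLM call
    "inference": 1200,    # LLM answer generation
}

def end_to_end_latency_ms(stages: dict, include_reranking: bool = True) -> int:
    """Sum sequential stage latencies; drop reranking when it is disabled."""
    return sum(
        ms for name, ms in stages.items()
        if include_reranking or name != "reranking"
    )
```

With these numbers, disabling reranking cuts the total from 1565 ms to 1265 ms, which is why optional reranking is a common latency/quality trade-off.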
Async processing and batch processing reduce perceived latency for long-running jobs. Webhooks notify clients when work completes instead of requiring them to block on polling.
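The async pattern can be sketched with a completion callback standing in for a webhook; the function names, timings, and payload shape here are illustrative assumptions, not a specific API:

```python
import threading
import time

def submit_parse_job(doc_id: str, on_complete) -> str:
    """Return a job id immediately; invoke the callback when parsing finishes.

    Hypothetical helper: the background thread simulates a long parse/OCR job,
    and the callback plays the role of a webhook delivery.
    """
    job_id = f"job-{doc_id}"

    def worker():
        time.sleep(0.05)  # simulated long-running parse/OCR work
        on_complete(job_id, {"status": "ready", "doc_id": doc_id})

    threading.Thread(target=worker, daemon=True).start()
    return job_id

results = {}
done = threading.Event()

def webhook(job_id, payload):
    # In production this would be an HTTP POST to the client's endpoint.
    results[job_id] = payload
    done.set()

job = submit_parse_job("doc-1", webhook)  # returns without blocking
done.wait(timeout=2)                      # client is free to do other work meanwhile
```

The caller gets the job id back immediately and stays responsive; only the notification path waits on the work.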
Related Concepts
Latency trades off with throughput. Top-k and reranking affect RAG latency. Rate limits can increase effective latency under load.
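The rate-limit effect is simple to quantify under one assumption: requests queue client-side and are released at the limit. The n-th request then waits (n − 1) / limit before it is even sent, so effective latency grows with queue depth. A minimal sketch (the function name is illustrative):

```python
def last_request_latency_s(
    n_requests: int,
    rate_limit_per_s: float,
    per_request_latency_s: float,
) -> float:
    """Effective latency of the last of n queued requests under a rate limit.

    Assumes requests queue client-side and are released at the limit:
    the n-th request waits (n - 1) / limit seconds before it is sent,
    then incurs the normal service latency on top.
    """
    queue_delay_s = (n_requests - 1) / rate_limit_per_s
    return queue_delay_s + per_request_latency_s
```

For example, 11 requests against a 5 req/s limit with 0.5 s service latency give the last request an effective latency of 2.5 s, five times the unthrottled figure.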