Latency
Latency is the time from sending a request to receiving a response. For document intelligence, latency affects upload-to-ready time, RAG answer generation, extraction results, and vector search retrieval. Lower latency improves user experience and, when operations run sequentially, overall throughput.
Sources of Latency
- Parsing — document parsing and OCR, which scale with document size
- Embedding — embedding generation for each chunk
- Vector search — the Pinecone lookup itself
- Inference — LLM inference for RAG answers and extraction
- Reranking — optional reranking, which adds an extra LLM call
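When the stages above run sequentially, end-to-end latency is roughly their sum, so the slowest stage dominates. A minimal sketch, using illustrative per-stage numbers (assumptions, not measurements):

```python
# Hypothetical per-stage latencies (milliseconds) for one RAG query;
# the figures are illustrative assumptions, not benchmarks.
STAGE_LATENCY_MS = {
    "embedding": 40,      # embed the query
    "vector_search": 25,  # ANN lookup (e.g. Pinecone)
    "reranking": 300,     # optional extra LLM call
    "inference": 1200,    # LLM answer generation
}

def end_to_end_latency_ms(stages: dict, include_reranking: bool = True) -> int:
    """Sum sequential stage latencies; drop reranking when it is disabled."""
    return sum(
        ms for name, ms in stages.items()
        if include_reranking or name != "reranking"
    )
```

With these numbers, disabling reranking cuts the total from 1565 ms to 1265 ms, which is why optional reranking is a common latency/quality trade-off.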
Async processing and batch processing reduce perceived latency for long-running jobs. Webhooks notify clients when work completes instead of requiring them to block on polling.
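The async pattern can be sketched with a completion callback standing in for a webhook; the function names, timings, and payload shape here are illustrative assumptions, not a specific API:

```python
import threading
import time

def submit_parse_job(doc_id: str, on_complete) -> str:
    """Return a job id immediately; invoke the callback when parsing finishes.

    Hypothetical helper: the background thread simulates a long parse/OCR job,
    and the callback plays the role of a webhook delivery.
    """
    job_id = f"job-{doc_id}"

    def worker():
        time.sleep(0.05)  # simulated long-running parse/OCR work
        on_complete(job_id, {"status": "ready", "doc_id": doc_id})

    threading.Thread(target=worker, daemon=True).start()
    return job_id

results = {}
done = threading.Event()

def webhook(job_id, payload):
    # In production this would be an HTTP POST to the client's endpoint.
    results[job_id] = payload
    done.set()

job = submit_parse_job("doc-1", webhook)  # returns without blocking
done.wait(timeout=2)                      # client is free to do other work meanwhile
```

The caller gets the job id back immediately and stays responsive; only the notification path waits on the work.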
Related Concepts
Latency trades off with throughput. Top-k and reranking affect RAG latency. Rate limits can increase effective latency under load.
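The rate-limit effect is simple to quantify under one assumption: requests queue client-side and are released at the limit. The n-th request then waits (n − 1) / limit before it is even sent, so effective latency grows with queue depth. A minimal sketch (the function name is illustrative):

```python
def last_request_latency_s(
    n_requests: int,
    rate_limit_per_s: float,
    per_request_latency_s: float,
) -> float:
    """Effective latency of the last of n queued requests under a rate limit.

    Assumes requests queue client-side and are released at the limit:
    the n-th request waits (n - 1) / limit seconds before it is sent,
    then incurs the normal service latency on top.
    """
    queue_delay_s = (n_requests - 1) / rate_limit_per_s
    return queue_delay_s + per_request_latency_s
```

For example, 11 requests against a 5 req/s limit with 0.5 s service latency give the last request an effective latency of 2.5 s, five times the unthrottled figure.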