Cosine Similarity
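Cosine similarity is a measure of similarity between two vectors based on the angle between them: the dot product of the two vectors divided by the product of their lengths. It ranges from -1 (pointing in opposite directions) through 0 (orthogonal) to 1 (pointing in the same direction). Scores between text embeddings often fall between 0 and 1 in practice, but normalizing the vectors does not by itself rule out negative values. Vector search uses cosine similarity (or a related metric) to rank chunks by relevance to a query.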
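As a minimal sketch, the score can be computed directly from its definition, cos θ = (a · b) / (‖a‖ ‖b‖). The function name `cosine_similarity` and the toy vectors are illustrative only, not part of DocLD or Pinecone. The three calls reproduce the reference points of the range.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(theta) = (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))    # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # 0.0 (orthogonal)
print(cosine_similarity([3.0, 4.0], [-3.0, -4.0]))  # -1.0 (opposite directions)
```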
Why Cosine Similarity
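- Scale-invariant: The score depends only on the angle between vectors, not their magnitudes, so embeddings that differ in length (for example, long vs. short texts with some models) are still compared fairly on direction alone
- Common for embeddings: Many embedding models produce normalized (unit-length) vectors, and for these, cosine similarity equals the plain dot product
- Efficient: Vector databases such as Pinecone offer cosine as a built-in index metric for fast nearest-neighbor lookups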
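The first two properties can be checked numerically. This is a minimal sketch using made-up 3-dimensional vectors rather than real embeddings, and the helpers `normalize` and `dot` are hypothetical names for illustration. It shows that rescaling a vector leaves the score unchanged, and that after normalization the dot product alone gives the same value as the full cosine formula.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for a query embedding and a chunk embedding.
query = [0.2, 0.9, 0.4]
chunk = [0.5, 0.7, 0.1]

# Scale-invariance: multiplying a vector by a positive constant does not change the score.
scaled_chunk = [10 * x for x in chunk]
print(math.isclose(cosine_similarity(query, chunk), cosine_similarity(query, scaled_chunk)))  # True

# For unit-length vectors, the dot product alone equals cosine similarity.
q_unit, c_unit = normalize(query), normalize(chunk)
print(math.isclose(dot(q_unit, c_unit), cosine_similarity(query, chunk)))  # True
```

Because the division by the norms becomes a division by 1, pre-normalized vectors let a search system skip that step and rank by dot product alone.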
DocLD uses Pinecone for vector search: retrieved chunks are ranked by similarity score (for example, cosine similarity), and Top-k controls how many of the highest-scoring chunks are returned as context for RAG.
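To make the ranking and Top-k steps concrete, here is a brute-force, in-memory sketch. It is not how DocLD or Pinecone implement retrieval, since a vector database performs this lookup inside its own index rather than scanning every vector. The chunk IDs, the toy 3-dimensional vectors, and the function `top_k_chunks` are hypothetical.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_chunks(query: list[float], chunks: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every chunk against the query and return the k highest (chunk_id, score) pairs."""
    scored = ((chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical chunk IDs with toy 3-dimensional vectors.
chunks = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.1, 0.9, 0.2],
    "chunk-c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

for chunk_id, score in top_k_chunks(query, chunks, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

`heapq.nlargest` keeps only the k best candidates instead of sorting every score, which mirrors what Top-k asks for: a small, ranked slice of the most relevant chunks to pass to the model.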
Related Concepts
Cosine similarity is one kind of similarity score used in vector search. Embedding models produce the vectors being compared. Dimensionality is the number of components in each vector; two vectors must have the same dimensionality to be compared.