What is the role of the vector index in a VDBMS?
A vector index allows fast nearest-neighbor search over embeddings.
Instead of scanning all vectors, the index organizes them so that similar vectors can be found quickly. This is essential because exact kNN search is too slow in high dimensions (the curse of dimensionality).
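To make the cost concrete, here is a minimal sketch of the brute-force exact kNN scan that a vector index is designed to avoid. The data, dimensions, and `exact_knn` helper are illustrative assumptions, not from the source:

```python
import numpy as np

# Hypothetical toy data: 1,000 embeddings of dimension 128.
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 128))
query = rng.normal(size=128)

def exact_knn(db, query, k=5):
    """Exact kNN: compute the distance to EVERY stored vector (O(n*d) per query)."""
    dists = np.linalg.norm(db - query, axis=1)  # Euclidean distance to all rows
    return np.argsort(dists)[:k]                # indices of the k closest vectors

top5 = exact_knn(db, query)
```

This linear scan touches every vector on every query; an index replaces it with a sublinear (approximate) lookup.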
What is the role of the AI model in a VDBMS?
The AI model transforms unstructured data (text, images, reviews, etc.) into fixed-size embeddings that capture semantic meaning. Queries are also converted into embeddings, enabling similarity search over meaning rather than exact text.
Why do we need both an AI model AND a vector index?
The AI model creates embeddings, and the vector index allows efficient search over them.
Together, they enable modern semantic and AI-driven queries (like RAG, similarity search, and unstructured queries).
What are the three main types of vector indexes in the PDF?
Locality-Sensitive Hashing (LSH)
Product Quantization (PQ)
Hierarchical Navigable Small World Graphs (HNSW)
What is Locality Sensitive Hashing (LSH)?
LSH hashes vectors so that similar items hash to the same bucket with high probability. It enables fast approximate similarity search by checking only nearby buckets rather than all vectors.
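A minimal sketch of one common LSH family (random hyperplanes for cosine similarity): each hyperplane contributes one bit depending on which side of it the vector falls, and nearby vectors tend to share all bits. The dimensions and `lsh_bucket` helper are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_planes = 64, 8

# Hypothetical random-hyperplane scheme: n_planes normals drawn at random.
planes = rng.normal(size=(n_planes, dim))

def lsh_bucket(v):
    """Hash a vector to a bucket key: one bit per hyperplane (which side v lies on)."""
    bits = planes @ v > 0
    return tuple(bits)

v = rng.normal(size=dim)
near = v + 0.01 * rng.normal(size=dim)   # a small perturbation of v
# Similar vectors land in the same bucket with high probability,
# so a query only needs to scan its own (and maybe adjacent) buckets.
same_bucket = lsh_bucket(v) == lsh_bucket(near)
```

More planes make buckets more selective; multiple independent hash tables are typically combined to recover recall.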
What is Product Quantization (PQ)?
PQ splits a vector into subvectors, quantizes each subvector, and stores compact codes.
This reduces memory and speeds up distance computations by comparing quantized versions instead of full vectors.
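A minimal PQ sketch under simplifying assumptions: a real index would train each subspace codebook with k-means, whereas here the codebooks are just sampled from the data. The sizes (`m`, `ks`) and the `encode`/`decode` helpers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim, m, ks = 500, 32, 4, 16   # m subvectors, ks centroids per subspace
sub = dim // m

data = rng.normal(size=(n, dim))

# Hypothetical codebooks: random data rows split into m subspaces of size sub.
codebooks = data[rng.choice(n, ks)].reshape(ks, m, sub).transpose(1, 0, 2)  # (m, ks, sub)

def encode(v):
    """Compress a vector to m small codes: nearest centroid in each subspace."""
    codes = []
    for j in range(m):
        chunk = v[j * sub:(j + 1) * sub]
        dists = np.linalg.norm(codebooks[j] - chunk, axis=1)
        codes.append(int(np.argmin(dists)))
    return codes  # m codes (a few bytes) instead of dim floats

def decode(codes):
    """Approximate reconstruction of the original vector from its codes."""
    return np.concatenate([codebooks[j][c] for j, c in enumerate(codes)])

codes = encode(data[0])
approx = decode(codes)
```

Distances to the query can then be computed per subspace from a small lookup table instead of against full vectors, which is where the speedup comes from.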
What is HNSW?
HNSW (Hierarchical Navigable Small World) builds multi-level graphs with small-world properties.
Search starts at the top layer and navigates down, following edges to get close to the nearest neighbors. It is extremely fast and widely used in vector DBs.
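The navigation step above can be sketched as a greedy walk on a proximity graph. This is a deliberately simplified, single-layer assumption: real HNSW builds the graph incrementally and stacks sparser layers on top, but the same greedy routine runs inside each layer. The graph construction and `greedy_search` helper are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, dim, k_neighbors = 200, 16, 8
points = rng.normal(size=(n, dim))

# Toy proximity graph: connect each point to its 8 exact nearest neighbors.
dist_matrix = np.linalg.norm(points[:, None] - points[None, :], axis=2)
graph = {i: list(np.argsort(dist_matrix[i])[1:k_neighbors + 1]) for i in range(n)}

def greedy_search(query, start=0):
    """Follow edges toward the query; stop when no neighbor is closer."""
    cur = start
    while True:
        best = min(graph[cur], key=lambda j: np.linalg.norm(points[j] - query))
        if np.linalg.norm(points[best] - query) >= np.linalg.norm(points[cur] - query):
            return cur  # local minimum: no neighbor improves the distance
        cur = best

query = rng.normal(size=dim)
found = greedy_search(query)
```

Each step strictly decreases the distance to the query, so the walk terminates quickly; the hierarchy in HNSW exists to pick a good starting point for this walk.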
Why use approximate nearest-neighbor (ANN) queries instead of exact ones?
Exact kNN search does not scale in high dimensions — all points appear equally near (curse of dimensionality).
Approximate search is much faster, scalable, and still accurate enough for semantic applications like RAG and AI retrieval.
When do we prefer approximate over exact search?
When datasets are large, embeddings are high-dimensional, or queries need low latency (e.g., real-time AI systems).
What distance metrics are shown in the PDF?
Euclidean distance, dot product/cosine similarity, and Hamming distance.
Why do vector databases need distance metrics?
Distance metrics measure similarity between embeddings, enabling nearest-neighbor search for semantic queries.
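The three metrics listed above can be sketched in a few lines; the example vectors are arbitrary assumptions for illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 1.0])

# Euclidean distance: straight-line distance between the two vectors.
euclidean = np.linalg.norm(a - b)

# Dot product / cosine similarity: angle-based similarity, common for embeddings.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hamming distance: number of differing positions, used for binary vectors/codes.
x = np.array([1, 0, 1, 1])
y = np.array([1, 1, 1, 0])
hamming = int(np.sum(x != y))
```

Which metric to use depends on how the embedding model was trained; cosine (or dot product on normalized vectors) is typical for text embeddings, while Hamming applies to binary codes such as LSH signatures.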