
Why Hybrid Search Gets 91% Recall vs 78% for Vector-Only

Hybrid search improves retrieval recall by 10 to 15 percentage points because vector search and keyword search fail on different types of queries. Vector search struggles with exact identifiers, rare terms, and domain-specific jargon. Keyword search struggles with paraphrases, conceptual queries, and vocabulary mismatch. By running both and combining results, hybrid search covers the failure modes of each approach, consistently achieving 88 to 91% recall at top-10 across standard benchmarks compared to 75 to 80% for vector-only systems.

The Benchmark Evidence

The BEIR benchmark (Benchmarking Information Retrieval) tests retrieval systems across 13 diverse datasets covering scientific papers, web pages, question answering, fact checking, and more. Results across multiple studies show a consistent pattern: hybrid search with reciprocal rank fusion outperforms both vector-only and keyword-only systems on nearly every dataset.
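Reciprocal rank fusion itself is only a few lines: each document's fused score is the sum, over the result lists it appears in, of 1 / (k + rank), with k commonly set to 60. A minimal sketch (the document IDs and ranked lists here are illustrative, not from any benchmark):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative ranked lists from the two retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF only looks at ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales.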

On the NQ (Natural Questions) dataset, vector search achieves roughly 80% NDCG@10, BM25 roughly 32%, and hybrid 83%. On SciFact (scientific fact checking), vector search achieves roughly 67%, BM25 roughly 66%, and hybrid 71%. On FiQA (financial QA), vector search achieves 34%, BM25 23%, and hybrid 37%. The size of the improvement varies by dataset: some datasets are keyword-heavy, where exact matches matter more, while others are more semantic, where meaning matching matters more. But hybrid consistently wins or ties.

The 91% vs 78% figures come from production evaluations on technical documentation corpora, where exact terms (function names, error codes, configuration parameters) are common alongside semantic queries. In these environments, the keyword search component captures 30 to 40% of queries that vector search would miss entirely, while vector search captures 50 to 60% of queries that keyword search would miss. The overlap means both systems find the same documents for many queries, but the non-overlapping coverage is what drives the recall improvement.

Why the Failure Modes Are Complementary

Vector search fails when the embedding model cannot produce a discriminative vector for the query. This happens with exact identifiers ("ERR_CONN_REFUSED"), rare technical terms ("pgbouncer transaction pooling mode"), and strings the model has seen infrequently in training data. The embedding for these queries is vague, pointing in the general direction of the topic without enough specificity to rank the correct document above similar-but-wrong documents.

Keyword search fails when the query and the relevant document use different vocabulary. "How to speed up my API" finds nothing if the documentation uses "performance optimization" and "latency reduction" instead of "speed up." The inverted index maps tokens to documents, and if the tokens do not match, the document is invisible to keyword search regardless of how relevant it is.

These failure modes are almost perfectly complementary. The queries that confuse vector search (exact terms, identifiers) are trivial for keyword search because it performs exact string matching. The queries that confuse keyword search (paraphrases, conceptual descriptions) are exactly what vector search excels at because embedding models learn semantic equivalence. Running both means the system has a strong signal for virtually every query type.
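The complementarity can be seen with two toy matchers. The keyword matcher below does exact token overlap (a stand-in for BM25), and the "semantic" matcher uses a small synonym table restricted to a common-word vocabulary (a crude stand-in for an embedding model, which handles paraphrase well but produces weak vectors for rare identifiers). All documents, queries, and synonyms are invented for illustration:

```python
docs = {
    "d1": "performance optimization and latency reduction",
    "d2": "troubleshooting ERR_CONN_REFUSED in client connections",
}

def keyword_match(query):
    """Exact token overlap -- a stand-in for BM25's inverted index."""
    q = set(query.lower().split())
    return [d for d, text in docs.items() if q & set(text.lower().split())]

# Toy synonym table and vocabulary standing in for an embedding model:
# paraphrases map onto document vocabulary, but rare identifiers fall
# outside the vocabulary, mimicking a weak, non-discriminative embedding.
SYNONYMS = {"speed": "performance", "up": "optimization"}
VOCAB = {"performance", "optimization", "latency", "reduction",
         "troubleshooting", "client", "connections"}

def semantic_match(query):
    expanded = {SYNONYMS.get(t, t) for t in query.lower().split()} & VOCAB
    return [d for d, text in docs.items() if expanded & set(text.lower().split())]
```

The paraphrase query "how to speed up my service" is invisible to the keyword matcher but found semantically; the identifier query "ERR_CONN_REFUSED" is the reverse. The union of the two covers both.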

When the Gap Is Largest

The recall gap between hybrid and vector-only is largest on corpora with high identifier density. Technical documentation, API references, configuration guides, and error catalogs contain many exact terms that users search for directly. In these environments, 30 to 40% of queries are exact-match oriented, and vector-only systems miss most of them.

The gap is smallest on corpora with purely semantic content. Blog posts, news articles, and educational content rarely contain exact identifiers that users search for. In these environments, vector search handles 90%+ of queries well, and keyword search adds marginal improvement.

For AI memory systems, the gap tends to be large because stored memories often contain specific names, dates, project codes, and technical identifiers that users later search for. A memory containing "deployed to staging using v2.4.1 on Thursday" needs keyword matching to find when a user searches for "v2.4.1" because the embedding for "v2.4.1" is weak.
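Finding that memory comes down to an inverted-index lookup, with one practical caveat: the tokenizer must keep identifier-like strings intact, since an analyzer that splits "v2.4.1" into "v2", "4", "1" loses the exact-match signal. A minimal sketch with an invented memory store:

```python
import re

memories = [
    "deployed to staging using v2.4.1 on Thursday",
    "rolled back the canary after errors spiked",
]

def tokenize(text):
    # Keep dots and hyphens inside tokens so version strings and
    # identifiers like "v2.4.1" survive as single index terms.
    return re.findall(r"[\w.\-]+", text.lower())

# Build the inverted index: token -> set of memory IDs.
index = {}
for mem_id, text in enumerate(memories):
    for token in tokenize(text):
        index.setdefault(token, set()).add(mem_id)

hits = index.get("v2.4.1", set())
```

A search for "v2.4.1" resolves to the first memory directly, with no reliance on the embedding for the version string being discriminative.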

Implementation Cost vs Benefit

Adding keyword search to an existing vector search system is a moderate engineering investment. If your vector database supports hybrid search natively (Weaviate, Qdrant), it is a configuration change. If you use pgvector in PostgreSQL, full-text search is already available in the same database. If you use a database without native hybrid support (Pinecone), you need to add a separate search index (Elasticsearch, OpenSearch) and implement fusion logic.
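The fusion logic itself need not be elaborate. Besides rank-based fusion, a common approach is to min-max normalize each system's scores (cosine similarities and BM25 scores live on different scales) and take a weighted sum. A minimal sketch with illustrative scores:

```python
def minmax(scores):
    """Normalize a {doc_id: score} map to [0, 1] so scales are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def weighted_fusion(vector_scores, keyword_scores, alpha=0.6):
    """Blend normalized scores; alpha weights the vector side."""
    v, k = minmax(vector_scores), minmax(keyword_scores)
    docs = set(v) | set(k)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Illustrative raw scores: cosine similarity vs BM25.
vector_scores = {"doc_a": 0.91, "doc_b": 0.87, "doc_c": 0.42}
keyword_scores = {"doc_c": 11.2, "doc_a": 3.1}
ranking = weighted_fusion(vector_scores, keyword_scores)
```

The weight alpha is a tuning knob: corpora heavy in identifiers benefit from shifting weight toward the keyword side, while purely semantic corpora benefit from the reverse.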

The return on that investment is substantial. A 10 to 15 percentage point recall improvement means that, out of every 100 queries, 10 to 15 more get useful results. In a customer-facing RAG application, this translates directly to fewer "I could not find the answer" responses and higher user satisfaction. In an AI agent workflow, it means the agent receives better context and makes fewer errors.

Adaptive Recall combines multiple retrieval signals by design. Vector similarity provides semantic matching. The knowledge graph provides entity-level keyword matching through entity lookup. Cognitive scoring provides recency and frequency weighting. Together, these signals achieve the recall benefits of hybrid search and more, without requiring you to build and maintain separate search indexes.

Go beyond hybrid search. Adaptive Recall combines four retrieval signals for accuracy that vector plus keyword alone cannot match.

Try It Free