Semantic Search vs Keyword Search Explained
How Keyword Search Works
Keyword search builds an inverted index: a data structure that maps every word in the corpus to the list of documents containing that word. When a user searches for "database connection timeout," the system looks up each word in the index, finds the documents that contain all three words (or some combination), and ranks them by relevance. The most common ranking algorithm is BM25, which scores documents based on term frequency (how often the query words appear in the document), inverse document frequency (how rare the query words are across the corpus), and document length normalization.
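The mechanics above can be sketched in a few dozen lines. This is a minimal illustration, not a production implementation: the three-document corpus, the whitespace tokenizer, and the k1/b parameter values are all stand-ins chosen for readability.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Toy tokenizer: real systems also stem, lowercase-fold, and strip punctuation.
    return text.lower().split()

docs = {
    1: "database connection timeout after upgrade",
    2: "connection pool exhausted database under load",
    3: "frontend rendering bug in the dashboard",
}

# Build the inverted index: word -> set of doc ids containing that word.
doc_tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
index = defaultdict(set)
for doc_id, tokens in doc_tokens.items():
    for token in tokens:
        index[token].add(doc_id)

avg_len = sum(len(t) for t in doc_tokens.values()) / len(doc_tokens)

def bm25(query, k1=1.5, b=0.75):
    """Rank documents against the query with BM25."""
    scores = Counter()
    n_docs = len(docs)
    for term in tokenize(query):
        containing = index.get(term, set())
        if not containing:
            continue
        # Inverse document frequency: rarer terms contribute more.
        idf = math.log(1 + (n_docs - len(containing) + 0.5) / (len(containing) + 0.5))
        for doc_id in containing:
            tf = doc_tokens[doc_id].count(term)  # term frequency in this doc
            dl = len(doc_tokens[doc_id])         # document length normalization
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return scores.most_common()

print(bm25("database connection timeout"))
```

Document 1 ranks first because it contains all three query terms; document 3 never appears because it shares no vocabulary with the query.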
Keyword search has been the dominant retrieval technology for decades. Google's original search engine, Elasticsearch, Solr, and PostgreSQL's full-text search all use inverted indexes with TF-IDF or BM25 scoring. The technology is mature, fast (sub-millisecond queries even on large corpora), and well-understood.
The fundamental limitation is vocabulary matching. If the query uses word X and the relevant document uses synonym Y, keyword search cannot make the connection. A query for "fix broken login" finds nothing in a document that says "resolve authentication failure" because none of the query words appear in the document. This vocabulary mismatch problem means keyword search recall degrades significantly on corpora where authors use inconsistent terminology, which describes most real-world corpora.
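The vocabulary mismatch is easy to verify mechanically: tokenize the query and the document from the example above and the intersection is empty, so any term-matching scorer assigns the pair a score of zero.

```python
# The query and document from the text share no tokens, so keyword
# search has nothing to match on.
query = "fix broken login"
doc = "resolve authentication failure on the signin page"

query_terms = set(query.lower().split())
doc_terms = set(doc.lower().split())

shared = query_terms & doc_terms
print(shared)  # set() -- empty, so the BM25 score for this pair is 0
```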
How Semantic Search Works
Semantic search converts both the query and all documents into numerical vectors using an embedding model, then finds the documents whose vectors are closest to the query vector. The embedding model is a neural network trained on millions of text pairs to produce similar vectors for semantically similar text. "Fix broken login" and "resolve authentication failure" produce nearby vectors because the model learned that these phrases express the same meaning.
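Nearest-vector retrieval itself is simple once the vectors exist. The sketch below hard-codes tiny 4-dimensional vectors as stand-ins for real model output, so it runs without downloading a model; in practice the vectors come from a trained embedding model such as a sentence-transformers checkpoint and have hundreds of dimensions.

```python
import math

# Hand-made stand-in vectors; a real system would call an embedding model.
doc_vectors = {
    "resolve authentication failure": [0.9, 0.1, 0.0, 0.4],
    "tune database connection pool":  [0.1, 0.8, 0.5, 0.0],
    "style the settings page":        [0.0, 0.1, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: the standard closeness measure for embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=2):
    # Rank all documents by similarity to the query vector.
    ranked = sorted(doc_vectors.items(),
                    key=lambda item: cosine(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend this is the embedding of "fix broken login": the model places it
# near the authentication document even though they share no words.
query_vec = [0.85, 0.15, 0.05, 0.35]
print(search(query_vec))
```

The top result is "resolve authentication failure" despite zero word overlap with the query, which is exactly the case keyword search cannot handle.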
Semantic search emerged as a practical retrieval technology around 2020 with the availability of high-quality embedding models (Sentence-BERT, OpenAI embeddings) and efficient vector search libraries and databases (FAISS, Milvus, Pinecone). Before this, semantic similarity could be computed but was too slow and too inaccurate for production retrieval at scale.
The fundamental limitation is precision on exact terms. When a user searches for a specific error code, version number, or identifier, the embedding model produces a vague vector that represents the general category (errors, versions) rather than the specific string. Keyword search handles these trivially because exact string matching is exactly what inverted indexes do.
Where Each Excels
Semantic search excels at: Conceptual queries ("how to make the API faster"), vocabulary mismatch ("fix broken login" matching "resolve authentication failure"), partial descriptions ("that thing where the server stops accepting connections"), multilingual matching (query in English, document in Spanish with multilingual embeddings), and intent understanding ("I need to deploy to production" matching documentation about deployment procedures).
Keyword search excels at: Exact identifiers ("ERR_CONN_REFUSED," "JIRA-4521"), version numbers and configuration values ("React 18.2.0," "max_connections=100"), proper nouns that the embedding model has not seen ("PgBouncer," your internal tool names), Boolean queries (documents containing X but not Y), and acronyms and abbreviations that are ambiguous in embedding space.
The Numbers
On the BEIR benchmark across 13 datasets, keyword search (BM25) averages roughly 65 to 72% NDCG@10 (normalized discounted cumulative gain over the top 10 results). Semantic search with a modern embedding model averages roughly 75 to 82% NDCG@10. Hybrid search, combining both with reciprocal rank fusion (RRF), averages roughly 80 to 88% NDCG@10. The gap between semantic-only and hybrid is 5 to 10 percentage points, which represents the queries where keyword search captures relevant documents that semantic search misses.
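The RRF combination rule mentioned above is worth seeing concretely: each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in, so documents that rank well in either list surface, and documents that rank well in both lists surface highest. The document names below are illustrative; k = 60 is the constant from the original RRF paper.

```python
def rrf(*ranked_lists, k=60):
    """Fuse any number of ranked lists with reciprocal rank fusion."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each appearance contributes 1 / (k + rank); missing docs contribute 0.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results  = ["doc_errcode", "doc_faq", "doc_tutorial"]   # BM25 ranking
semantic_results = ["doc_faq", "doc_overview", "doc_errcode"]   # vector ranking

print(rrf(keyword_results, semantic_results))
```

Note that RRF needs only ranks, not scores, which is why it works even though BM25 scores and cosine similarities live on incompatible scales.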
On domain-specific corpora with high identifier density (technical documentation, code repositories, API references), the gap widens to 10 to 15 percentage points because these corpora have more exact-match queries. On general content corpora (news, encyclopedia, blog posts), the gap narrows to 2 to 5 percentage points because these corpora have fewer exact-match queries.
Why Not Just Use Semantic Search
If semantic search handles 80% of queries and keyword search only adds 10 more percentage points, why not use semantic search alone and accept the gap? Because the queries that semantic search misses are often the most important ones. A user searching for a specific error code is experiencing a problem right now and needs the exact documentation for that error. A user searching for a specific configuration parameter needs the exact value. These are high-intent, high-urgency queries where failure to retrieve the correct document directly causes frustration or wasted time.
Additionally, the cost of adding keyword search is low compared to the cost of building vector search in the first place. If you use PostgreSQL with pgvector, full-text search is already built in. If you use Weaviate or Qdrant, hybrid search is a configuration parameter. The engineering effort to add keyword search alongside existing vector search is typically a day or two, while the recall improvement is permanent and significant.
Beyond Both: Multi-Signal Retrieval
Semantic search and keyword search are both content-matching approaches. They determine relevance based on the text content of documents. But relevance in practice depends on more than content similarity. A memory that was accessed recently is more likely to be relevant than one accessed months ago. A memory that has been corroborated by multiple sources is more trustworthy than one mentioned once. A memory connected to the current topic through entity relationships may be relevant even if the text content does not match the query.
Adaptive Recall combines content matching (vector similarity) with cognitive scoring (recency and frequency based activation), knowledge graph traversal (entity-level connections), and confidence weighting (corroboration history) into a single retrieval system. This multi-signal approach captures the benefits of both semantic and keyword search while adding the temporal, relational, and trust dimensions that content-only approaches miss.
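One way to picture a multi-signal combination is a weighted sum over the four signals. The sketch below is hypothetical: the weights, the 30-day recency half-life, the frequency curve, and the field names are illustrative assumptions, not Adaptive Recall's actual formula.

```python
import time

def multi_signal_score(memory, content_similarity, now=None,
                       weights=(0.5, 0.2, 0.1, 0.2)):
    """Illustrative weighted blend of four retrieval signals (all in [0, 1])."""
    w_content, w_recency, w_frequency, w_confidence = weights
    now = now if now is not None else time.time()

    # Recency: exponential decay with an assumed 30-day half-life.
    age_days = (now - memory["last_accessed"]) / 86400
    recency = 0.5 ** (age_days / 30)

    # Frequency: diminishing returns on repeated access.
    frequency = 1 - 1 / (1 + memory["access_count"])

    # Confidence: fraction of mentions that corroborate the memory.
    confidence = memory["corroborations"] / max(memory["mentions"], 1)

    return (w_content * content_similarity + w_recency * recency
            + w_frequency * frequency + w_confidence * confidence)

memory = {
    "last_accessed": time.time() - 2 * 86400,  # touched two days ago
    "access_count": 5,
    "corroborations": 3,
    "mentions": 4,
}
score = multi_signal_score(memory, content_similarity=0.8)
print(round(score, 3))
```

The point of the sketch is the shape, not the numbers: a recently accessed, well-corroborated memory can outrank one with slightly higher content similarity but a stale, uncorroborated history.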
Move beyond the semantic vs keyword debate. Adaptive Recall combines four retrieval signals for accuracy that either approach alone cannot match.
Get Started Free