How to Implement Reciprocal Rank Fusion
Why RRF Works for Hybrid Search
When you combine vector search results with keyword search results, you face a fundamental problem: the scores are on different scales and have different distributions. Cosine similarity scores cluster between 0.5 and 0.95 for most queries, while BM25 scores can range from 0.1 to 25 depending on document length and term frequency. Normalizing these scores to a common scale is possible but fragile, because the distributions change with different queries and different corpora.
RRF sidesteps this problem entirely by using rank positions instead of raw scores. It does not matter that the top vector result has a similarity of 0.92 and the top BM25 result has a score of 18.4. All that matters is that they are both ranked first in their respective lists. This rank-based approach makes RRF robust across different scoring systems, different query types, and different data distributions without any tuning of score normalization.
The k constant (typically 60) controls how much weight the top-ranked items receive relative to lower-ranked items. With k = 60, the first-ranked item receives a score of 1/61 = 0.0164 and the tenth-ranked item receives 1/70 = 0.0143, a relatively gentle decline. With k = 1, the first-ranked item receives 1/2 = 0.5 and the tenth receives 1/11 = 0.091, a much steeper decline. Higher k values produce more balanced fusions where rank position matters less, while lower k values give stronger preference to top-ranked items.
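A few lines of Python make that decay concrete. This is only an illustration of the formula above; the k and rank values are arbitrary, not recommendations:

# Illustration: how the RRF contribution 1 / (k + rank) decays by rank for different k.
def rrf_contribution(rank: int, k: int) -> float:
    return 1.0 / (k + rank)

for k in (1, 10, 60):
    decay = [round(rrf_contribution(rank, k), 4) for rank in (1, 5, 10, 20)]
    print(f"k={k}: contributions at ranks 1, 5, 10, 20 -> {decay}")

# k=1:  [0.5, 0.1667, 0.0909, 0.0476]    steep, rank 1 dominates
# k=10: [0.0909, 0.0667, 0.05, 0.0333]   moderate
# k=60: [0.0164, 0.0154, 0.0143, 0.0125] gentle, rank position matters less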
Step-by-Step Implementation
For each result list and each document in that list, the RRF score contribution is:
score = 1 / (k + rank)

where rank is 1-indexed (the top result has rank 1). The final score for each document is the sum of its contributions across all lists. Documents that appear in multiple lists accumulate higher scores, which is the key insight: RRF favors documents that are ranked well by multiple systems.
# The RRF formula for a document appearing in N lists:
# score(doc) = sum over all lists L of: 1 / (k + rank_L(doc))
#
# Example with k=60:
# Doc A: rank 1 in vector, rank 3 in keyword
# score = 1/(60+1) + 1/(60+3) = 0.01639 + 0.01587 = 0.03226
#
# Doc B: rank 2 in vector only
# score = 1/(60+2) = 0.01613
#
# Doc C: rank 1 in keyword only
# score = 1/(60+1) = 0.01639
#
# Final ranking: A (0.032), C (0.016), B (0.016)
# Doc A wins because it appears in both lists

Run each search system and collect ordered results. Each result needs only a document identifier and its rank position. You can include the raw score for debugging, but RRF does not use it. Request more results than your target output (for example, top 50 from each system if you want a final top 10) to ensure that documents appearing in both lists are captured even if ranked lower in one system.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SearchResult:
    doc_id: str
    rank: int
    score: Optional[float] = None

def run_searches(query: str, depth: int = 50):
    vector_results = vector_search(query, top_k=depth)
    keyword_results = keyword_search(query, top_k=depth)

    vector_ranked = [
        SearchResult(doc_id=r["id"], rank=i + 1, score=r["similarity"])
        for i, r in enumerate(vector_results)
    ]
    keyword_ranked = [
        SearchResult(doc_id=r["id"], rank=i + 1, score=r["bm25_score"])
        for i, r in enumerate(keyword_results)
    ]
    return [vector_ranked, keyword_ranked]

Iterate through all result lists, accumulate the reciprocal rank score for each document, and sort by total score. The implementation is straightforward: a dictionary that maps document IDs to accumulated scores.
from typing import List, Tuple, Dict

def reciprocal_rank_fusion(
    result_lists: List[List[SearchResult]],
    k: int = 60,
    top_n: int = 10
) -> List[Tuple[str, float]]:
    """Fuse multiple ranked lists using RRF.

    Args:
        result_lists: List of ranked SearchResult lists.
        k: Smoothing constant (default 60 per original paper).
        top_n: Number of results to return.

    Returns:
        List of (doc_id, rrf_score) tuples sorted by score descending.
    """
    scores: Dict[str, float] = {}
    for result_list in result_lists:
        for result in result_list:
            if result.doc_id not in scores:
                scores[result.doc_id] = 0.0
            scores[result.doc_id] += 1.0 / (k + result.rank)

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return ranked[:top_n]

# Usage
all_results = run_searches("database connection pooling")
fused = reciprocal_rank_fusion(all_results, k=60, top_n=10)

If one search system is more reliable than another for your query distribution, weight its contributions. A weight of 1.0 keeps the default contribution, while 1.5 gives 50% more influence. Typical configurations weight vector search higher (0.6 to 0.7) for semantic query-heavy applications and keyword search higher (0.5 to 0.6) for exact-match-heavy applications.
def weighted_rrf(
    result_lists: List[List[SearchResult]],
    weights: Optional[List[float]] = None,
    k: int = 60,
    top_n: int = 10
) -> List[Tuple[str, float]]:
    if weights is None:
        weights = [1.0] * len(result_lists)
    assert len(weights) == len(result_lists)

    scores: Dict[str, float] = {}
    for weight, result_list in zip(weights, result_lists):
        for result in result_list:
            if result.doc_id not in scores:
                scores[result.doc_id] = 0.0
            scores[result.doc_id] += weight / (k + result.rank)

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return ranked[:top_n]

# Weight vector search 60%, keyword search 40%
fused = weighted_rrf(all_results, weights=[0.6, 0.4], k=60, top_n=10)

The k value of 60 from the original paper works well as a default, but optimal values depend on your data. Test k values from 10 to 100 on a set of queries with known relevant documents, measuring recall@10 or NDCG@10. Lower k values (10 to 30) give more weight to top-ranked items, which helps when one system's top results are highly reliable. Higher k values (60 to 100) distribute weight more evenly across ranks, which helps when relevant documents are spread across rank positions.
def evaluate_rrf_k_values(
    queries_with_relevance,
    result_lists_per_query,
    k_values=(10, 20, 30, 40, 60, 80, 100)
):
    results = {}
    for k in k_values:
        recalls = []
        for query_idx, (query, relevant_ids) in enumerate(queries_with_relevance):
            fused = reciprocal_rank_fusion(
                result_lists_per_query[query_idx], k=k, top_n=10
            )
            retrieved_ids = {doc_id for doc_id, _ in fused}
            recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)
            recalls.append(recall)
        results[k] = sum(recalls) / len(recalls)
        print(f"k={k}: mean recall@10 = {results[k]:.3f}")
    return results

RRF Beyond Two Lists
RRF generalizes naturally to any number of result lists. If you add a reranker, a knowledge graph traversal, or a recency-based ranking alongside vector and keyword search, just add each as another result list in the fusion. Each list contributes its reciprocal rank scores independently, and documents ranked well across multiple systems rise to the top. Adaptive Recall uses this principle internally, combining vector similarity rankings with cognitive activation rankings, graph traversal rankings, and confidence rankings into a single fused result.
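As a sketch of what that looks like with the functions above, the snippet below fuses a hypothetical third ranking alongside the vector and keyword lists (recency_search and its result fields are assumptions for illustration, not part of the code above); weighted_rrf already accepts any number of lists:

# Sketch: fuse three ranked lists instead of two.
# recency_search is a hypothetical third ranker; substitute whatever system you have.
recency_results = recency_search("database connection pooling", top_k=50)
recency_ranked = [
    SearchResult(doc_id=r["id"], rank=i + 1)
    for i, r in enumerate(recency_results)
]

vector_ranked, keyword_ranked = run_searches("database connection pooling")
fused = weighted_rrf(
    [vector_ranked, keyword_ranked, recency_ranked],
    weights=[0.5, 0.3, 0.2],  # illustrative weights, tune for your workload
    k=60,
    top_n=10,
)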
Adaptive Recall fuses four ranking signals (vector similarity, cognitive activation, graph traversal, and confidence) into a single retrieval result. No manual fusion code needed.
Try It Free