How to Add GraphRAG to an Existing RAG Pipeline
Where GraphRAG Fits in Your Pipeline
A standard RAG pipeline has three stages: embed the query, retrieve similar chunks from the vector database, and pass those chunks to the LLM as context. GraphRAG adds a parallel retrieval path between embedding and context assembly. The query goes through both vector search and graph traversal simultaneously, and the results are merged before being passed to the LLM.
The architecture looks like this: the user query enters the system, is processed for entity extraction and embedding in parallel, vector search returns semantically similar chunks, graph traversal returns structurally connected documents, a fusion step combines both result sets with weighted scores, and the merged results go into the LLM prompt. This parallel architecture means GraphRAG adds minimal latency because graph traversal runs concurrently with vector search rather than sequentially.
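To make the concurrency claim concrete, here is a minimal sketch of the parallel retrieval step using `concurrent.futures`. The two retrieval functions are stubs standing in for your real vector search and graph traversal; only the orchestration pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

# Stubs standing in for the two retrieval paths; replace with your own.
def vector_search(query_embedding, top_k=20):
    return [{"chunk_id": 1, "score": 0.9}]

def graph_traverse(query_entities, max_depth=2):
    return [{"chunk_id": 2, "graph_score": 0.5}]

def retrieve_parallel(query_embedding, query_entities):
    # Both paths run concurrently, so end-to-end retrieval latency is
    # roughly max(vector, graph) rather than their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(vector_search, query_embedding)
        graph_future = pool.submit(graph_traverse, query_entities)
        return vector_future.result(), graph_future.result()
```

If your vector and graph clients are I/O-bound (network calls to a database), threads are enough; there is no need for multiprocessing here.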
Step-by-Step Implementation
If you do not already have a knowledge graph, build one from your existing document corpus using the entity extraction and relationship identification process described in our guide to building graphs from text. Run extraction over the same documents that are already in your vector database. Store entities and relationships in a graph database (Neo4j, Amazon Neptune) or a triple table in your existing relational database. The graph and the vector database should share document identifiers so you can link graph entities back to the chunks they came from.
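If you take the triple-table route rather than a dedicated graph database, a minimal sketch in SQLite could look like the following. The `triples` schema, column names, and sample row are illustrative, not a prescribed layout; the important detail is the `chunk_id` column shared with the vector store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        chunk_id  INTEGER NOT NULL  -- same identifier the vector store uses
    )
""")
conn.execute(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    ("Acme Corp", "ACQUIRED", "Widget Inc", 42),
)

# Neighbor lookup for traversal: every entity one hop from a given node
rows = conn.execute(
    "SELECT object, predicate FROM triples WHERE subject = ?",
    ("Acme Corp",),
).fetchall()
```

An index on `subject` (and one on `object` if you traverse edges in both directions) keeps one-hop lookups fast as the table grows.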
```python
# Link graph entities to vector store documents
entity_to_chunks = {}
for chunk_id, chunk_text in enumerate(chunks):
    entities = extract_entities(chunk_text)
    for entity in entities:
        entity_to_chunks.setdefault(entity["name"], []).append(chunk_id)
```

Before vector search, extract entities from the user query. This identifies the graph nodes to start traversal from. Use the same entity extraction approach you used during graph construction (LLM-based or NER model) to ensure consistent entity identification. If no entities are found in the query, skip graph traversal and fall back to vector-only search. This graceful fallback means GraphRAG never degrades below your current performance.
```python
def process_query(query):
    # Entity extraction and embedding are independent; they are shown
    # sequentially here but should run concurrently in production.
    query_entities = extract_entities(query)
    query_embedding = embed(query)
    # Likewise, vector search and graph traversal can run concurrently
    vector_results = vector_search(query_embedding, top_k=20)
    if query_entities:
        graph_results = graph_traverse(query_entities, max_depth=2)
    else:
        graph_results = []  # fallback: vector-only retrieval
    # fuse results
    return fuse_results(vector_results, graph_results)
```

For each entity found in the query, look it up in the graph and traverse its relationships to depth 2 (direct connections and their connections). Collect all entities and documents encountered during traversal. Apply activation decay so directly connected results score higher than two-hop results. A decay factor of 0.5 works well as a starting point: the query entities themselves score 1.0, direct connections score 0.5, and two-hop connections score 0.25.
```python
def graph_traverse(query_entities, max_depth=2, decay=0.5):
    activated = {}
    for entity in query_entities:
        activated[entity["name"]] = 1.0
        for neighbor, rel_type in get_neighbors(entity["name"]):
            score = decay
            if activated.get(neighbor, 0.0) < score:
                activated[neighbor] = score
            if max_depth >= 2:
                for n2, r2 in get_neighbors(neighbor):
                    s2 = decay * decay
                    if activated.get(n2, 0.0) < s2:
                        activated[n2] = s2
    # map activated entities back to document chunks
    results = []
    for entity_name, activation in activated.items():
        for cid in entity_to_chunks.get(entity_name, []):
            results.append({"chunk_id": cid, "graph_score": activation})
    return results
```

Combine vector similarity scores with graph connectivity scores using weighted fusion. A common approach is Reciprocal Rank Fusion (RRF), which combines each result's rank positions across the two retrieval methods. An alternative is weighted score fusion: normalize both score types to a 0 to 1 range and compute a weighted sum. Start with a 60/40 split (60% vector, 40% graph) and tune based on your evaluation results.
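As a sketch, RRF over the two ranked result lists takes only a few lines. The constant `k=60` is the conventional smoothing value from the original RRF formulation; each input list is assumed sorted best-first, and only rank positions are used, so the two score scales never need normalizing.

```python
def rrf_fuse(vector_results, graph_results, k=60):
    # Reciprocal Rank Fusion: each list contributes 1 / (k + rank)
    # for every chunk it contains; chunks in both lists score highest.
    scores = {}
    for results in (vector_results, graph_results):
        for rank, r in enumerate(results, start=1):
            cid = r["chunk_id"]
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: -item[1])
```

RRF is a good default when the two score distributions are hard to normalize; weighted score fusion gives you a more direct tuning knob.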
```python
def fuse_results(vector_results, graph_results, alpha=0.6):
    combined = {}
    # normalize vector scores to 0-1
    if vector_results:
        max_v = max(r["score"] for r in vector_results) or 1.0
        for r in vector_results:
            combined[r["chunk_id"]] = {"vector": r["score"] / max_v, "graph": 0.0}
    # add graph scores (already in 0-1 because activation starts at 1.0);
    # keep the highest score if several entities map to the same chunk
    for r in graph_results:
        cid = r["chunk_id"]
        if cid in combined:
            combined[cid]["graph"] = max(combined[cid]["graph"], r["graph_score"])
        else:
            combined[cid] = {"vector": 0.0, "graph": r["graph_score"]}
    # weighted sum of the two normalized scores
    ranked = [
        {"chunk_id": cid, "score": alpha * s["vector"] + (1 - alpha) * s["graph"]}
        for cid, s in combined.items()
    ]
    return sorted(ranked, key=lambda x: -x["score"])
```

Build a test set of 50 to 100 queries spanning three categories: single-topic queries (where vector search excels), multi-hop queries (where graph traversal helps most), and entity-specific queries (where exact graph lookup adds value). Run each query through vector-only retrieval and GraphRAG retrieval, and compare recall at k=5 and k=10. Expect minimal improvement on single-topic queries, 20 to 40% improvement on multi-hop queries, and significant improvement on entity-specific queries.
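A minimal harness for that comparison might look like the following. The test-set format (a list of dicts with a query string and the chunk IDs a correct answer needs) and the two retriever callables are assumptions about your setup, not a required interface.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    # Fraction of relevant chunks that appear in the top-k results
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def compare_retrievers(test_set, vector_only, graph_rag, k=5):
    # test_set: list of {"query": str, "relevant": [chunk_id, ...]}
    # vector_only / graph_rag: callables returning ranked chunk IDs
    deltas = []
    for case in test_set:
        base = recall_at_k(vector_only(case["query"]), case["relevant"], k)
        graph = recall_at_k(graph_rag(case["query"]), case["relevant"], k)
        deltas.append(graph - base)
    return sum(deltas) / len(deltas)  # mean recall improvement at k
```

Reporting the mean delta per query category (single-topic, multi-hop, entity-specific) rather than one global number will show you exactly where the graph is paying for itself.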
When GraphRAG Is Not Worth It
If your queries are almost entirely single-topic lookups against well-written documents, the graph traversal step adds complexity without proportional improvement. GraphRAG earns its complexity budget when queries involve relationships between entities, when your knowledge base has dense interconnections, and when vector search accuracy is below your quality threshold on multi-hop questions. Measure before committing to the infrastructure.
Adaptive Recall includes graph traversal as a built-in retrieval strategy. Entities are extracted automatically during memory storage, and spreading activation runs during recall. This gives you the retrieval quality of GraphRAG without building separate graph infrastructure or managing the fusion logic.
Get GraphRAG retrieval quality without the infrastructure. Adaptive Recall handles entity extraction, graph building, and traversal automatically.
Get Started Free