Is GraphRAG Better for Multi-Hop Questions?
What Makes a Question Multi-Hop
A multi-hop question is one whose answer requires connecting information from two or more documents through intermediate entities. The defining characteristic is that no single document contains both the question's context and the answer. Instead, the answer is reached by following a chain: document A mentions entity X, document B relates entity X to entity Y, and document C, which covers entity Y, contains the answer.
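The defining property can be checked mechanically. The minimal sketch below assumes each document has already been reduced to the set of entities it mentions; the `docs` mapping, the entity names, and both helper functions are hypothetical. Two documents count as linked when they share an entity, and a breadth-first search looks for the shortest chain of documents from the question entity to the answer entity.

```python
from collections import deque


def find_document_chain(docs, start_entity, answer_entity):
    """Return document ids linking start_entity to answer_entity, or None.

    docs maps a document id to the set of entities it mentions. Two documents
    are adjacent when their entity sets overlap.
    """
    starts = [d for d, ents in docs.items() if start_entity in ents]
    queue = deque([[d] for d in starts])
    seen = set(starts)
    while queue:
        path = queue.popleft()
        current = path[-1]
        if answer_entity in docs[current]:
            return path
        for other, ents in docs.items():
            if other not in seen and docs[current] & ents:
                seen.add(other)
                queue.append(path + [other])
    return None


def is_multi_hop(docs, start_entity, answer_entity):
    """True when no single document holds both entities but a chain connects them."""
    single_doc = any(start_entity in ents and answer_entity in ents for ents in docs.values())
    return not single_doc and find_document_chain(docs, start_entity, answer_entity) is not None


# Hypothetical documents A, B, C matching the chain described above.
docs = {
    "A": {"customer_orders", "order_service"},
    "B": {"order_service", "PostgreSQL"},
    "C": {"PostgreSQL", "WAL_archiving"},
}
print(find_document_chain(docs, "customer_orders", "WAL_archiving"))  # ['A', 'B', 'C']
```

A question whose two entities co-occur in one document (here, `order_service` and `PostgreSQL` in B) is single-hop by this definition, even if the corpus is large.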
Examples of multi-hop questions in a technical knowledge base:
- "What backup strategy protects our customer orders?" (orders -> order_service -> PostgreSQL -> backup_strategy)
- "Which team should I contact about latency in the checkout flow?" (checkout -> payments_service -> maintained_by -> payments_team)
- "What monitoring covers the database our user profiles are stored in?" (user_profiles -> users_service -> MySQL -> monitoring -> Grafana_dashboard)
- "If Redis goes down, which customer-facing features are affected?" (Redis -> depended_on_by -> session_cache -> used_by -> login_flow + shopping_cart)
Each of these questions requires following two or three relationship hops. The first entity mentioned in the question (orders, checkout, user_profiles, Redis) is not directly documented alongside the answer. A human reading the documentation would need to look up the first entity, find what it connects to, look up that connection, and repeat until reaching the answer.
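The look-up-and-repeat process can be sketched over (subject, relation, object) triples. The triples below and the relation names `handled_by`, `uses`, and `depends_on` are illustrative assumptions modeled on the checkout and Redis examples. A reverse relation such as `depended_on_by` is represented by walking `depends_on` backwards, written here with a `~` prefix.

```python
from collections import defaultdict

# Hypothetical triples modeled on the examples above.
TRIPLES = [
    ("checkout", "handled_by", "payments_service"),
    ("payments_service", "maintained_by", "payments_team"),
    ("session_cache", "depends_on", "Redis"),
    ("login_flow", "uses", "session_cache"),
    ("shopping_cart", "uses", "session_cache"),
]


def build_index(triples):
    """Index edges by (entity, relation). Each triple is also stored in reverse
    under '~' + relation, so '~uses' plays the role of 'used_by'."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(subject, relation)].add(obj)
        index[(obj, "~" + relation)].add(subject)
    return index


def follow(index, start, relations):
    """Apply one relation per hop, the way a reader looks up each entity in turn."""
    frontier = {start}
    for relation in relations:
        frontier = {nxt for ent in frontier for nxt in index.get((ent, relation), set())}
    return frontier


index = build_index(TRIPLES)
print(follow(index, "checkout", ["handled_by", "maintained_by"]))  # {'payments_team'}
print(sorted(follow(index, "Redis", ["~depends_on", "~uses"])))  # ['login_flow', 'shopping_cart']
```

The number of hops is simply the length of the relation list: two for both questions here.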
Why Standard RAG Fails on Multi-Hop
Standard RAG embeds the question and retrieves the most similar document chunks. For "what backup strategy protects our customer orders," vector search looks for documents similar to "backup strategy" and "customer orders." It finds documents about order management and documents about backup procedures, but these are separate documents about separate topics. The critical link (the order service stores its data in PostgreSQL, and PostgreSQL is backed up with WAL archiving) never makes it into the results, because the linking document about the order service's database configuration is not semantically similar to either "backup strategy" or "customer orders."
Increasing the number of retrieved documents (top-k) sometimes helps. If you retrieve 20 documents instead of 5, you might get lucky and include the linking document. But this is unreliable: the linking document can rank below the 20th most similar document, especially when the vocabulary gap is large. Increasing top-k also adds token cost and risks the "lost in the middle" problem, where the LLM fails to use relevant information buried among many retrieved documents.
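The ranking mechanics can be illustrated with a crude lexical stand-in for an embedding model: bag-of-words cosine similarity. This is an assumption made for the sake of a self-contained sketch. Real embedding models also score related words such as "order" and "orders" as similar, so this exaggerates the gap, but the effect on top-k is the same: the linking document enters the result set only once k grows past every document that scores higher. The documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-frequency vector over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, docs, k):
    """Rank document ids by similarity to the query and keep the first k."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(docs[d])), reverse=True)
    return ranked[:k]


QUERY = "What backup strategy protects our customer orders?"
DOCS = {
    "orders_overview": "Customer orders are created at checkout and tracked until delivery.",
    "backup_runbook": "Our backup strategy is nightly snapshots plus WAL archiving for PostgreSQL.",
    "customer_support": "Customer support tickets are answered within one business day.",
    "order_service_config": "The order service persists its data in a PostgreSQL cluster.",
}
print(top_k(QUERY, DOCS, 3))  # the linking document is absent
```

The linking document shares no tokens with the query, so it scores zero and lands last no matter how relevant it is.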
How GraphRAG Solves It
GraphRAG extracts "customer orders" from the query, looks it up in the knowledge graph, and follows its relationships: customer_orders -> stored_by -> order_service -> database -> PostgreSQL -> backup_method -> WAL_archiving. Each hop follows an explicit, typed relationship that was extracted during graph construction. The traversal reaches the backup strategy documentation through structural connections rather than vocabulary similarity.
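The traversal in this example can be sketched as a breadth-first search over typed edges. The edge list is a hypothetical graph containing the chain above plus a few distractor edges, and the start entity is assumed to have already been linked from the query. For illustration the goal entity is known; during retrieval the answer is not known in advance, so a system would instead collect the documents attached to every entity within the hop budget.

```python
from collections import defaultdict, deque

# Hypothetical extracted graph: (subject, relation, object).
EDGES = [
    ("customer_orders", "stored_by", "order_service"),
    ("order_service", "database", "PostgreSQL"),
    ("order_service", "maintained_by", "orders_team"),
    ("PostgreSQL", "backup_method", "WAL_archiving"),
    ("PostgreSQL", "monitored_by", "Grafana_dashboard"),
]


def find_path(edges, start, goal, max_hops=4):
    """Breadth-first search for the shortest typed path from start to goal."""
    adjacency = defaultdict(list)
    for subject, relation, obj in edges:
        adjacency[subject].append((relation, obj))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # hop budget exhausted on this branch
        for relation, nxt in adjacency[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None


for hop in find_path(EDGES, "customer_orders", "WAL_archiving"):
    print(hop)
```

Nothing in the search compares words: the path exists only because the three edges were extracted.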
The key insight is that graph traversal does not depend on vocabulary overlap. The connection from "customer orders" to "WAL archiving" exists in the graph because the relationships were extracted and stored, even though the two concepts share no words. The relationship chain bridges the vocabulary gap that makes multi-hop retrieval so unreliable with vector search alone.
Benchmark Results
Several evaluations report GraphRAG's advantage on multi-hop queries:
- Microsoft Research's 2024 GraphRAG paper showed 30 to 70% improvement in answer comprehensiveness on questions requiring synthesis across multiple documents.
- Benchmarks on the HotpotQA dataset (a multi-hop question answering benchmark) show graph-augmented retrieval improving F1 scores by 8 to 15 points over vector-only retrieval.
- Internal evaluations by teams implementing GraphRAG typically report recall improvements of 15 to 30% on multi-hop queries, with minimal change on single-hop queries.
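These figures use different metrics. Assuming the HotpotQA numbers refer to the usual SQuAD-style token-overlap F1 on answer strings, and "recall" to the fraction of relevant documents found in the top k results, both can be computed as below. This is a simplified sketch: the official HotpotQA evaluation adds special handling for yes/no answers and also reports supporting-fact metrics.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall between two answer strings."""
    pred = normalize(prediction).split()
    ref = normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant document ids that appear in the first k retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


print(token_f1("nightly WAL archiving", "WAL archiving"))  # about 0.8
print(recall_at_k(["a", "b", "c", "d"], ["b", "d"], 2))  # 0.5
```

An "8 to 15 point" F1 gain means this per-question score, averaged and scaled to 0 to 100, rose by that amount.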
The improvement is not uniform across all multi-hop queries. GraphRAG helps most when the hops follow explicit, typed relationships (depends_on, maintained_by, uses). It helps less when the hops are conceptual rather than structural ("this technology is similar to that technology" rather than "this service uses that technology"), because conceptual relationships are harder to extract and more likely to be missing from the graph.
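One way to see this effect in your own data is a path-coverage check: for a set of questions annotated with the hops their answers require, measure how often every hop actually made it into the extracted graph. Everything here is hypothetical: the relation vocabulary, the extracted edges, and the hand-labeled gold paths.

```python
STRUCTURAL = {"depends_on", "maintained_by", "uses"}


def path_covered(graph_edges, gold_path):
    """True when every (subject, relation, object) hop in gold_path was extracted."""
    return all(hop in graph_edges for hop in gold_path)


def coverage_by_kind(graph_edges, questions):
    """Share of fully covered gold paths, split into all-structural paths and
    paths containing at least one non-structural (conceptual) hop."""
    totals = {"structural": [0, 0], "conceptual": [0, 0]}
    for gold_path in questions:
        structural = all(rel in STRUCTURAL for _, rel, _ in gold_path)
        kind = "structural" if structural else "conceptual"
        totals[kind][1] += 1
        totals[kind][0] += path_covered(graph_edges, gold_path)
    return {kind: (hits / n if n else None) for kind, (hits, n) in totals.items()}


extracted = {
    ("checkout", "uses", "payments_service"),
    ("payments_service", "maintained_by", "payments_team"),
    ("session_cache", "depends_on", "Redis"),
}
questions = [
    # Structural chain: both hops were extracted.
    [("checkout", "uses", "payments_service"), ("payments_service", "maintained_by", "payments_team")],
    # Conceptual chain: the similar_to hop was never extracted.
    [("session_cache", "depends_on", "Redis"), ("Redis", "similar_to", "Memcached")],
]
print(coverage_by_kind(extracted, questions))  # {'structural': 1.0, 'conceptual': 0.0}
```

A single missing conceptual edge is enough to break the whole chain, which is why those questions gain less from traversal.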
Combining with Vector Search
GraphRAG does not replace vector search for multi-hop queries. It augments it. The best results come from running both retrieval methods and merging the results. Vector search catches relevant documents that the graph missed (because relationships were not extracted), and graph traversal catches connected documents that vector search missed (because vocabulary does not overlap). The fused result set covers more ground than either method alone.
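The text does not specify how the two result lists are merged. One common choice is reciprocal rank fusion (RRF), which needs only ranks, not comparable scores; it is swapped in here as an illustration, and `k=60` is a conventional default rather than a tuned value. The document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document ids: each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_results = ["orders_overview", "backup_runbook", "customer_support"]
graph_results = ["order_service_config", "backup_runbook"]
print(reciprocal_rank_fusion([vector_results, graph_results]))
# backup_runbook ranks first because both methods found it
```

Documents found by only one method still appear in the fused list, which is the coverage benefit described above; documents found by both rise to the top.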
Adaptive Recall implements this combination through its cognitive scoring model. Every recall operation runs vector similarity and spreading activation through the entity graph simultaneously, producing a single ranked result set that benefits from both retrieval signals. Multi-hop queries naturally benefit from graph traversal, while single-hop queries naturally rely more on vector similarity, and the scoring formula adapts without manual tuning.
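For readers unfamiliar with spreading activation, here is a generic sketch of the idea combined with vector similarity. It is not Adaptive Recall's scoring formula: the decay factor, hop limit, fixed blending weight `alpha`, and all entity and document names are hypothetical, and a fixed weight is exactly the kind of manual tuning the product claims to avoid.

```python
def spread_activation(neighbors, seeds, decay=0.5, max_hops=3):
    """Propagate activation from seed entities; each hop multiplies it by decay.
    A node keeps the strongest activation that reaches it."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = {}
        for node, value in frontier.items():
            passed = value * decay
            for neighbor in neighbors.get(node, ()):
                best = max(activation.get(neighbor, 0.0), next_frontier.get(neighbor, 0.0))
                if passed > best:
                    next_frontier[neighbor] = passed
        activation.update(next_frontier)
        frontier = next_frontier
    return activation


def combined_scores(vector_sim, doc_entities, activation, alpha=0.5):
    """Blend vector similarity with the strongest activation among a document's entities."""
    scores = {}
    for doc, entities in doc_entities.items():
        graph_score = max((activation.get(e, 0.0) for e in entities), default=0.0)
        scores[doc] = alpha * vector_sim.get(doc, 0.0) + (1 - alpha) * graph_score
    return scores


neighbors = {
    "customer_orders": ["order_service"],
    "order_service": ["PostgreSQL"],
    "PostgreSQL": ["WAL_archiving"],
}
activation = spread_activation(neighbors, {"customer_orders": 1.0})
scores = combined_scores(
    vector_sim={"backup_runbook": 0.4, "customer_support": 0.5},
    doc_entities={"backup_runbook": {"PostgreSQL", "WAL_archiving"}, "customer_support": {"support_tickets"}},
    activation=activation,
)
print(scores)  # backup_runbook outranks customer_support despite lower vector similarity
```

Here the graph signal lifts the backup runbook above a document that is lexically closer to the query, which is the multi-hop case; for a single-hop query with no useful graph path, the vector term dominates.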
Answer multi-hop questions your current RAG misses. Adaptive Recall's graph traversal follows entity connections to find the answer, however many hops away it is.