14 Types of RAG: From Naive to Agentic
Generation 1: Basic Retrieval
1. Naive RAG
Embed document chunks into vectors, retrieve by cosine similarity, stuff into LLM prompt. The tutorial pattern. Works for simple, direct lookups where the query vocabulary matches the document vocabulary. Fails on complex queries, vocabulary mismatches, and fragmented answers. Every other RAG type exists because naive RAG fails in production.
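The naive pipeline can be sketched in a few lines. This is a minimal illustration, not a production implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the tiny vocabulary and corpus are invented for the example.

```python
import math

def embed(text):
    # Toy embedding: term counts over a tiny fixed vocabulary.
    # A real system would call an embedding model here.
    vocab = ["error", "config", "deploy", "cache"]
    return [text.lower().count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def naive_rag(query, chunks, top_k=2):
    # Rank chunks by cosine similarity, then stuff the top hits
    # into the LLM prompt as context.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using this context:\n{context}\n\nQ: {query}"

chunks = ["The deploy failed with a cache error.",
          "Update the config before deploy.",
          "Lunch is at noon."]
prompt = naive_rag("why did the deploy error out?", chunks, top_k=1)
```

Note how the whole pattern is one similarity sort plus string concatenation, which is exactly why it breaks down on anything beyond direct lookups.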
2. Hybrid RAG
Combines vector similarity search with BM25 keyword matching. Results from both retrieval methods are merged using reciprocal rank fusion. This catches queries where exact terms matter (product names, error codes, configuration keys) that vector search handles poorly. The most impactful single upgrade from naive RAG, improving recall by 10 to 15% with minimal additional complexity.
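Reciprocal rank fusion itself is a small, well-known formula: each document scores the sum of 1/(k + rank) across the result lists it appears in. A minimal sketch, with hypothetical document IDs standing in for real retriever output:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    # Standard RRF: score(d) = sum over lists of 1 / (k + rank),
    # with k=60 as the commonly used smoothing constant.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked lists from the two retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents ranked well by both retrievers rise to the top, and neither retriever's raw scores need to be calibrated against the other's.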
3. Reranked RAG
Adds a cross-encoder reranking step after initial retrieval. The reranker scores each query-chunk pair by how well the chunk answers the specific question, not just how similar it is. This fixes rank inversion where topically similar but unhelpful chunks outrank specific, useful chunks. Improves precision by 15 to 25% on top of hybrid search. The combination of hybrid search plus reranking is the current production baseline.
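The reranking step is a second-pass rescoring of each (query, chunk) pair. In the sketch below, `cross_encoder_score` is a word-overlap stand-in for a real cross-encoder model; the query and candidates are invented for illustration.

```python
def cross_encoder_score(query, chunk):
    # Stand-in for a real cross-encoder, which would score the
    # query and chunk jointly with a transformer. Overlap counting
    # is only here to make the control flow runnable.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q)

def rerank(query, candidates, top_k=3):
    # Rescore every pair, then reorder the initial retrieval results.
    scored = [(cross_encoder_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

query = "how do I rotate the api key"
candidates = ["the api gateway handles traffic",
              "rotate the api key with the admin tool"]
top = rerank(query, candidates, top_k=1)
```

The point is the two-stage shape: a cheap retriever produces candidates, and an expensive pairwise scorer fixes their order.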
Generation 2: Structured Retrieval
4. Filtered RAG
Applies metadata filters before similarity scoring. Date ranges, document types, access permissions, data sources, and custom tags constrain the search space so retrieval only considers relevant content. Essential for multi-tenant applications, compliance-sensitive environments, and any system where not all content is relevant to all users.
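The key property is ordering: predicates run before scoring, so out-of-scope documents are never candidates at all. A minimal sketch with a shared-term count standing in for real similarity, and invented tenant metadata:

```python
def filtered_retrieve(query_terms, docs, filters, top_k=2):
    # Metadata predicates run first; similarity is only computed
    # over the documents that survive the filter.
    def passes(doc):
        return all(doc["meta"].get(k) == v for k, v in filters.items())

    def score(doc):
        # Stand-in similarity: shared-term count.
        return len(set(query_terms) & set(doc["text"].lower().split()))

    allowed = [d for d in docs if passes(d)]
    return sorted(allowed, key=score, reverse=True)[:top_k]

docs = [
    {"text": "rotate the api key monthly",
     "meta": {"tenant": "acme", "type": "policy"}},
    {"text": "api key rotation schedule",
     "meta": {"tenant": "globex", "type": "policy"}},
]
hits = filtered_retrieve(["api", "key"], docs, {"tenant": "acme"})
```

For multi-tenant systems this is a security boundary, not just a relevance tweak: the wrong tenant's document can never leak into the prompt because it is excluded before ranking.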
5. Parent-Child RAG
Creates hierarchical chunk relationships. Small chunks are embedded for precise retrieval, but when a small chunk matches, its parent chunk (the broader section or document) is returned as context. This preserves the precision of small-chunk retrieval while providing enough context for the LLM to generate a complete answer. Fixes the fragmentation problem where important context is split across chunks.
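The mechanic is a simple indirection: search over small chunks, return the matched chunk's parent. A minimal sketch with invented sections and child chunks:

```python
# Child chunks are what get embedded and searched; each carries a
# pointer to the parent section it was split from.
parents = {
    "sec_deploy": "Deployment guide: build, verify, and roll back "
                  "by redeploying the previous image.",
    "sec_office": "Office info: hours, access badges, and guest policy.",
}
children = [
    {"parent": "sec_deploy", "text": "rollback uses the previous image"},
    {"parent": "sec_office", "text": "guest badges expire daily"},
]

def parent_child_retrieve(query_terms, children, parents):
    # Match on the small chunk for precision...
    def score(child):
        return len(set(query_terms) & set(child["text"].split()))
    best = max(children, key=score)
    # ...but hand the LLM the parent's full context.
    return parents[best["parent"]]

context = parent_child_retrieve(["rollback"], children, parents)
```

The small chunk wins the match, but the generator sees the whole section, so the answer is not assembled from a fragment.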
6. Recursive RAG
Uses initial retrieval results to generate follow-up queries. If the first retrieval finds partial information, the system extracts entities or topics from the results and searches again with refined queries. This iterates until the retrieval finds sufficient information or reaches a maximum iteration count. A stepping stone toward agentic RAG that addresses fragmentation without full agent orchestration.
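The iterate-until-sufficient loop can be sketched generically. `search` and `extract_entities` are hypothetical callables; in a real system they would be a retriever and an LLM or NER extraction step, and the toy corpus below is invented.

```python
def recursive_retrieve(query, search, extract_entities, max_rounds=3):
    # Search, mine new entities from the hits, and search again
    # until nothing new turns up or the round budget is spent.
    seen, frontier = set(), {query}
    gathered = []
    for _ in range(max_rounds):
        next_frontier = set()
        for q in frontier:
            for hit in search(q):
                if hit not in gathered:
                    gathered.append(hit)
                for entity in extract_entities(hit):
                    if entity not in seen:
                        seen.add(entity)
                        next_frontier.add(entity)
        if not next_frontier:
            break
        frontier = next_frontier
    return gathered

corpus = {
    "outage": ["The outage was caused by ServiceX."],
    "ServiceX": ["ServiceX depends on the billing database."],
}
results = recursive_retrieve(
    "outage",
    search=lambda q: corpus.get(q, []),
    extract_entities=lambda t: [w for w in t.replace(".", "").split()
                                if w[0].isupper() and w != "The"],
)
```

The second document has zero vocabulary overlap with the original query; it is only reachable because "ServiceX" was extracted from the first round's results.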
Generation 3: Graph-Augmented Retrieval
7. GraphRAG (Entity-Centric)
Augments vector retrieval with knowledge graph traversal. Entities are extracted from the query, looked up in the graph, and their relationships traversed to find connected content. The graph-retrieved content is merged with vector-retrieved content. This finds information connected through entity relationships even when vocabulary overlap is zero, addressing multi-hop queries that vector search cannot handle.
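The graph-side step is a bounded traversal from the query's entities. A minimal sketch over a hand-built adjacency map (in practice the graph comes from entity extraction over the corpus, and the entities here are invented):

```python
# Tiny knowledge graph: entity -> {relation: [neighbors]}.
graph = {
    "PaymentService": {"uses": ["LedgerDB"]},
    "LedgerDB": {"owned_by": ["PlatformTeam"]},
    "PlatformTeam": {},
}

def graph_expand(entities, graph, hops=2):
    # Breadth-first traversal up to `hops` relationship steps out
    # from the entities found in the query.
    frontier, found = set(entities), set(entities)
    for _ in range(hops):
        nxt = set()
        for entity in frontier:
            for neighbors in graph.get(entity, {}).values():
                nxt.update(n for n in neighbors if n not in found)
        found |= nxt
        frontier = nxt
    return found

connected = graph_expand(["PaymentService"], graph)
# Content attached to any entity in `connected` is then merged
# with the vector-retrieved chunks.
```

"Which team owns the database behind PaymentService?" is a two-hop question; the traversal reaches PlatformTeam even though no chunk mentions both names together.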
8. GraphRAG (Community-Based)
Pre-computes community summaries from densely connected entity clusters in the knowledge graph. At query time, relevant community summaries are retrieved alongside vector results. This is Microsoft Research's original GraphRAG approach, designed for broad, open-ended questions that require thematic overviews rather than specific entity lookups. Better for "describe the architecture" questions, worse for "what version of X are we using" questions.
9. Hybrid Graph-Vector RAG
Runs graph traversal and vector search in parallel, using spreading activation to weight graph results by connection strength and combining them with vector results using learned weights. This is more sophisticated than simple GraphRAG because the graph traversal uses cognitive models (like ACT-R spreading activation) rather than uniform breadth-first search, producing better rankings when the graph is dense.
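A simplified sketch of the spreading-activation idea: each node passes a decayed share of its activation to its neighbors, so well-connected nodes accumulate more than nodes reached by a single weak path. This is an illustrative reduction of ACT-R-style spreading, not the full cognitive model; the graph and decay value are invented.

```python
def spread_activation(graph, seeds, decay=0.5, rounds=2):
    # graph: node -> list of neighbor nodes.
    # Seeds start at activation 1.0; each round, every active node
    # spreads decay * activation, split evenly among its neighbors.
    activation = {node: 0.0 for node in graph}
    for s in seeds:
        activation[s] = 1.0
    for _ in range(rounds):
        incoming = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if activation[node] and neighbors:
                share = decay * activation[node] / len(neighbors)
                for n in neighbors:
                    incoming[n] += share
        for node in graph:
            activation[node] += incoming[node]
    return activation

toy_graph = {"a": ["b", "c"], "b": ["c"], "c": []}
act = spread_activation(toy_graph, seeds=["a"])
```

Node `c` ends up more activated than `b` because it is reachable through two paths, which is exactly the ranking signal uniform breadth-first search throws away.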
Generation 4: Agentic Retrieval
10. Routed RAG
An LLM classifier routes each query to the most appropriate retrieval strategy or data source. Factual lookups go to the knowledge base, code questions go to the repo index, real-time questions go to live APIs, and analytical questions go to the SQL database. Routing prevents the waste of searching all sources for every query and enables the system to use specialized retrieval for each query type.
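The routing layer reduces to a classifier plus a dispatch table. Below, keyword rules stand in for the LLM classifier, and the handler functions are placeholders for real retrievers; all names are invented for the sketch.

```python
def route(query):
    # Stand-in for an LLM classifier: crude keyword rules pick a source.
    q = query.lower()
    if "stack trace" in q or "function" in q:
        return "repo_index"
    if "average" in q or "count" in q:
        return "sql"
    if "right now" in q or "current" in q:
        return "live_api"
    return "knowledge_base"

handlers = {
    "repo_index": lambda q: f"code search: {q}",
    "sql": lambda q: f"analytics query: {q}",
    "live_api": lambda q: f"live API call: {q}",
    "knowledge_base": lambda q: f"vector search: {q}",
}

def routed_rag(query):
    # Only the selected source is searched; the others never run.
    return handlers[route(query)](query)
```

Usage: `routed_rag("what is the current queue depth right now")` dispatches to the live API handler, while a policy question falls through to the knowledge base.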
11. Decomposed RAG
Complex queries are broken into independent sub-questions, each retrieved separately. Results from all sub-questions are synthesized into a final answer. This is the core agentic RAG pattern that fixes fragmentation on multi-part questions by targeting each piece of information independently rather than hoping a single broad query finds everything.
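The pattern in miniature: split, retrieve per sub-question, then synthesize. Here a naive split on "and" stands in for LLM decomposition, and `retrieve` is a hypothetical single-query retriever.

```python
def decompose(query):
    # Stand-in for an LLM decomposition step: split a compound
    # question into independent sub-questions.
    return [part.strip().rstrip("?") + "?" for part in query.split(" and ")]

def decomposed_rag(query, retrieve):
    sub_questions = decompose(query)
    # Each sub-question gets its own targeted retrieval pass.
    evidence = {q: retrieve(q) for q in sub_questions}
    # A real system would hand all the evidence to the LLM to
    # synthesize one final answer.
    return evidence

query = "what is the SLA and who owns the pager?"
evidence = decomposed_rag(query, retrieve=lambda q: [f"chunk about {q}"])
```

Each fragment of the compound question gets a retrieval pass aimed squarely at it, instead of one broad query that matches neither half well.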
12. Self-Evaluating RAG
After retrieval, the agent evaluates whether the results are sufficient to answer the question. If not, it generates follow-up queries targeting specific gaps and retrieves again. This loop continues until the agent is satisfied or reaches a budget limit. Self-evaluation prevents the system from generating answers from insufficient context, which is the source of many hallucination and incompleteness errors.
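The retrieve-evaluate-retry loop is a small control structure around the retriever. In the sketch, `retrieve`, `is_sufficient`, and `next_query` are hypothetical callables; in practice the last two are LLM calls that judge the context and write a gap-targeting follow-up query.

```python
def self_evaluating_rag(question, retrieve, is_sufficient, next_query,
                        budget=3):
    # Retrieve, judge sufficiency, and re-query until satisfied
    # or the retrieval budget is spent.
    context = []
    query = question
    for _ in range(budget):
        context.extend(retrieve(query))
        if is_sufficient(question, context):
            break
        query = next_query(question, context)
    return context

# Toy harness: sufficiency means "at least two pieces of evidence".
context = self_evaluating_rag(
    "compare plan A and plan B",
    retrieve=lambda q: [f"evidence for: {q}"],
    is_sufficient=lambda question, ctx: len(ctx) >= 2,
    next_query=lambda question, ctx: question + " (missing details)",
)
```

The budget cap matters as much as the loop: without it, a question the corpus cannot answer would retry forever instead of failing fast.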
Generation 5: Learning Systems
13. Feedback-Augmented RAG
User feedback (thumbs up/down, answer acceptance, rephrasing) adjusts retrieval scores over time. Chunks that consistently contribute to accepted answers get boosted. Chunks that consistently appear in rejected answers get demoted. The system improves its retrieval quality from production traffic without manual intervention. This is the simplest form of retrieval learning.
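One simple realization is a per-chunk multiplier nudged by each accepted or rejected answer and clamped to a sane range. The class below is an illustrative sketch; the learning rate and bounds are invented defaults, not tuned values.

```python
class FeedbackScores:
    # Per-chunk score multiplier learned from answer feedback.
    def __init__(self, rate=0.1, floor=0.5, ceiling=2.0):
        self.boost = {}
        self.rate, self.floor, self.ceiling = rate, floor, ceiling

    def record(self, chunk_ids, accepted):
        # Nudge every chunk that contributed to the answer,
        # up for accepted answers, down for rejected ones.
        delta = self.rate if accepted else -self.rate
        for cid in chunk_ids:
            b = self.boost.get(cid, 1.0) + delta
            self.boost[cid] = min(self.ceiling, max(self.floor, b))

    def adjust(self, cid, base_score):
        # Applied at query time on top of the retriever's own score.
        return base_score * self.boost.get(cid, 1.0)

fb = FeedbackScores()
fb.record(["chunk_good"], accepted=True)
fb.record(["chunk_bad"], accepted=False)
```

The floor and ceiling keep a run of noisy feedback from permanently burying or pinning a chunk, which is the main failure mode of naive score learning.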
14. Memory-Augmented RAG
Replaces the static chunk index with a dynamic memory system that stores, retrieves, consolidates, and forgets information over time. Memories carry metadata beyond text content: confidence scores, entity connections, access history, corroboration status, and lifecycle state. Retrieval uses cognitive scoring (recency, frequency, spreading activation, confidence weighting) rather than pure similarity. The system learns not just which chunks are useful but which knowledge is reliable, current, and interconnected.
Memory-augmented RAG is the most capable type because it addresses every failure mode of the preceding thirteen types: vocabulary gap (spreading activation through entity connections), fragmentation (knowledge graph traversal), rank inversion (cognitive scoring with multiple factors), staleness (memory lifecycle with decay and consolidation), and static behavior (continuous learning from usage). The trade-off is system complexity, but this complexity can be managed by a memory platform rather than built from scratch.
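To make the cognitive-scoring idea concrete, here is a simplified sketch of ACT-R base-level activation blended with similarity and confidence. The formula (log of summed power-law-decayed access ages) is the standard ACT-R base-level learning equation; the blending weights and memory fields are illustrative assumptions, not a specific product's schema.

```python
import math

def base_level_activation(access_times, now, decay=0.5):
    # ACT-R base-level learning: ln(sum over accesses of age^-decay).
    # Recent and frequent accesses both raise activation.
    ages = [max(now - t, 1e-6) for t in access_times]
    return math.log(sum(age ** -decay for age in ages))

def memory_score(memory, now, weights=(1.0, 1.0, 1.0)):
    # Blend similarity, activation, and stored confidence instead of
    # ranking by similarity alone.
    w_sim, w_act, w_conf = weights
    activation = base_level_activation(memory["accesses"], now)
    return (w_sim * memory["similarity"]
            + w_act * activation
            + w_conf * memory["confidence"])

# Two memories with identical similarity and confidence; only their
# access histories differ (timestamps in arbitrary time units).
fresh = {"similarity": 0.8, "confidence": 0.9, "accesses": [90, 95, 99]}
stale = {"similarity": 0.8, "confidence": 0.9, "accesses": [1]}
```

With everything else equal, the frequently and recently used memory outranks the one touched once long ago, which is exactly the staleness signal pure similarity search cannot see.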
Choosing Your Architecture
Start with the simplest type that meets your accuracy requirements and add complexity only when you measure a need. For most applications, the progression is: start with naive RAG to validate the use case, upgrade to hybrid plus reranking when you see production accuracy problems, add GraphRAG when multi-hop queries are a significant portion of traffic, add agentic retrieval when complex queries need iterative reasoning, and adopt memory-augmented retrieval when you need the system to learn and improve over time.
Adaptive Recall implements Type 14 (memory-augmented RAG) as a managed service. It combines vector retrieval, cognitive scoring (ACT-R activation modeling), knowledge graph traversal (spreading activation), memory lifecycle management (consolidation, decay, forgetting), and evidence-gated learning into a single recall operation. You get the retrieval quality of the most advanced RAG type without building the infrastructure for types 7 through 14.
Jump to Type 14. Adaptive Recall gives you memory-augmented retrieval with cognitive scoring, graph traversal, and lifecycle management, out of the box.
Get Started Free