
How to Retrieve the Right Context for Each Request

Retrieving the right context is the selection step of context engineering, and it is where most answer quality is decided. The reliable method is to first understand what the request actually needs, retrieve a broad candidate set with both semantic and keyword search, rerank those candidates for true relevance, filter out stale or low-confidence items, and finally include only the top results that fit the token budget. The end goal of the pipeline is precision: bring in the few items that matter and exclude the many that merely relate.

Selection is the highest-leverage part of a context pipeline because the difference between a grounded answer and a diluted one is usually which items ended up in the window. Including the right three passages and excluding the irrelevant fifty produces a correct answer, while including all fifty-three produces a worse one even though the answer is present. These steps build a selection process that consistently favors precision over recall once the candidates are in hand.

Step 1: Understand what the request actually needs

Before retrieving anything, determine what kind of information the request requires. A factual question needs supporting passages from the knowledge base, a question about the user needs facts from memory, and a coding request needs the relevant files and symbols. Many requests need more than one source, and some need none at all: a simple greeting requires no retrieval. Routing the request to the right sources first avoids the common waste of running every retriever on every request and stuffing all the results into the window. For ambiguous or multi-part requests, it can help to decompose the request into the distinct information needs it implies and retrieve for each.
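A minimal sketch of this routing step in Python, assuming three hypothetical sources (`knowledge_base`, `memory`, `code`) and toy keyword rules. A production router would more likely use a trained classifier or a model call; the `InformationNeed` type, the hint word lists, and splitting on " and " as the decomposition rule are illustrative assumptions, not a recommended design.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical source names; substitute your own retrievers.
KNOWLEDGE_BASE = "knowledge_base"
MEMORY = "memory"
CODE = "code"

GREETINGS = {"hi", "hello", "hey", "thanks", "thank you"}
MEMORY_HINTS = {"i", "me", "my", "mine"}
CODE_HINTS = {"function", "file", "class", "module", "bug", "stack"}


@dataclass
class InformationNeed:
    sub_query: str
    sources: List[str] = field(default_factory=list)


def route(request: str) -> List[InformationNeed]:
    """Decide which sources each part of a request needs (toy keyword rules)."""
    text = request.strip().lower().rstrip("!?.")
    if text in GREETINGS:
        return []  # a simple greeting needs no retrieval at all
    needs = []
    # Naive decomposition: treat " and " as a boundary between distinct needs.
    for part in text.split(" and "):
        part = part.strip()
        if not part:
            continue
        words = {w.strip(",.?!") for w in part.split()}
        sources = []
        if words & MEMORY_HINTS:
            sources.append(MEMORY)
        if words & CODE_HINTS:
            sources.append(CODE)
        if not sources:
            sources.append(KNOWLEDGE_BASE)  # default: a factual lookup
        needs.append(InformationNeed(part, sources))
    return needs


for need in route("What is our refund policy and what plan am I on?"):
    print(need.sub_query, "->", need.sources)
# what is our refund policy -> ['knowledge_base']
# what plan am i on -> ['memory']
```

The useful part is the shape of the output: a list of needs, each naming only the sources it requires, with an empty list meaning "retrieve nothing".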

Step 2: Retrieve a broad candidate set

For each information need, retrieve more candidates than will fit in the window, so the right items are present in the pool to choose from. Combine semantic search, which finds conceptually related content, with keyword search, which catches exact terms, identifiers, and rare words that embeddings miss. This hybrid approach has higher recall than either alone, and high recall at this stage matters because an item that is never retrieved can never be selected. Do not try to be precise yet; the job here is to make sure the right candidates are in the set, and precision comes next. The mechanics of semantic retrieval are covered in the vector search pillar.
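The sketch below shows one common way to merge the two result lists: reciprocal rank fusion (RRF), which scores each item by summing 1 / (k + rank) over every list it appears in. RRF is a swapped-in technique, not one the text prescribes; weighted blending of normalized scores is another option. `keyword_search` is a crude token-overlap stand-in for a real keyword engine such as BM25, the semantic ranking is hard-coded in place of a vector search, and `k = 60` is the conventional default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def keyword_search(query: str, docs: Dict[str, str], limit: int) -> List[str]:
    """Rank docs by exact token overlap with the query (a crude stand-in for BM25)."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append((overlap, doc_id))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [doc_id for _, doc_id in hits[:limit]]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked id lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "how to reset your password",
    "d2": "account security overview",
    "d3": "error E4012 means the session token expired",
}
semantic = ["d2", "d1"]  # pretend embedding results: related, but missing the error code
keyword = keyword_search("what does E4012 mean", docs, limit=20)
pool = reciprocal_rank_fusion([semantic, keyword])
print(pool)  # every candidate either retriever found, ready for reranking
```

Document `d3` enters the pool only through the keyword list, illustrating the rare-identifier case described above. Because RRF combines ranks rather than raw scores, it also sidesteps the fact that similarity scores and keyword scores live on different scales.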

Step 3: Rerank candidates for true relevance

Raw retrieval scores, especially vector similarity, are a rough proxy for relevance, and the top item by similarity is often not the most useful for the specific request. A reranking step reorders the candidate set by true relevance to the request, typically with a model that scores each candidate against the full query rather than just embedding distance. Reranking is where precision is won: it pushes the genuinely useful items to the top so that when you cut to fit the budget, you keep the right ones. Skipping this step is the most common reason a pipeline with good retrieval still puts mediocre context in the window.
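A minimal sketch of a reranking stage, assuming the reranker is any function that reads the query and one passage together and returns a score. `overlap_scorer` is only a toy stand-in so the example runs without dependencies; in practice you would plug a reranking model (for example a cross-encoder) in behind the same hypothetical `Scorer` signature. All names here are illustrative.

```python
from typing import Callable, Dict, List, Tuple

# A scorer reads the query and one passage together and returns a relevance score.
Scorer = Callable[[str, str], float]


def overlap_scorer(query: str, passage: str) -> float:
    """Toy stand-in for a reranking model: share of query terms found in the passage."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(passage.lower().split())) / len(terms)


def rerank(query: str, candidates: Dict[str, str], scorer: Scorer,
           top_n: int) -> List[Tuple[str, float]]:
    """Score every candidate against the full query, then keep the best top_n."""
    scored = [(doc_id, scorer(query, text)) for doc_id, text in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Candidates in first-stage (similarity) order; the most useful one is not first.
candidates = {
    "a": "pricing for enterprise plans",
    "b": "how to cancel a subscription and get a refund",
    "c": "refund policy for annual plans",
}
top = rerank("refund policy annual plan", candidates, overlap_scorer, top_n=2)
print(top)  # [('c', 0.75), ('b', 0.25)]
```

The first-stage order put `a` on top, but the query-aware scorer promotes `c`, so a later budget cut keeps the right item.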

Step 4: Filter for freshness and reliability

Relevance is not the only axis that matters. A highly relevant passage that is out of date, or a memory that has since been contradicted, can be worse than nothing because the model will trust it. Filter the reranked candidates to drop stale, superseded, or low-confidence items before they reach the window. This is straightforward for documents with timestamps and essential for accumulated memory, where facts change and old ones get overturned. A memory layer that scores facts by confidence, raising it under corroboration and lowering it under contradiction, makes this filter automatic, since the select step can simply prefer high-confidence memories. This is one reason memory belongs within context engineering.
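A sketch of the freshness-and-reliability filter, assuming each candidate carries a timestamp, an optional `superseded_by` pointer, and a 0..1 `confidence`. These fields, the 365-day `max_age`, and the 0.5 threshold are illustrative assumptions; real values would differ by source, since a policy document ages differently from a user preference.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Candidate:
    id: str
    text: str
    updated_at: datetime
    confidence: float = 1.0              # hypothetical 0..1 reliability score
    superseded_by: Optional[str] = None  # hypothetical id of a newer, overriding item


def filter_reliable(candidates: List[Candidate], now: datetime, max_age: timedelta,
                    min_confidence: float = 0.5) -> List[Candidate]:
    """Drop superseded, stale, or low-confidence items while keeping rerank order."""
    kept = []
    for c in candidates:
        if c.superseded_by is not None:
            continue  # a newer fact has replaced this one
        if now - c.updated_at > max_age:
            continue  # too old to trust for this source
        if c.confidence < min_confidence:
            continue  # not well enough supported to show the model
        kept.append(c)
    return kept


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
reranked = [
    Candidate("m1", "User is on the Pro plan", now - timedelta(days=90), 0.9, superseded_by="m2"),
    Candidate("m2", "User is on the Team plan", now - timedelta(days=10), 0.9),
    Candidate("m3", "User prefers dark mode", now - timedelta(days=30), 0.3),
    Candidate("d1", "Refund policy (old edition)", now - timedelta(days=2000)),
]
fresh = filter_reliable(reranked, now, max_age=timedelta(days=365))
print([c.id for c in fresh])  # ['m2']
```

The filter runs after reranking and preserves its order, so it only removes items; it never promotes a less relevant one.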

Step 5: Include only what fits the budget

Finally, add items to the window in priority order until the section's token budget is reached, then stop. This is where you commit to precision over recall: it is better to include the top three highly relevant, current items than to cram in ten and dilute them. Order matters too. Because of the "lost in the middle" effect, models tend to use information at the start or end of a long context more reliably than information buried in the middle, so place the most important items near the start or end of the section. The result is a dense, current, well-ordered context that gives the model exactly what the request needs.
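A sketch of the budget cut and ordering step, assuming each item already carries a priority score from the earlier stages. `word_count` is a hypothetical placeholder for a real tokenizer, and alternating items between the front and back is one simple way to keep the strongest items at the edges of the section; neither is the only valid choice.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pack_to_budget(items: Sequence[Tuple[str, float]], budget: int,
                   count_tokens: Callable[[str], int]) -> List[str]:
    """Add items in priority order until the next one would exceed the budget, then stop."""
    chosen, used = [], 0
    for text, _score in sorted(items, key=lambda it: it[1], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget:
            break
        chosen.append(text)
        used += cost
    return chosen


def order_for_attention(ranked: Sequence[T]) -> List[T]:
    """Put the strongest items at the edges and the weakest in the middle."""
    front, back = [], []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


def word_count(text: str) -> int:
    # Hypothetical token counter; swap in the tokenizer for your model.
    return len(text.split())


items = [
    ("refund policy for annual plans", 0.91),
    ("how to request a refund", 0.84),
    ("billing FAQ covering invoices taxes and currencies", 0.62),
    ("plan comparison", 0.40),
]
selected = pack_to_budget(items, budget=10, count_tokens=word_count)
print(order_for_attention(selected))
```

`pack_to_budget` stops at the first item that does not fit, matching "then stop" above. A variant keeps scanning for smaller items that still fit, which uses the budget more fully but admits lower-priority content.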

Key Takeaway

Retrieve broadly for recall, then rerank and filter hard for precision, and include only the top items that fit the budget. High recall ensures the right items are available, reranking and filtering ensure the right ones are chosen, and the budget cut ensures the window stays dense rather than diluted.

Why Selection Beats a Bigger Window

It is tempting to skip careful selection and rely on a large context window to hold everything plausibly relevant. This fails for the reasons covered in context rot: a window full of loosely relevant content produces worse answers than a lean window of precisely selected content, because attention spreads thin and the relevant items get diluted and underweighted. Good selection is not a workaround for small windows; it is the right approach even with large ones, because relevance density, not raw capacity, is what determines answer quality. Investing in retrieval, reranking, and filtering pays off regardless of how much the window could technically hold.

Selecting From Memory, Not Just Documents

The five steps apply to any source the window draws from, and the source that most rewards careful selection is accumulated memory. Selecting from memory is harder than selecting from documents in one specific way: documents are static and authoritative, while memories change, can be superseded, and vary in how well-supported they are. A naive memory query that returns the most semantically similar stored facts will sometimes surface a fact that was later contradicted or that is simply stale, and because the model trusts what is in the window, a wrong memory is worse than a missing one. The freshness-and-reliability filter in step four is therefore not optional for memory; it is central.

This is where a memory layer with confidence scoring changes the selection step materially. When each stored fact carries a confidence that rises as the fact is independently corroborated and falls when it is contradicted, the select step can rank by relevance and confidence together, preferring well-supported memories and demoting doubtful ones automatically. Adaptive Recall is built around exactly this, so selecting from memory becomes a query that returns the relevant, reliable facts within the budget rather than a raw similarity search you have to second-guess. Treating memory selection with the same five-step rigor as document retrieval, with reliability filtering doing extra work, is what keeps the persistent half of the window trustworthy, as covered in the discussion of whether memory is part of context engineering.
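To make the mechanism concrete, here is a minimal sketch of confidence scoring plus combined ranking. It is not Adaptive Recall's algorithm or API; the `Memory` type, the update rates, the 0.5 starting confidence, the 0.3 floor, and ranking by `relevance * confidence` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    relevance: float         # similarity to the current request, 0..1
    confidence: float = 0.5  # hypothetical starting confidence for a new fact


def corroborate(m: Memory, rate: float = 0.3) -> None:
    """Independent support moves confidence part of the way toward 1.0."""
    m.confidence += rate * (1.0 - m.confidence)


def contradict(m: Memory, rate: float = 0.5) -> None:
    """A contradiction moves confidence part of the way toward 0.0."""
    m.confidence -= rate * m.confidence


def select_memories(memories: List[Memory], top_n: int,
                    min_confidence: float = 0.3) -> List[Memory]:
    """Rank by relevance and confidence together; doubtful memories never compete."""
    eligible = [m for m in memories if m.confidence >= min_confidence]
    eligible.sort(key=lambda m: m.relevance * m.confidence, reverse=True)
    return eligible[:top_n]


old = Memory("User lives in Berlin", relevance=0.92)
new = Memory("User moved to Lisbon", relevance=0.88)
contradict(old)   # 0.5 -> 0.25
corroborate(new)  # 0.5 -> 0.65
corroborate(new)  # 0.65 -> 0.755
print([m.text for m in select_memories([old, new], top_n=1)])  # ['User moved to Lisbon']
```

The contradicted memory is slightly more similar to the request, but it falls below the confidence floor and never competes. Even without the floor, the product score would still favor the corroborated fact.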

Common Selection Mistakes

A few selection mistakes recur across systems. The first is retrieving too little for recall, using a small candidate set so the right item is never even a candidate, which no amount of reranking can fix because reranking can only reorder what it was given. The second is the opposite at the inclusion stage, including too many items because they all seem somewhat relevant, which dilutes the window and triggers the very context rot that careful selection is meant to avoid. The third is skipping reranking and trusting raw similarity, which puts mediocre items at the top and good ones just below the cut. The fourth is ignoring reliability, so a relevant but stale or contradicted item enters the window and misleads the model. Each of these maps to one of the five steps being done weakly, which is why following the steps in order, broad recall then precise reranking, filtering, and a tight budget cut, is what produces a selection process that holds up in production.

The reason this five-step order matters, rather than collapsing it into a single retrieve-and-include step, is that recall and precision are in tension and need different stages to serve each. The early steps optimize for recall, casting a wide net so the right items are guaranteed to be in the pool, because anything missed here is lost forever. The later steps optimize for precision, narrowing hard so only the best items survive into the window, because anything diluting the window costs answer quality. Trying to do both at once, retrieving exactly the few items you will include, forces a single query to be both broad and narrow, which it cannot be. Separating the broad gather from the precise filter is what lets each be aggressive in its own direction, and it is the structural reason a staged selection process outperforms a one-shot one. Building this as distinct, measurable stages also means you can see whether a failure was a recall miss or a precision miss, and fix the right one.
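Making the stages measurable can be as simple as logging which ids survive each stage and checking, for a request with a known-relevant item, where that item disappeared. The sketch below assumes hypothetical stage names and a hand-labeled expected id; it treats a loss at the retrieval stage as a recall miss and a loss at any later stage as a precision miss.

```python
from typing import Dict, List, Optional


def first_stage_lost(expected_id: str, stages: Dict[str, List[str]]) -> Optional[str]:
    """Return the first stage whose output no longer contains expected_id."""
    for name, ids in stages.items():  # dicts keep insertion (pipeline) order
        if expected_id not in ids:
            return name
    return None


def classify(expected_id: str, stages: Dict[str, List[str]]) -> str:
    lost_at = first_stage_lost(expected_id, stages)
    if lost_at is None:
        return "ok"
    if lost_at == "retrieve":
        return "recall miss: widen retrieval"
    return f"precision miss at {lost_at}"


# Ids surviving each stage for one logged request (hypothetical stage names).
stages = {
    "retrieve": ["d1", "d2", "d3", "d4"],
    "rerank": ["d2", "d1", "d3"],
    "filter": ["d2", "d1"],
    "include": ["d2"],
}
for doc_id in ("d2", "d1", "d9"):
    print(doc_id, classify(doc_id, stages))
```

A recall miss points to broadening retrieval; a precision miss names the specific stage (reranker, filter, or budget cut) to inspect.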