How to Build a Context Pipeline
A context pipeline wires the four strategies (write, select, compress, and isolate) into a repeatable process that runs on every request. The steps below assemble them in the order that produces a working system, starting from the sources and ending with measurement.
Step 1: Define the context sources
Start by listing every source that can contribute to the window for your application: the system instructions, the current user input, the conversation history, retrieved knowledge from documents, long-term memory about the user or task, the tool definitions, and tool results. Not every application uses all of these: a stateless document bot may have only instructions, input, and retrieved knowledge, while an agent uses all of them. Writing the list explicitly is the foundation, because the pipeline's job is to manage exactly these sources and you cannot budget or assemble what you have not named.
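As a minimal sketch of what "naming the sources" can look like in code, the enumeration below lists the seven sources from this step. The identifiers `Source`, `DOCUMENT_BOT_SOURCES`, and `AGENT_SOURCES` are hypothetical names chosen for illustration, not part of any library.

```python
from enum import Enum


class Source(Enum):
    """Every source that may contribute to the window (names are illustrative)."""
    SYSTEM_INSTRUCTIONS = "system_instructions"
    USER_INPUT = "user_input"
    CONVERSATION_HISTORY = "conversation_history"
    RETRIEVED_KNOWLEDGE = "retrieved_knowledge"
    LONG_TERM_MEMORY = "long_term_memory"
    TOOL_DEFINITIONS = "tool_definitions"
    TOOL_RESULTS = "tool_results"


# A stateless document bot may use only a subset of the sources.
DOCUMENT_BOT_SOURCES = frozenset({
    Source.SYSTEM_INSTRUCTIONS,
    Source.USER_INPUT,
    Source.RETRIEVED_KNOWLEDGE,
})

# An agent uses all of them.
AGENT_SOURCES = frozenset(Source)
```

Declaring the set per application makes it possible for later stages to reject any content that arrives under a source nobody named.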
Step 2: Set a token budget per source
Decide how much of the window each source is allowed, in tokens, and base the allocation on what your application most depends on. A support assistant might give most of its budget to retrieved policy and memory, while a coding tool gives most to retrieved files. Leave headroom rather than budgeting to the model's full limit, both for the model's own response and because output quality tends to degrade as the window approaches capacity. The budget is what keeps the window in the lean range that avoids context rot, and it forces the priority decisions that define your pipeline's character.
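One way to express such a budget is sketched below. The window size, headroom, and percentage split are made-up numbers for a hypothetical support assistant, and `Budget` is an illustrative class, not an existing API. Integer percentages are used so the arithmetic stays exact.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    window_limit: int              # model context limit in tokens (assumed value)
    headroom_tokens: int           # reserved for the response and a safety margin
    percent_by_source: dict[str, int]  # share of the usable tokens per source

    def usable_tokens(self) -> int:
        return self.window_limit - self.headroom_tokens

    def allocation(self) -> dict[str, int]:
        """Return the token allowance for each source."""
        if sum(self.percent_by_source.values()) > 100:
            raise ValueError("per-source shares exceed 100% of the usable budget")
        usable = self.usable_tokens()
        return {name: usable * pct // 100
                for name, pct in self.percent_by_source.items()}


# Illustrative numbers only: a support assistant that favors policy and memory.
support_budget = Budget(
    window_limit=32_000,
    headroom_tokens=8_000,
    percent_by_source={
        "system_instructions": 10,
        "user_input": 10,
        "retrieved_knowledge": 50,
        "long_term_memory": 20,
        "conversation_history": 10,
    },
)
allocation = support_budget.allocation()
```

Here `allocation["retrieved_knowledge"]` is 12,000 tokens out of 24,000 usable, which reflects the priority decision the budget is meant to force.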
Step 3: Build the selection step
The selection step routes each request to the relevant sources, retrieves candidates, reranks them for true relevance, and filters out stale or low-confidence items, keeping only what fits each source's budget. This is the heart of the pipeline and where most quality is decided, so it deserves the most engineering attention. Once candidates are gathered, favor precision: include the few items that matter rather than everything that relates. The full method is in how to retrieve the right context.
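The filter, rerank, and fit-to-budget part of this step can be sketched as follows. It assumes candidates have already been retrieved and scored by some reranker; the `Candidate` fields and the thresholds `min_score` and `max_age_days` are hypothetical and would need tuning for a real application.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float    # relevance from a reranker (assumed to be in [0, 1])
    age_days: int   # how old the underlying content is
    tokens: int     # token cost of including this item


def select(candidates: list[Candidate], budget_tokens: int,
           min_score: float = 0.5, max_age_days: int = 365) -> list[Candidate]:
    """Drop stale or low-confidence items, then keep the best that fit the budget."""
    eligible = [c for c in candidates
                if c.score >= min_score and c.age_days <= max_age_days]
    ranked = sorted(eligible, key=lambda c: c.score, reverse=True)

    chosen, used = [], 0
    for c in ranked:
        # Greedy fill: skip an item that does not fit, but keep trying smaller ones.
        if used + c.tokens <= budget_tokens:
            chosen.append(c)
            used += c.tokens
    return chosen
```

The greedy fill is one simple policy among several; stopping at the first item that does not fit is an equally defensible choice when strict rank order matters more than filling the budget.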
Step 4: Add compression and memory
With selection in place, add the strategies that handle accumulation over time. Compression summarizes old conversation history and trims long documents so growing content stays within budget, covered in how to compress context. A memory layer handles information that must persist across sessions: it writes durable facts out at the end of an interaction and selects the relevant ones back in later, which keeps the window lean even for long-lived users. Adding memory is what turns a pipeline that only works within a single session into one that carries knowledge across them, and the role it plays is covered in whether memory is part of context engineering.
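A rough sketch of both pieces follows. `compress_history` keeps the most recent turns verbatim and replaces older ones with a summary produced by a caller-supplied `summarize` function (in practice probably a model call; here it is just a parameter). `MemoryStore` is a hypothetical in-memory stand-in for a real persistent store, and its keyword-overlap ranking is a deliberately naive placeholder for proper retrieval.

```python
from typing import Callable


def compress_history(turns: list[str], keep_recent: int,
                     summarize: Callable[[list[str]], str]) -> list[str]:
    """Summarize older turns and keep the last `keep_recent` turns verbatim."""
    if len(turns) <= keep_recent:
        return list(turns)
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [f"[summary] {summarize(older)}"] + recent


class MemoryStore:
    """Toy long-term memory: write durable facts, select relevant ones back."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def write(self, user_id: str, fact: str) -> None:
        facts = self._facts.setdefault(user_id, [])
        if fact not in facts:  # avoid storing duplicates
            facts.append(fact)

    def select(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        query_words = set(query.lower().split())
        scored = []
        for fact in self._facts.get(user_id, []):
            overlap = len(query_words & set(fact.lower().split()))
            if overlap:  # facts with no overlap never enter the window
                scored.append((overlap, fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]
```

The `limit` argument plays the role of the memory budget: even a user with thousands of stored facts contributes only a handful to any one window.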
Step 5: Assemble the window in order
Assemble the selected, compressed content into the final window using a fixed structure, so the layout is consistent and testable across requests. Keep the system instructions pinned where they will not be displaced, and place the most important content where the model attends most reliably, near the start or end rather than buried in the middle, so the layout works with the lost-in-the-middle effect rather than against it. A consistent assembly order also makes debugging far easier, because you always know where each kind of content sits in the window when you inspect a failing case.
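A fixed assembly order can be as simple as the sketch below. The particular order in `SECTION_ORDER` (instructions first, user input last) and the `## name` section labels are assumptions made for illustration; the point is only that the order is declared once and never varies per request.

```python
# Assumed layout: instructions pinned at the start, the user's input at the end.
SECTION_ORDER = [
    "system_instructions",
    "long_term_memory",
    "conversation_history",
    "retrieved_knowledge",
    "user_input",
]


def assemble(sections: dict[str, str]) -> str:
    """Join the non-empty sections in the fixed order, each under a label."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")

    parts = []
    for name in SECTION_ORDER:
        content = sections.get(name, "").strip()
        if content:  # omit empty sections rather than emitting bare labels
            parts.append(f"## {name}\n{content}")
    return "\n\n".join(parts)
```

Because the function ignores the order in which the caller supplies sections, two requests with the same content always produce the same window, which is what makes failing cases comparable.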
Step 6: Measure and iterate
Finally, make the pipeline observable. Log what entered the window for each request, broken down by source, alongside the resulting answer and its quality. When an answer is wrong, this log tells you which stage failed: retrieval missed the needed document, compression dropped a fact, or memory returned something stale. You can then fix the specific weak stage instead of guessing. This closes the loop and turns the pipeline into something you improve with evidence. The discipline of measuring AI output quality is covered in the LLM evaluation pillar.
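A per-request log record along these lines might look like the sketch below. The field names are hypothetical, and the default `count_tokens` is a crude whitespace count standing in for whatever tokenizer your model actually uses.

```python
import json
from typing import Callable, Optional


def build_log_record(request_id: str, sections: dict[str, str], answer: str,
                     quality: Optional[float] = None,
                     count_tokens: Callable[[str], int] = lambda t: len(t.split()),
                     ) -> dict:
    """Capture what entered the window, by source, next to the answer."""
    tokens_by_source = {name: count_tokens(text) for name, text in sections.items()}
    return {
        "request_id": request_id,
        "sections": sections,                  # the exact content per source
        "tokens_by_source": tokens_by_source,
        "total_tokens": sum(tokens_by_source.values()),
        "answer": answer,
        "quality": quality,                    # filled in by an evaluator, if any
    }


def to_log_line(record: dict) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(record, sort_keys=True)
```

Storing the section contents, not just the counts, is what lets you answer "did the needed document ever reach the window?" after the fact; for sensitive data you may prefer to store hashes or references instead.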
Build the pipeline in order: name the sources, budget them, build selection, add compression and memory, assemble in a fixed structure, and measure. The result is a system that assembles a lean, relevant window on every request and tells you which stage to fix when an answer goes wrong.
Scaling the Pipeline to Agents
The pipeline above runs once per model call, which is exactly what an agent needs many times over within a single task. For agents, the same pipeline runs on every iteration of the loop, with one addition: isolation, which spawns sub-agents with their own pipelines for distinct sub-tasks so no single window has to hold everything. The write strategy also does more work in agents, since intermediate results go to a scratchpad between iterations. The agent-specific patterns are covered in context engineering for AI agents, but the core pipeline is the same, which is why building it well for the single-call case pays off directly when you move to agents.
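The loop structure can be sketched as below. `build_window` stands for the whole pipeline from the six steps and `call_model` for the model call; both are caller-supplied placeholders, and the `"DONE"` prefix is an invented stop signal, not a convention of any real agent framework.

```python
from typing import Callable

BuildWindow = Callable[[str, list[str]], str]
CallModel = Callable[[str], str]


def run_agent(task: str, build_window: BuildWindow, call_model: CallModel,
              max_iterations: int = 5) -> list[str]:
    """Run the pipeline once per iteration, writing results to a scratchpad."""
    scratchpad: list[str] = []
    for _ in range(max_iterations):
        window = build_window(task, scratchpad)  # the full pipeline, every iteration
        result = call_model(window)
        scratchpad.append(result)                # write: persist intermediate results
        if result.startswith("DONE"):
            break
    return scratchpad


def run_isolated(subtasks: list[str], build_window: BuildWindow,
                 call_model: CallModel) -> dict[str, str]:
    """Isolation: each sub-task gets its own loop and its own scratchpad."""
    return {sub: run_agent(sub, build_window, call_model)[-1] for sub in subtasks}
```

Only the final result of each sub-agent is returned to the caller, so the parent window holds one line per sub-task rather than every intermediate step.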
Start Simple and Add Stages As You Need Them
The six steps describe a complete pipeline, but you should not build all of it on day one. The right approach is to start with the smallest pipeline your application needs and add stages as the application grows into them. A first version might be just sources, a budget, and a basic selection step, enough for a stateless assistant. When conversations start running long, add history compression. When users should be remembered, add the memory layer. When tasks grow into agent loops, add isolation. Building incrementally keeps each addition testable and avoids the complexity of stages the application does not yet use.
What you should not defer is the budget and the observability, because both are far harder to add later than to design in from the start. A budget retrofitted onto a pipeline that grew unbounded usually means tearing apart assembly logic that assumed unlimited room, and observability added after launch means you spent the early incidents debugging blind. These two are the cheap-now, expensive-later parts of the pipeline, so even a minimal first version should set a token budget and log what entered the window. Everything else (compression, memory, isolation) can be added stage by stage as the need becomes real.
Where Pipelines Commonly Break
Knowing the failure points helps you build defensively. The most common break is an unbudgeted section that grows until it crowds out the system instructions, which makes the model start ignoring its own rules; the fix is enforcing the per-source budget from Step 2. The second is a selection step with good recall but no reranking, so mediocre content reaches the window; the fix is the reranking in Step 3. The third is compression that drops a critical fact; the fix is the protected-facts pattern from the compression stage. The fourth is a memory layer that floods the window with loosely relevant recollections; the fix is holding memory to the same budgeted, ranked selection as every other source. Each break traces to a specific stage being weak or missing. That is why building the pipeline as named, observable stages, rather than as one tangled assembly function, makes it possible to find and fix the break when it happens.
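For the first failure, a small guard run before every model call can catch a section that has outgrown its allowance. The sketch below is one possible shape; `over_budget` is a hypothetical helper, and treating a section with no budget entry as having a budget of zero is a deliberate assumption so that unbudgeted sections are always flagged.

```python
def over_budget(token_counts: dict[str, int],
                budgets: dict[str, int]) -> dict[str, int]:
    """Return each offending section mapped to the number of tokens over budget.

    A section missing from `budgets` is treated as having a budget of 0,
    so any unbudgeted section with content is reported.
    """
    return {
        name: used - budgets.get(name, 0)
        for name, used in token_counts.items()
        if used > budgets.get(name, 0)
    }


violations = over_budget(
    token_counts={"system_instructions": 400,
                  "conversation_history": 9_000,
                  "scratch_notes": 10},
    budgets={"system_instructions": 500, "conversation_history": 6_000},
)
```

In this example `violations` reports the history section as 3,000 tokens over and also flags `scratch_notes`, which nobody budgeted. Whether to trim, compress, or fail the request on a violation is a policy choice left to the caller.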
One more structural choice pays off as the pipeline matures: make each stage independently testable. If selection, compression, memory, and assembly are separate components with clear inputs and outputs, you can write tests that check each in isolation. Does selection return the expected items for a known query? Does compression preserve the protected facts? Does memory surface the right stored facts? Does assembly respect the budget? A pipeline built as one function resists this, because there is no boundary to test against, and you are left judging only the final answer. Stage boundaries turn the pipeline into something you can verify piece by piece, which is what lets you change one stage with confidence that you have not broken another. This modular discipline is the difference between a context pipeline you can evolve safely and one that becomes too fragile to touch as the application grows. As models, retrieval methods, and memory tools keep improving, the pipeline you build today will need to absorb better components tomorrow. Only a stage-separated design lets you swap one part for a better one without rebuilding the whole, and that is what keeps a context pipeline a long-lived asset rather than a rewrite waiting to happen.
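To make the idea of a stage-level test concrete, here is a sketch that checks one stage in isolation. The `compress` function is a toy sentence-trimming stand-in for a real compression stage, and the protected-facts handling is a guess at what such a pattern might look like, not the method described elsewhere in this series. The test is written as a plain function so it runs with or without a test runner.

```python
def compress(text: str, max_sentences: int, protected: list[str]) -> str:
    """Keep the first `max_sentences` sentences plus any that mention a protected fact."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [
        s for i, s in enumerate(sentences)
        if i < max_sentences or any(p.lower() in s.lower() for p in protected)
    ]
    return (". ".join(kept) + ".") if kept else ""


def test_compression_preserves_protected_facts() -> None:
    text = ("The customer wrote in on Monday. They were polite. "
            "Order 1182 was refunded. They said thanks.")
    out = compress(text, max_sentences=1, protected=["Order 1182"])
    assert "Order 1182" in out    # the protected fact survives
    assert "polite" not in out    # unprotected filler is dropped


test_compression_preserves_protected_facts()
```

The test never calls a model or touches retrieval, which is the property that stage boundaries buy: a failure here points at compression and nothing else.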