
Context Engineering for AI Agents

An agent runs a loop of reading its context, choosing an action, calling a tool, and feeding the result back into context for the next step, which means its window grows with every iteration. Context engineering is what keeps an agent working across a long task, because without active management the window either hits its limit or degrades into context rot before the task is done. The four strategies of write, select, compress, and isolate become non-negotiable for agents in a way they are not for single-turn applications.

Why Agents Make Context Harder

A single-turn application assembles a window once and gets a response. An agent assembles a window, acts, observes, and assembles again, dozens of times within one task. Each tool call returns output that gets appended to the context, each reasoning step adds tokens, and the running history of what the agent has already done accumulates. Left unmanaged, this growth is monotonic: the window only gets bigger, until it overflows the model's limit or fills with so much accumulated state that the agent loses the thread.

This is the defining context problem of agents. A web-research agent that reads twenty pages, a coding agent that explores a large repository, or a customer-service agent that works through a multi-step resolution all generate far more information over their run than fits in a single window. The agent's success depends less on the model's raw capability and more on whether the system keeps the active window focused on what the current step needs while preserving access to what earlier steps produced. When an agent loses track of its task, the root cause is usually a failure of context management.
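To make the unmanaged growth concrete, here is a minimal Python sketch of a loop that never removes anything from its window. It assumes a crude word-count stand-in for a real tokenizer and a fixed-size tool output per step. The names `count_tokens`, `NaiveAgentContext`, and `run_unmanaged` are hypothetical. The sketch only shows that when nothing leaves the window, its size rises every step until it crosses the limit.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class NaiveAgentContext:
    limit: int
    messages: list[str] = field(default_factory=list)

    def size(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        # Nothing is ever removed, so size() can only grow.
        self.messages.append(message)


def run_unmanaged(steps: int, tool_output_words: int, limit: int) -> list[int]:
    """Return the window size after each step, stopping once it overflows."""
    ctx = NaiveAgentContext(limit=limit)
    sizes = []
    for step in range(steps):
        ctx.append(f"thought {step}: decide next action")  # reasoning tokens
        ctx.append("result " * tool_output_words)          # tool output appended verbatim
        sizes.append(ctx.size())
        if ctx.size() > ctx.limit:
            break  # the window overflowed before the task finished
    return sizes
```

With a 500-token limit and 100-word tool outputs, `run_unmanaged(10, 100, 500)` stops after the fifth step with the window at 525 tokens.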

Write: Scratchpads and Persistent Memory

The first defense is to write information out of the active window. During a task, an agent maintains a scratchpad: it records its plan, intermediate findings, and decisions to an external store, then keeps only a pointer or a short summary in the window. When it needs a detail later, it reads that specific item back rather than carrying everything forward in context the whole time. A research agent writes each source's key findings to the scratchpad and drops the raw page text. A coding agent writes its understanding of the codebase structure once and refers back to it instead of re-reading files.
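Here is a minimal sketch of the scratchpad pattern. It assumes an in-memory dict as the external store and uses a source's first sentence as a placeholder for real finding extraction. `Scratchpad` and `research` are hypothetical names. A production agent would more likely back the store with a file, database, or key-value service.

```python
class Scratchpad:
    """Hypothetical external store: full notes live here, not in the window."""

    def __init__(self) -> None:
        self._notes: dict[str, str] = {}

    def write(self, key: str, text: str) -> str:
        self._notes[key] = text
        # The window gets this short pointer instead of the text itself.
        return f"[scratchpad:{key}]"

    def read(self, key: str) -> str:
        # Read one specific item back when a later step needs it.
        return self._notes[key]


def research(sources: dict[str, str], pad: Scratchpad) -> list[str]:
    window: list[str] = []
    for name, page_text in sources.items():
        # Placeholder for real extraction: keep the first sentence as the key finding.
        finding = page_text.split(". ")[0]
        window.append(pad.write(name, finding))  # raw page text never enters the window
    return window
```

Only the pointers enter the window. When a later step needs the finding for source `a`, it calls `pad.read("a")` instead of carrying the text forward.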

Across tasks, the agent writes durable facts to long-term memory. What it learned about a system, a user's stated preferences, and the outcome of a previous run all persist, so a future task starts informed rather than blank. This is the write principle operating at two timescales (within a task via the scratchpad, across tasks via memory), and it is what lets an agent operate over far more information than a single window holds. The memory for AI agents pillar covers the persistence layer in depth.
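For the cross-task timescale, here is a minimal sketch that persists facts to a JSON file, assuming file-based storage is acceptable. `LongTermMemory`, `start_task`, and `demo` are hypothetical names. Each `LongTermMemory` instance reads from disk, so a fresh instance (standing in for a later task or process) sees what an earlier one wrote.

```python
import json
import tempfile
from pathlib import Path


class LongTermMemory:
    """Hypothetical JSON-file memory that outlives a single task or process."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def remember(self, key: str, fact: str) -> None:
        facts = self.load()
        facts[key] = fact  # newer facts overwrite older ones with the same key
        self.path.write_text(json.dumps(facts, indent=2))


def start_task(memory: LongTermMemory) -> list[str]:
    # Seed the new task's context with durable facts instead of starting blank.
    return [f"{key}: {fact}" for key, fact in memory.load().items()]


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"
        first = start_task(LongTermMemory(path))   # first task: nothing remembered yet
        LongTermMemory(path).remember("user_pref", "prefers concise answers")
        second = start_task(LongTermMemory(path))  # later task: the fact is loaded
    return first, second
```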

Key Takeaway

Agents must write information out of the active window, to a scratchpad within a task and to persistent memory across tasks, so the working context stays small while the agent retains access to everything it has produced.

Compress: Summarize Completed Work

As an agent finishes sub-tasks, the full record of how it completed them stops being useful, while the result remains essential. Compression captures this by summarizing completed work into its outcome before moving on. An agent that spent ten steps figuring out a database schema does not need those ten steps in its window once it has the schema; it needs the schema. Replacing the ten steps with a one-paragraph summary frees most of the budget they consumed while keeping the conclusion.

A common pattern is to compress at natural boundaries: when a sub-task completes, when the window crosses a size threshold, or when the agent transitions between phases of a task. The risk is summarizing away a detail that a later step needs, so the safe practice is to preserve concrete facts, identifiers, decisions, and any state the rest of the task depends on, while compressing the narrative of how they were reached. The general technique is covered in the guide to compressing context.
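Here is a minimal sketch of boundary compression. It assumes each step records its narrative separately from the concrete facts it established (identifiers, decisions, state). `Step`, `SubTask`, `compress`, and `window_entries` are hypothetical. The sketch triggers only on sub-task completion, not on a size threshold, and it keeps every fact while discarding the narrative.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    narrative: str                                        # how the agent got here
    facts: dict[str, str] = field(default_factory=dict)   # identifiers, decisions, state


@dataclass
class SubTask:
    name: str
    steps: list[Step] = field(default_factory=list)
    done: bool = False


def compress(task: SubTask) -> str:
    """Replace a finished sub-task's steps with one outcome line that keeps every fact."""
    kept: dict[str, str] = {}
    for step in task.steps:
        kept.update(step.facts)  # later steps override earlier values for the same key
    facts = ", ".join(f"{k}={v}" for k, v in kept.items())
    return f"{task.name} completed ({len(task.steps)} steps): {facts}"


def window_entries(tasks: list[SubTask]) -> list[str]:
    entries: list[str] = []
    for task in tasks:
        if task.done:
            entries.append(compress(task))                    # boundary: keep only the outcome
        else:
            entries.extend(s.narrative for s in task.steps)   # active work stays verbatim
    return entries
```

In a run like the database-schema example, ten discovery steps collapse to a single line that still names the table and primary key, while the in-progress sub-task stays verbatim.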

Isolate: Sub-Agents With Their Own Windows

The most powerful agent context technique is isolation through sub-agents. Instead of one agent carrying the entire task in a single growing window, a coordinator spawns sub-agents that each handle a distinct sub-task in their own clean window, then return only their result. A research coordinator might dispatch one sub-agent per source, each reading its source in isolation and returning a summary, while the coordinator's window holds only the summaries and never the raw sources. Each window stays small and focused, which keeps relevance density high everywhere.
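Here is a minimal sketch of per-source isolation. It assumes Python threads stand in for separately running sub-agents and that first-sentence extraction stands in for the model call. `sub_agent` and `coordinator` are hypothetical names. `ThreadPoolExecutor.map` is the standard-library call, and it returns results in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_agent(source_name: str, raw_text: str) -> str:
    # Each sub-agent starts from its own clean window holding only its source.
    window = [f"Task: summarize {source_name}", raw_text]
    # Stand-in for a model call: take the first sentence of the source.
    summary = window[1].split(". ")[0]
    return f"{source_name}: {summary}"  # only the result leaves the sub-agent


def coordinator(sources: dict[str, str]) -> list[str]:
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(sub_agent, sources.keys(), sources.values()))
    # The coordinator's window holds summaries only, never the raw sources.
    return ["Task: answer the research question", *summaries]
```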

Isolation is what lets agent systems scale past what any single window could hold, because the total work is partitioned across many bounded contexts rather than concentrated in one. The cost is coordination: the sub-agents' results have to be collected and combined, which is itself managed by writing those results to a shared store the coordinator selects from. This is why isolation and writing go together in practice. The trade-offs of sharing context across agents are covered in sharing memory between agents.
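The collect-and-select step can be sketched with a lock-protected shared store, assuming sub-agents run as threads in one process. `SharedResultStore`, `run_sub_agent`, and `coordinate` are hypothetical names. Sub-agents write topic-tagged results, and the coordinator selects only the topic it needs instead of reading everything back.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedResultStore:
    """Hypothetical shared store: sub-agents write results, the coordinator selects."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: dict[str, dict[str, str]] = {}

    def write(self, agent_id: str, result: str, topic: str) -> None:
        with self._lock:  # several sub-agents may write at the same time
            self._results[agent_id] = {"topic": topic, "result": result}

    def select(self, topic: str) -> list[str]:
        with self._lock:
            # Sorted by agent id so the coordinator sees a stable order.
            return [r["result"] for _, r in sorted(self._results.items()) if r["topic"] == topic]


def run_sub_agent(store: SharedResultStore, agent_id: str, topic: str, finding: str) -> None:
    # Stand-in for real sub-agent work; it reports through the store.
    store.write(agent_id, finding, topic)


def coordinate(jobs: list[tuple[str, str, str]], topic: str) -> list[str]:
    store = SharedResultStore()
    with ThreadPoolExecutor() as pool:
        for agent_id, job_topic, finding in jobs:
            pool.submit(run_sub_agent, store, agent_id, job_topic, finding)
    # Leaving the with-block waits for every sub-agent; then select what is relevant.
    return store.select(topic)
```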

Select: Keep Only What the Step Needs

At each iteration, the agent selects what belongs in the window for the current step. Not every tool needs to be defined on every call, not every prior result needs to stay in context, and not every memory is relevant to the current action. Selecting tightly (the few tool definitions this step might use, the specific prior results it depends on, the memories that bear on the current decision) keeps each iteration's window focused. As the number of available tools grows, exposing only the relevant subset rather than all of them becomes a meaningful win in its own right, both for relevance density and for the model's accuracy in choosing among them.
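Here is a minimal sketch of per-step selection. It uses plain keyword overlap as a stand-in for the embedding- or classifier-based retrieval a real system might use. `select_tools`, `build_step_window`, and the stop-word list are hypothetical, illustrative choices.

```python
STOP_WORDS = {"a", "an", "the", "from", "for", "to", "of", "in"}


def keywords(text: str) -> set[str]:
    words = (w.strip(".,:;").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def select_tools(step: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k tool names whose descriptions share the most keywords with the step."""
    step_words = keywords(step)
    scored = [(len(step_words & keywords(desc)), name) for name, desc in tools.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first, ties by name
    return [name for score, name in scored[:k] if score > 0]


def build_step_window(step: str, tools: dict[str, str], results: dict[str, str],
                      depends_on: list[str], k: int = 2) -> dict[str, object]:
    # Only the selected tools and the prior results this step depends on go in.
    return {
        "step": step,
        "tools": select_tools(step, tools, k),
        "results": {key: results[key] for key in depends_on if key in results},
    }
```

Tools with zero overlap are dropped entirely, so a step about reading a file sees one tool definition instead of the whole catalog.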

Putting It Together in the Loop

A well-engineered agent loop applies all four strategies every iteration. It selects the context the current step needs, acts, writes durable results to the scratchpad or memory, compresses completed sub-tasks into their outcomes, and isolates large sub-tasks into sub-agents with their own windows. The active window oscillates within a bounded range instead of growing without limit, so the agent can perform as well on step fifty as on step five. This bounded-window discipline is the practical core of building agents that complete long tasks reliably, and it is the reason context engineering is treated as the foundation of agent reliability rather than an optimization. To assemble these pieces into a working system, see how to build a context pipeline.
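The full loop can be sketched as a toy agent whose window is measured after every step. The sketch assumes word counts as tokens, a dict scratchpad, and a fixed string standing in for a sub-agent's returned result. `BoundedAgent`, `run_step`, and `simulate` are hypothetical. Every raw tool output in `simulate` is 500 words, yet the window never exceeds the 20-token budget, because only pointers and summaries stay in it.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer


@dataclass
class BoundedAgent:
    budget: int                                               # hypothetical window budget
    window: list[str] = field(default_factory=list)           # context carried between steps
    scratchpad: dict[str, str] = field(default_factory=dict)
    peak: int = 0                                             # largest window size after any step

    def size(self) -> int:
        return sum(tokens(entry) for entry in self.window)

    def run_step(self, i: int, tool_output: str,
                 needs: tuple[str, ...] = (), large: bool = False) -> list[str]:
        # Select: the prompt is the window plus only the scratchpad items this step needs.
        prompt = self.window + [self.scratchpad[k] for k in needs if k in self.scratchpad]
        # Isolate (stand-in): a large sub-task runs in its own window; only a result returns.
        if large:
            tool_output = f"sub-agent result for step {i}"
        # Write: the full output goes to the scratchpad; the window keeps a short pointer.
        key = f"step{i}"
        self.scratchpad[key] = tool_output
        self.window.append(f"{key} done -> scratchpad:{key}")
        # Compress: once over budget, fold finished entries into one summary line.
        if self.size() > self.budget:
            self.window = [f"summary: steps 0-{i} complete, details in scratchpad"]
        self.peak = max(self.peak, self.size())
        return prompt


def simulate(steps: int, budget: int = 20) -> BoundedAgent:
    agent = BoundedAgent(budget=budget)
    for i in range(steps):
        # Every raw tool output is 500 words; one step in seven is a large sub-task.
        agent.run_step(i, "raw " * 500, large=(i % 7 == 3))
    return agent
```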

Key Takeaway

A reliable agent keeps its active window in a bounded range by applying all four strategies every iteration: select what the step needs, write results out, compress finished work, and isolate large sub-tasks into sub-agents. Bounded windows are what let agents finish long tasks.

The Signs of an Agent With a Context Problem

Agent context failures have recognizable symptoms, and learning to read them tells you which strategy to reach for. An agent that performs well early in a task and degrades as it proceeds is suffering context rot from an unmanaged growing window, which calls for compression and writing. An agent that repeats actions it already completed has lost the record of what it did, a sign that completed work is being compressed away without preserving its outcome, or that the scratchpad is not being read back. An agent that contradicts an earlier decision is holding conflicting information in its window, a confusion failure that better selection and protecting established facts from being dropped or overwritten address.

An agent that simply stops making progress on a long task, looping or stalling, has often filled its window to the point where it can no longer reason clearly about the next step. This is the clearest case for isolation: the task is too large for one context and needs to be partitioned across sub-agents. Reading these symptoms as context problems rather than as the model not being smart enough is the shift that makes agents debuggable, because each symptom points to a specific strategy that is missing or misapplied rather than to an unfixable limitation of the model.
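Some of these symptoms can be checked mechanically from a run trace. Here is a minimal sketch, assuming the trace records each action as a string and the window size after each step. `diagnose` and its 90% threshold are hypothetical. Contradictions are left out because detecting them requires semantic comparison rather than counting.

```python
from collections import Counter


def diagnose(actions: list[str], window_sizes: list[int], limit: int) -> list[str]:
    """Map observable symptoms in a run trace to the strategy the text recommends."""
    findings: list[str] = []
    # The window only ever grew: unmanaged growth, so compress and write.
    if len(window_sizes) > 1 and all(b > a for a, b in zip(window_sizes, window_sizes[1:])):
        findings.append("growing window: add compression and write results out")
    # The same action appears more than once: the record of completed work was lost.
    repeats = [action for action, n in Counter(actions).items() if n > 1]
    if repeats:
        findings.append(f"repeated actions {sorted(repeats)}: preserve outcomes, read scratchpad back")
    # The window ends close to the limit: the task may be too large for one context.
    if window_sizes and window_sizes[-1] >= 0.9 * limit:
        findings.append("window near limit: isolate sub-tasks into sub-agents")
    return findings
```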

Single Agent or Multi-Agent?

A recurring design question is whether to keep an agent as one loop or split it into a coordinator with sub-agents, and context is the deciding factor. A single agent is simpler and keeps all state in one place, which is fine while the task fits comfortably in a bounded window with compression and writing. The moment a task is large or naturally decomposes into independent sub-tasks, the single window becomes the bottleneck, and isolation through sub-agents is what scales past it, since each sub-agent works in its own clean context and returns only its result.

The trade-off is coordination cost. Multi-agent systems have to pass results between contexts, which means deciding what each sub-agent returns and how the coordinator combines those returns, often through a shared store that the coordinator selects from. This coordination is itself a context engineering problem, and it adds complexity that a single agent avoids. The practical rule is to stay single-agent until the window genuinely cannot hold the task even with compression, then isolate, because isolation solves the window ceiling at the price of coordination, and you only want to pay that price when the ceiling is the actual constraint. The shared-state side of this is covered in sharing memory between agents.
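The practical rule reduces to a small check, assuming you can estimate the task's total tokens and the fraction that survives compression. `choose_architecture`, `compression_ratio`, and the 80% headroom are hypothetical parameters, not measured values.

```python
def choose_architecture(task_tokens: int, compression_ratio: float,
                        window_limit: int, headroom: float = 0.8) -> str:
    """Stay single-agent unless the task still will not fit after compression."""
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    # Estimated tokens the task would occupy with compression and writing applied.
    compressed = task_tokens * compression_ratio
    # Reserve room for instructions, tool definitions, and the next step's output.
    if compressed <= window_limit * headroom:
        return "single-agent"
    return "coordinator + sub-agents"
```

The ordering mirrors the text. Compression is applied first, and the coordination cost of sub-agents is paid only when the compressed estimate still exceeds the window.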