What Is Context Engineering? A Practical Definition

Context engineering is the practice of deciding what information goes into a language model's context window on every call, so the model has exactly the instructions, knowledge, memory, and tool results it needs to answer well, and nothing that dilutes it. It is the system you build around a model to assemble the right context at runtime, under a fixed token budget, for each individual request. Prompt engineering is the part of this that writes the fixed instruction by hand; context engineering is the larger discipline that manages the whole window.

The Core Idea

A language model knows nothing about your request except what is in its context window at the moment it runs. It does not retain your previous conversation, it cannot see your files or your database, and it has no awareness of what your tools returned unless that information is sitting in the prompt for this call. Everything the model appears to know is there because some component of your system placed it there. Context engineering is the discipline of making that placement decision deliberately and well.

This reframes the central problem of building with language models. The model is a fixed, capable reasoning engine. What changes from request to request, and what determines whether you get a good answer, is the context you feed it. A frontier model with the right three documents in its window will answer a question correctly. The same model with the wrong documents, or with the right ones buried under fifty irrelevant ones, will answer it poorly. The engineering work that decides which case you are in is context engineering.

What Goes Into a Context Window

The window for a single model call is assembled from several distinct sources, each competing for the same limited space. The system instructions define the model's role, its rules, and its output format. The user input is the immediate request. The conversation history holds relevant earlier turns. Retrieved knowledge brings in passages from documents or a database that bear on the question. Long-term memory supplies facts the system has accumulated about the user or the task across sessions. Tool definitions tell the model what actions it can take, and tool results feed back what those actions returned. Examples or few-shot demonstrations may be included to shape the output.

Context engineering is the practice of deciding, for each request, how much of each source to include and in what form. There is never room for all of it. A long conversation cannot be included verbatim alongside a dozen retrieved documents and full tool output without blowing the budget or burying the signal. The engineering is in choosing the few thousand tokens that matter for this specific request and leaving the rest out. This is why it is a runtime activity: the relevant subset is different for every input and cannot be fixed in advance.
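The selection step described here can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Block` type, the numeric `priority` scheme, and the whitespace-based `count_tokens` are all assumptions made for the example, and a real system would use the model's own tokenizer and a more careful ranking.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class Block:
    source: str    # where the text came from, e.g. "system", "history", "retrieved"
    text: str
    priority: int  # hypothetical scheme: lower number means more important


def assemble_window(blocks: list[Block], budget: int) -> list[Block]:
    """Greedily keep the highest-priority blocks that still fit in the budget."""
    chosen: list[Block] = []
    used = 0
    # sorted() is stable, so blocks with equal priority keep their input order.
    for block in sorted(blocks, key=lambda b: b.priority):
        cost = count_tokens(block.text)
        if used + cost <= budget:
            chosen.append(block)
            used += cost
    return chosen


blocks = [
    Block("system", "You are a support assistant. Answer briefly.", 0),
    Block("user", "How do I reset my password?", 0),
    Block("retrieved", "To reset a password open Settings and choose Reset.", 1),
    Block("history", "Earlier the user asked about billing dates and invoices.", 2),
    Block("retrieved", "Our office is closed on public holidays every year.", 3),
]

window = assemble_window(blocks, budget=25)
print([b.source for b in window])  # ['system', 'user', 'retrieved']
```

With a budget of 25 tokens, the instructions, the request, and the one passage about password resets fit; the billing history and the unrelated passage are left out. A different request would produce a different ranking and therefore a different window, which is the runtime aspect in miniature.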

Key Takeaway

A context window is assembled from instructions, user input, history, retrieved knowledge, memory, and tool results. Context engineering decides how much of each to include for each request, because all of it never fits and including the wrong parts makes answers worse.

Why It Is Called Engineering

The word engineering is deliberate, because doing this well requires building systems rather than writing text. To assemble the right context at runtime you need a retrieval system that can find relevant knowledge, a memory layer that can store and recall facts across sessions, a strategy for summarizing or trimming history when it grows too long, a budget that allocates tokens across the parts of the window, and logic that decides what to include for each request. These are software components with their own design decisions, failure modes, and performance characteristics. Treating the prompt as a static string ignores all of this, which is why static prompts work in demos and break in production.
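As one example of such a component, here is a minimal sketch of a history-trimming strategy that keeps only the most recent turns that fit in a token allowance. The function name `trim_history`, the turn format, and the word-count tokenizer are assumptions for illustration; production systems often summarize older turns instead of dropping them.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def trim_history(turns: list[str], budget: int) -> list[str]:
    """Keep the newest turns that fit in the budget, in their original order."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    return kept


turns = [
    "user: hi there",
    "assistant: hello, how can I help?",
    "user: my invoice is wrong",
    "assistant: which invoice number is it?",
    "user: number 1042",
]

recent = trim_history(turns, budget=14)
print(recent)  # the last three turns
```

Even this small function carries design decisions of the kind described above: it drops whole turns rather than cutting one mid-sentence, and it stops at the first turn that does not fit rather than skipping it, so the kept history is always a contiguous tail of the conversation.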

Engineering also implies measurement and iteration. A serious context pipeline is something you can test: you can measure whether retrieval surfaces the right passages, whether the assembled window stays within budget, and whether answer quality holds as conversations grow. When quality drops, you can trace it to a specific part of the pipeline (retrieval missed the relevant passage, the history was summarized too aggressively, or the memory returned a stale fact) and fix that part. This is the difference between context as a craft you eyeball and context as a system you operate.
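One way to make "retrieval surfaces the right passages" measurable is recall at k: of the passages a human marked as relevant for a test question, what share appears in the top k results. The sketch below assumes you already have such labeled test cases; the document IDs and the function name `recall_at_k` are made up for the example.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant items that appear among the first k retrieved items."""
    if not relevant:
        return 0.0  # nothing to find, so report zero rather than divide by zero
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical test case: ranked retrieval output and hand-labeled relevant passages.
retrieved_ids = ["doc7", "doc2", "doc9", "doc4"]
relevant_ids = {"doc2", "doc4"}

print(recall_at_k(retrieved_ids, relevant_ids, k=3))  # 0.5
print(recall_at_k(retrieved_ids, relevant_ids, k=4))  # 1.0
```

Tracked over a fixed set of test questions, a number like this lets you tell a retrieval regression apart from a problem in history handling or memory, which is what tracing a quality drop to one part of the pipeline requires.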

The Objective: Relevance Density

The single metric that captures the goal of context engineering is relevance density, the fraction of tokens in the window that actually bear on the current request. High relevance density means almost everything in the window is helping the model answer. Low relevance density means the answer is in there somewhere, surrounded by noise that pulls the model off track. Two windows can both contain the correct answer and produce very different results, because the one with higher relevance density makes the answer easy for the model to use while the other buries it.
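The definition can be made concrete with a small calculation. The sketch below assumes each block in the window has already been labeled as relevant or not for the current request, which in practice takes human judgment or a separate scoring step; the word-count tokenizer is likewise a simplification.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def relevance_density(blocks: list[tuple[str, bool]]) -> float:
    """Fraction of window tokens that sit in blocks labeled relevant."""
    total = sum(count_tokens(text) for text, _ in blocks)
    if total == 0:
        return 0.0
    relevant = sum(count_tokens(text) for text, is_relevant in blocks if is_relevant)
    return relevant / total


# Two hypothetical windows for the request "How do I reset my password?".
# Both contain the passage that answers it.
focused = [
    ("Reset a password from the Settings page.", True),
]
noisy = [
    ("The office is closed on public holidays.", False),
    ("Reset a password from the Settings page.", True),
    ("Invoices are sent on the first Monday.", False),
]

print(relevance_density(focused))  # 1.0
print(relevance_density(noisy))    # one third of the tokens are relevant
```

Both windows contain the answer, but in the second one only a third of the tokens bear on the request. That gap is what the techniques in this article are meant to close.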

Everything in context engineering serves this objective. Retrieval raises relevance density by bringing in the passages that match the request and excluding the rest. Summarizing a long history raises it by replacing many low-value tokens with a few high-value ones. Splitting work across focused sub-agents raises it by keeping each window small and on-task. A memory layer raises it by recalling the specific facts that matter rather than replaying an entire history. When you understand that the goal is to maximize the share of the window that matters, the individual techniques stop looking like a grab bag and start looking like one optimization pursued several ways.

How It Relates to RAG, Memory, and Prompts

Context engineering is the umbrella, and several familiar techniques are parts of it. Prompt engineering writes the fixed instruction, which is one block of the window. Retrieval-augmented generation, or RAG, is the selection of relevant documents into the window, which is one source of context. A memory system handles the information that persists across sessions, which is another source. None of these is the whole picture on its own, and a system that does one well while ignoring the others still fails. RAG with a badly managed conversation history runs out of window. A great prompt with weak retrieval answers from missing information. Context engineering is the practice of making all the parts work together within the budget. For the relationship to retrieval specifically, see context engineering versus RAG, and for the relationship to memory see whether memory is part of context engineering.

Key Takeaway

Prompt engineering, RAG, and memory are all parts of context engineering, not alternatives to it. The discipline is making every part work together to maximize relevance density within a fixed token budget on each request.

When a Project Needs Context Engineering

Not every use of a language model demands a full context pipeline, and recognizing where the line sits prevents both under-building and over-building. A single, self-contained request with no external data and no memory, such as "summarize this text I am pasting in," needs almost no context engineering, because everything required is already in the input. The moment a system has to bring in information the model does not already have, the discipline starts to matter, and how much it matters scales with how much that information grows and changes.

The clearest triggers are statefulness and external knowledge. If your application answers questions about your own documents, you need selection, which is retrieval. If it holds a conversation across many turns, you need history management. If it should remember a user between visits, you need a memory layer. If it runs as an agent over many steps, you need all of these working together. Each of these is a reason the window can no longer be a fixed string, and the more of them apply, the more the system's success depends on context engineering rather than on the model or the prompt wording.

The cost of recognizing this late is high, because context handling is hard to retrofit. A system built as a static prompt that later needs memory and history management often has to be re-architected around the window rather than patched, since the assembly logic, the budget, and the observability all have to be added at once. Designing for the window from the start, even minimally, leaves room to add selection, compression, and memory as the application grows, which is far cheaper than bolting them on after the first failures with real users expose the gap.

Key Takeaway

The need for context engineering scales with statefulness and external knowledge. Stateless single requests need little, but any system that retrieves, remembers, holds long conversations, or runs as an agent depends on it, and designing for the window early is far cheaper than retrofitting it.