Context Engineering for Codex CLI in June 2026: The Write-Select-Compress-Isolate Playbook

Andrej Karpathy put it concisely: context engineering is “the delicate art and science of filling the context window with just the right information for the next step”¹. By mid-2026, the term has moved from a Twitter coinage to an industry-wide discipline, and for good reason. Sourcegraph’s practical guide to context engineering found that without structured retrieval, coding agents routinely miss files and facts outside their immediate window — what they call the “80% problem”, where the agent completes the visible portion of a task and silently drops everything its context did not cover². That is not a prompt-wording issue — it is an information-supply problem driven entirely by what the model sees.

This article maps the four-strategy framework (Write, Select, Compress, Isolate), popularised by LangChain’s context engineering repository³ and reinforced by Sourcegraph’s guide to agentic coding⁴, onto the concrete mechanisms Codex CLI exposes as of v0.139. If you have read the earlier context engineering primer in this knowledge base (April 2026), treat this as its successor: the tooling has matured, the patterns have sharpened, and several new capabilities have shipped.

The Shift: Prompt Engineering → Context Engineering → Harness Engineering

The industry trajectory is now clear. Between 2022 and 2026, the dominant paradigm passed through three phases⁵:

Prompt engineering (2022–2024): optimise the phrasing of a single request.
Context engineering (2025–2026): curate the entire information environment — instructions, documents, tool definitions, conversation history, memory — that surrounds every inference call.
Harness engineering (emerging): design the scaffolding that orchestrates context across multi-agent systems, including approval policies, sandbox boundaries, and event-driven hooks.

Codex CLI sits at the intersection of the second and third. Its harness — the Rust binary that manages the agent loop — handles context assembly automatically, but every configuration knob you turn changes what lands in the window and when.

graph LR
    A["Prompt Engineering<br/>2022-2024"] --> B["Context Engineering<br/>2025-2026"]
    B --> C["Harness Engineering<br/>2026+"]
    B --> D["Write"]
    B --> E["Select"]
    B --> F["Compress"]
    B --> G["Isolate"]
    D --> H["AGENTS.md<br/>Memories<br/>Skills"]
    E --> I["Tool Search<br/>MCP Servers<br/>Web Search"]
    F --> J["Compaction<br/>Schema Pruning<br/>Token Budgets"]
    G --> K["Subagents<br/>Profiles<br/>Sandboxes"]

Strategy 1: Write — Persist Context Outside the Window

Writing context means saving durable knowledge outside the live context window so the agent can draw on it later³. In Codex CLI, three mechanisms serve this purpose.

AGENTS.md Hierarchy

Codex discovers instruction files in a strict order: global (~/.codex/AGENTS.md), then repository root down to the current working directory, concatenating as it goes⁶. Files closer to the working directory appear later in the merged prompt, giving them effective override priority.

# ~/.codex/config.toml — tune the byte budget
project_doc_max_bytes = 65536          # default 32768
project_doc_fallback_filenames = ["TEAM_GUIDE.md", ".agents.md"]
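
The discovery order described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Codex's actual implementation: the function names are hypothetical, and the rule of cutting the merged text at the byte budget is a guess at how project_doc_max_bytes is applied.

```python
from pathlib import Path


def collect_instruction_files(global_dir: Path, repo_root: Path, cwd: Path,
                              filename: str = "AGENTS.md") -> list[Path]:
    """Return instruction files in merge order: global first, then root down to cwd."""
    found: list[Path] = []
    global_file = global_dir / filename
    if global_file.is_file():
        found.append(global_file)
    # Build the chain of directories from the repository root to the working directory.
    root = repo_root.resolve()
    current = root
    directories = [root]
    for part in cwd.resolve().relative_to(root).parts:
        current = current / part
        directories.append(current)
    for directory in directories:
        candidate = directory / filename
        if candidate.is_file():
            found.append(candidate)
    return found


def merge_instructions(files: list[Path], max_bytes: int = 32768) -> str:
    """Concatenate files in order; stop reading once the byte budget is spent (assumed rule)."""
    chunks: list[str] = []
    remaining = max_bytes
    for path in files:
        if remaining <= 0:
            break
        data = path.read_bytes()[:remaining]
        remaining -= len(data)
        chunks.append(data.decode("utf-8", errors="ignore"))
    return "\n\n".join(chunks)
```

Because later files land later in the merged text, a nested AGENTS.md gets the last word, and under this assumed truncation rule it is also the first to be cut when the budget runs out.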

The practical lesson from 2026: AGENTS.md is operational policy, not documentation. The model does not need to understand why you use conventional commits — it needs the exact command and what “done” looks like⁷. Teams that treat AGENTS.md as a README consistently underperform teams that treat it as a runbook.

Memories

Codex’s native memory system (GA since v0.128) extracts durable insights from completed sessions and injects them into future ones⁸. After a session idles, a two-phase pipeline runs: per-session extraction (with secret redaction), then global consolidation into ~/.codex/memories/memory_summary.md.

# ~/.codex/config.toml
[features]
memories = true

# Fine-grained controls
[memories]
generate_memories = true
use_memories = true
disable_on_external_context = true     # skip MCP/web-search threads
min_rate_limit_remaining_percent = 20  # back off when rate-limited

As of v0.139, external tool output is excluded from memory storage⁹, which prevents MCP responses from polluting the memory corpus — a common problem in earlier versions.
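
The two-phase shape of the pipeline (extract per session with redaction, then consolidate globally) can be illustrated with a toy version. Everything here is hypothetical: the real extraction is model-driven, whereas this sketch keeps lines that start with a made-up NOTE: marker and redacts only strings shaped like sk- keys.

```python
import re

# Hypothetical pattern: matches only strings shaped like "sk-..." keys.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def extract_session_notes(transcript: list[str]) -> list[str]:
    """Phase 1: keep lines marked as durable and redact secret-shaped strings."""
    notes = []
    for line in transcript:
        if line.startswith("NOTE:"):
            body = line[len("NOTE:"):].strip()
            notes.append(SECRET_PATTERN.sub("[REDACTED]", body))
    return notes


def consolidate(per_session: list[list[str]]) -> str:
    """Phase 2: merge notes across sessions, dropping exact duplicates in order."""
    seen: dict[str, None] = {}
    for notes in per_session:
        for note in notes:
            seen.setdefault(note, None)
    return "\n".join(f"- {note}" for note in seen)
```

Redacting in phase 1 rather than phase 2 means a secret never reaches the consolidated summary, even if consolidation is later changed.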

Skills and PLANS.md

Skills are reusable prompt fragments that inject specialised instructions when activated¹⁰. The execplan pattern — writing a PLANS.md file at session start and updating it as work progresses — gives the agent a persistent reference document that survives compaction¹¹. This is context writing in its purest form: the agent produces a file that it later reads to maintain coherence across a multi-hour session.
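
A minimal sketch of the update step follows, assuming a hypothetical PLANS.md layout in which each milestone is a Markdown checkbox line. The source does not prescribe a file format, so the "- [ ]" convention and the function name are illustrative only.

```python
from pathlib import Path


def mark_milestone_done(plan_path: Path, milestone: str) -> bool:
    """Tick the checkbox for one milestone; return False if it was not found."""
    lines = plan_path.read_text(encoding="utf-8").splitlines()
    changed = False
    for index, line in enumerate(lines):
        if line.strip() == f"- [ ] {milestone}":
            lines[index] = line.replace("- [ ]", "- [x]", 1)
            changed = True
    if changed:
        # Rewrite the file so the next read reflects current progress.
        plan_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return changed
```

The point of keeping progress on disk is that the state survives even if the conversation turns that produced it are later summarised away.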

Strategy 2: Select — Pull the Right Tokens In

Selecting context means retrieving information into the window precisely when needed³.

Tool Search

Since v0.119, Codex CLI ships a built-in tool search that uses semantic matching to surface relevant tools from the full catalogue, reducing tool-definition overhead by 85–98%¹². Without it, a 43-tool GitHub MCP server injects over 28,000 tokens of tool definitions into every turn¹³.
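
To make those figures concrete, the arithmetic below applies the quoted reduction range to the quoted 28,000-token server. This is a back-of-the-envelope illustration: the two numbers come from separate citations and may not describe the same setup, and the helper names are hypothetical.

```python
def per_tool_tokens(total_tokens: int, tool_count: int) -> int:
    """Average tool-definition cost, rounded to the nearest token."""
    return round(total_tokens / tool_count)


def tokens_after_reduction(total_tokens: int, reduction_percent: int) -> int:
    """Tokens left per turn after an integer percentage reduction."""
    return total_tokens * (100 - reduction_percent) // 100


TOTAL = 28_000  # tool-definition tokens quoted for the 43-tool server
print(per_tool_tokens(TOTAL, 43))         # 651 tokens per tool on average
print(tokens_after_reduction(TOTAL, 85))  # 4200 tokens at the low end of the range
print(tokens_after_reduction(TOTAL, 98))  # 560 tokens at the high end of the range
```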

MCP Server Scoping

The enabled_tools and disabled_tools arrays in config.toml act as allow/deny lists that determine which MCP tool definitions enter the context window at all¹⁴:

[mcp_servers.github]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
enabled_tools = ["create_pull_request", "get_file_contents", "search_code"]

This is context selection at the configuration layer — you decide which tools the model can see before it ever starts reasoning.
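
The allow/deny behaviour can be sketched as a simple filter. The function below is hypothetical, and the precedence it uses (apply the allow list first, then remove anything on the deny list) is an assumption rather than documented Codex behaviour.

```python
from typing import Optional


def visible_tools(catalogue: list[str],
                  enabled_tools: Optional[list[str]] = None,
                  disabled_tools: Optional[list[str]] = None) -> list[str]:
    """Return the tool names whose definitions would reach the context window."""
    if enabled_tools is None:
        allowed = list(catalogue)  # no allow list: start from everything
    else:
        allow = set(enabled_tools)
        allowed = [tool for tool in catalogue if tool in allow]
    deny = set(disabled_tools or [])
    # Assumed precedence: a denied tool is hidden even if it is also allowed.
    return [tool for tool in allowed if tool not in deny]
```

With the configuration above, only the three listed tools would survive the filter; every other definition on the server never costs a token.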

Code-Mode Web Search

v0.139 added standalone web search in code mode, including from nested JavaScript tool calls¹⁵. The agent can now pull in external documentation mid-session without leaving the coding flow. Combined with parallel execution, this means the agent selects and fetches context from the web in the background while continuing to write code.

Strategy 3: Compress — Retain Only What Matters

Compressing context means reducing token usage while preserving the information the agent needs³.

Automatic Compaction

When the conversation approaches the context window limit, Codex CLI triggers automatic compaction — summarising earlier turns into a condensed representation¹⁶. The compaction algorithm preserves the most recent messages verbatim and summarises older ones. You can track compaction events via telemetry (task.compact metric with remote and local types).
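
The keep-recent, summarise-older shape described above can be sketched as follows. This is not Codex's algorithm: the function names are hypothetical, and the summariser is a placeholder standing in for a model-written summary.

```python
from typing import Callable


def compact_history(messages: list[str], keep_recent: int,
                    summarise: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the last keep_recent messages with a single summary entry."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    split = len(messages) - keep_recent
    if split <= 0:
        return list(messages)  # nothing old enough to summarise
    older, recent = messages[:split], messages[split:]
    return [summarise(older), *recent]


def placeholder_summary(older: list[str]) -> str:
    """Stand-in for a model-written summary of the older turns."""
    return f"[summary of {len(older)} earlier messages]"
```

For example, compacting four messages with keep_recent=2 yields one summary entry followed by the last two messages unchanged.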

Schema Compaction

v0.139 introduced smarter schema handling: tool and connector input schemas now preserve oneOf and allOf structures, and large schemas “keep more shallow structure when compacted”¹⁵. This matters because MCP tool definitions can be enormous — a Prisma MCP server exposing migration tools sends multi-kilobyte JSON schemas that previously lost structural information during compaction.
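
One way to picture "keeping shallow structure" is a depth-limited prune that drops nested properties below a cutoff but never discards oneOf or allOf branches. The sketch below is an assumption about the general technique, not the changelog's implementation; the function name and the depth rule are hypothetical.

```python
from typing import Any

COMBINATORS = ("oneOf", "allOf")


def prune_schema(schema: Any, max_depth: int) -> Any:
    """Drop nested detail below max_depth while keeping oneOf/allOf branches."""
    if not isinstance(schema, dict):
        return schema
    # Copy scalar keywords such as type, description and required unchanged.
    pruned = {key: value for key, value in schema.items()
              if key not in ("properties", "items", *COMBINATORS)}
    # Combinator branches are always kept, each pruned to the same depth.
    for key in COMBINATORS:
        if key in schema:
            pruned[key] = [prune_schema(branch, max_depth) for branch in schema[key]]
    if max_depth <= 0:
        return pruned  # below the cutoff: nested properties and items are dropped
    if "properties" in schema:
        pruned["properties"] = {name: prune_schema(sub, max_depth - 1)
                                for name, sub in schema["properties"].items()}
    if "items" in schema:
        pruned["items"] = prune_schema(schema["items"], max_depth - 1)
    return pruned
```

Keeping the combinators matters because they often encode which argument shapes are valid; losing them leaves the model guessing between alternatives.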

Token Budget Discipline

Goal mode (GA since May 2026) treats token budget exhaustion as a soft stop, injecting wrap-up steering rather than aborting¹⁷. This is compression at the session level: the agent learns to produce a summary and handoff notes when the budget runs low rather than simply stopping mid-thought.
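
The soft-stop idea can be sketched as a budget tracker that returns a steering message once the limit is crossed instead of raising an error. The class, its fields, and the message wording are hypothetical illustrations, not Codex's goal-mode internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBudget:
    limit: int
    used: int = 0
    wrap_up_sent: bool = False

    def record(self, tokens: int) -> Optional[str]:
        """Add usage; return wrap-up steering exactly once when the budget is exhausted."""
        self.used += tokens
        if self.used >= self.limit and not self.wrap_up_sent:
            self.wrap_up_sent = True
            return "Budget reached: summarise progress and write handoff notes."
        return None
```

A caller would append any returned message to the next turn, so the agent finishes with a summary rather than halting mid-task.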

flowchart TD
    A["Full Context Window"] -->|"Approaching limit"| B{"Compaction Trigger"}
    B -->|"Conversation"| C["Summarise older turns"]
    B -->|"Tool schemas"| D["Prune to shallow structure"]
    B -->|"Token budget"| E["Wrap-up steering"]
    C --> F["Compacted Window"]
    D --> F
    E --> G["Goal: budget_limited"]

Strategy 4: Isolate — Split Context Across Boundaries

Isolating context means distributing work across separate context windows so that no single window becomes saturated³.

Subagents

Codex CLI’s multi-agent v2 runtime spawns subagents with their own context windows, each scoped to a specific role or subtask¹⁸. A subagent inherits the parent’s AGENTS.md instructions but operates within its own conversation history and tool set. This is the most direct implementation of context isolation: a frontend subagent never sees the backend subagent’s database schemas, and vice versa.

# .codex/agents/frontend-reviewer.toml
model = "o4-mini"
instructions = """
You review React components for accessibility and performance.
You do not modify backend code.
"""

[sandbox]
writable_roots = ["src/components", "src/hooks"]
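
The isolation property (shared instructions, separate histories and tool sets) can be modelled in a few lines. This is a conceptual sketch with hypothetical names, not the multi-agent runtime's API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    role: str
    shared_instructions: str
    tools: list[str]
    history: list[str] = field(default_factory=list)

    def window(self) -> list[str]:
        """Everything this agent would see: shared instructions plus its own history."""
        return [self.shared_instructions, *self.history]


def spawn(parent: AgentContext, role: str, tools: list[str]) -> AgentContext:
    """Create a subagent that inherits instructions but starts with an empty history."""
    return AgentContext(role=role,
                        shared_instructions=parent.shared_instructions,
                        tools=list(tools))
```

Anything appended to one subagent's history stays out of every other agent's window, which is the property the frontend and backend example relies on.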

Permission Profiles

Profiles scope the agent’s capabilities per task, which implicitly scopes what context it encounters¹⁹. A debug profile with elevated sandbox permissions sees different system resources than a review profile restricted to read-only file access. The profile shapes what tools are available, which shapes what tool definitions enter the context window, which shapes what the model reasons about.

OS-Native Sandboxes

Seatbelt (macOS), Bubblewrap (Linux), and Landlock (Linux 5.13+) provide hard isolation boundaries that prevent the agent from accessing files outside its permitted scope²⁰. This is context isolation at the operating system level — the agent physically cannot read context it should not have.
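
The rule these sandboxes enforce can be illustrated with a purely lexical path check. This sketch only models the policy: real enforcement happens in the operating system, and unlike the kernel mechanisms this function does not follow symlinks. The function name and arguments are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def is_within_roots(path: str, workspace: str, roots: list[str]) -> bool:
    """Return True if path, resolved against workspace, sits inside a permitted root."""
    # normpath collapses ".." segments so traversal cannot escape a root lexically.
    target = PurePosixPath(posixpath.normpath(posixpath.join(workspace, path)))
    for root in roots:
        allowed = PurePosixPath(posixpath.normpath(posixpath.join(workspace, root)))
        if target == allowed or allowed in target.parents:
            return True
    return False
```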

The Compounding Effect

These four strategies are not independent. They compound:

Combination	Effect
Write + Select	AGENTS.md instructs the agent to use tool search before calling MCP tools, keeping the window lean
Write + Compress	PLANS.md survives compaction because the agent re-reads the file rather than relying on conversation history
Select + Isolate	Each subagent has its own `enabled_tools` list, so tool definitions are both selected and isolated
Compress + Isolate	Subagent compaction happens independently; one subagent’s compaction does not discard another’s context

The reported outcome: teams that implement all four strategies say their agents maintain coherence across sessions exceeding 100,000 tokens of cumulative input, whereas single-window, unmanaged-context sessions typically degrade after 30,000–40,000 tokens²¹.

A Minimal Context Engineering Checklist

For teams adopting Codex CLI for the first time or upgrading from ad-hoc prompting:

Write an AGENTS.md at repository root with exact commands, done criteria, and test expectations⁷.
Enable memories in config.toml so cross-session insights accumulate automatically⁸.
Scope MCP tools with enabled_tools — never expose the full catalogue¹⁴.
Use subagents for tasks that span more than two domains (frontend + backend, code + infrastructure)¹⁸.
Monitor compaction via OpenTelemetry traces to detect when sessions are hitting the window ceiling too early¹⁶.
Write a PLANS.md at the start of any multi-file task and instruct the agent to update it after each milestone¹¹.
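
Parts of this checklist can be checked mechanically. The sketch below audits an already-parsed config.toml (passed in as a dictionary) and a repository path; the function name and the finding messages are hypothetical, and the keys it inspects are the ones shown earlier in this article.

```python
from pathlib import Path
from typing import Any


def audit_context_setup(repo_root: Path, config: dict[str, Any]) -> list[str]:
    """Return a list of checklist items that appear to be unmet."""
    findings: list[str] = []
    if not (repo_root / "AGENTS.md").is_file():
        findings.append("missing AGENTS.md at repository root")
    if not config.get("features", {}).get("memories", False):
        findings.append("memories feature is not enabled")
    for name, server in config.get("mcp_servers", {}).items():
        # An absent or empty allow list means every tool definition is exposed.
        if not server.get("enabled_tools"):
            findings.append(f"MCP server '{name}' exposes its full tool catalogue")
    return findings
```

An empty result does not prove the setup is good; it only means the three mechanical checks passed.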

What Comes Next

The discipline is still evolving. Harness engineering — the third phase in the trajectory — is beginning to formalise. Martin Fowler’s harness-engineering framework²² and the “Inside the Scaffold” taxonomy²³ both argue that the scaffolding around the model (approval policies, event hooks, sandbox configuration) matters more than either the prompt or the raw context. Codex CLI’s architecture already supports this: PostToolUse hooks, execution policy rules via Starlark, and cloud-managed configuration bundles are all harness-layer constructs that shape context indirectly.

For now, the Write-Select-Compress-Isolate framework gives Codex CLI developers a practical, testable vocabulary for the decisions they make every day. The context window is not infinite. Engineering what goes into it is the highest-leverage activity in agentic development.

Citations

1. Andrej Karpathy on X, “+1 for ‘context engineering’ over ‘prompt engineering’”, https://x.com/karpathy/status/1937902205765607626
2. Sourcegraph, “Context Engineering: A Practical Guide for AI Agents (2026)”, https://sourcegraph.com/blog/context-engineering
3. LangChain, “Context Engineering Strategies”, GitHub repository, https://github.com/langchain-ai/context_engineering
4. Sourcegraph, “Agentic Coding in 2026: A Practical Guide for Big Code”, https://sourcegraph.com/blog/agentic-coding
5. “From Prompts to Harnesses — Four Years of AI Agentic Patterns”, https://bits-bytes-nn.github.io/insights/agentic-ai/2026/04/05/evolution-of-ai-agentic-patterns-en.html
6. OpenAI, “Custom instructions with AGENTS.md”, https://developers.openai.com/codex/guides/agents-md
7. Blake Crosley, “AGENTS.md Patterns: What Actually Changes Agent Behavior”, https://blakecrosley.com/blog/agents-md-patterns
8. OpenAI, “Memories – Codex”, https://developers.openai.com/codex/memories
9. OpenAI, “Codex CLI Changelog — v0.139.0”, https://developers.openai.com/codex/changelog
10. OpenAI, “Plugins – Codex”, https://developers.openai.com/codex/plugins
11. OpenAI, “Run long horizon tasks with Codex”, https://developers.openai.com/blog/run-long-horizon-tasks-with-codex
12. Scalekit, MCP token cost benchmarks, cited in “The MCP Tax: When Shell Commands Beat MCP Servers”, Codex Knowledge Base, 2026-06-09
13. Scalekit, MCP server token analysis, cited in “The MCP Tax”, Codex Knowledge Base, 2026-06-09
14. OpenAI, “Advanced Configuration – Codex”, https://developers.openai.com/codex/config-advanced
15. OpenAI, “Codex CLI Changelog — v0.139.0”, https://developers.openai.com/codex/changelog
16. OpenAI, “Features – Codex CLI”, https://developers.openai.com/codex/cli/features
17. OpenAI, “Run long horizon tasks with Codex”, https://developers.openai.com/blog/run-long-horizon-tasks-with-codex
18. OpenAI, “Subagents – Codex”, https://developers.openai.com/codex/subagents
19. OpenAI, “Config basics – Codex”, https://developers.openai.com/codex/config-basic
20. OpenAI, “CLI – Codex”, https://developers.openai.com/codex/cli
21. Bind AI, “Context Engineering 2026: Complete Developer Guide”, https://blog.getbind.co/context-engineering-in-2026-the-complete-developers-guide/
22. Martin Fowler, harness-engineering framework, cited in “From Prompts to Harnesses — Four Years of AI Agentic Patterns”
23. “Inside the Scaffold: What Academic Research Reveals About Codex CLI’s Agent Architecture”, Codex Knowledge Base, 2026-06-08