<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://codex.danielvaughan.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://codex.danielvaughan.com/" rel="alternate" type="text/html" /><updated>2026-04-07T19:03:31+01:00</updated><id>https://codex.danielvaughan.com/feed.xml</id><title type="html">Codex Blog</title><subtitle>Articles on agentic software engineering with Codex CLI</subtitle><author><name>Daniel Vaughan</name></author><entry><title type="html">AGENTS.md as an Open Standard: Cross-Tool Portability Under Linux Foundation Governance</title><link href="https://codex.danielvaughan.com/2026/04/07/agents-md-open-standard-cross-tool-portability/" rel="alternate" type="text/html" title="AGENTS.md as an Open Standard: Cross-Tool Portability Under Linux Foundation Governance" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/agents-md-open-standard-cross-tool-portability</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/agents-md-open-standard-cross-tool-portability/"><![CDATA[<h1 id="agentsmd-as-an-open-standard-cross-tool-portability-under-linux-foundation-governance">AGENTS.md as an Open Standard: Cross-Tool Portability Under Linux Foundation Governance</h1>

<hr />

<p>The AGENTS.md file that sits in your repository root has quietly become the most consequential configuration standard in agentic coding. What began as an OpenAI-originated convention for guiding Codex CLI is now a Linux Foundation project supported by over 25 tools and adopted by more than 60,000 open-source repositories<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. If you are maintaining separate instruction files for each AI coding tool, you are doing unnecessary work. Here is what has changed and how to consolidate.</p>

<h2 id="the-fragmentation-problem">The Fragmentation Problem</h2>

<p>By mid-2025, every major AI coding tool had invented its own instruction format:</p>

<ul>
  <li><strong>Codex CLI</strong>: <code class="language-plaintext highlighter-rouge">AGENTS.md</code></li>
  <li><strong>Claude Code</strong>: <code class="language-plaintext highlighter-rouge">CLAUDE.md</code></li>
  <li><strong>Cursor</strong>: <code class="language-plaintext highlighter-rouge">.cursorrules</code> and <code class="language-plaintext highlighter-rouge">.cursor/rules/</code></li>
  <li><strong>GitHub Copilot</strong>: <code class="language-plaintext highlighter-rouge">.github/copilot-instructions.md</code></li>
  <li><strong>Gemini CLI</strong>: <code class="language-plaintext highlighter-rouge">GEMINI.md</code></li>
  <li><strong>Windsurf</strong>: <code class="language-plaintext highlighter-rouge">.windsurfrules</code></li>
</ul>

<p>Teams using more than one tool — which is most teams — ended up maintaining multiple files with 80% overlapping content<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Worse, instructions would drift between files, producing inconsistent agent behaviour across tools.</p>

<h2 id="the-agentic-ai-foundation">The Agentic AI Foundation</h2>

<p>On 9 December 2025, the Linux Foundation announced the Agentic AI Foundation (AAIF), co-founded by OpenAI, Anthropic, and Block<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Three founding projects were contributed:</p>

<ul>
  <li><strong>Model Context Protocol (MCP)</strong> — Anthropic’s universal tool integration standard</li>
  <li><strong>goose</strong> — Block’s open-source local-first AI agent framework</li>
  <li><strong>AGENTS.md</strong> — OpenAI’s specification for repository-level agent instructions</li>
</ul>

<p>By February 2026, AAIF had grown to 146 members including JPMorgan Chase, American Express, Red Hat, Autodesk, Huawei, and UiPath, with David Nalley (AWS) appointed as governing board chair<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<pre><code class="language-mermaid">graph TD
    A[Linux Foundation] --&gt; B[Agentic AI Foundation - AAIF]
    B --&gt; C[MCP&lt;br/&gt;Anthropic]
    B --&gt; D[AGENTS.md&lt;br/&gt;OpenAI]
    B --&gt; E[goose&lt;br/&gt;Block]
    B --&gt; F[146 Members&lt;br/&gt;Feb 2026]
    F --&gt; G[Gold: JPMorgan, Red Hat,&lt;br/&gt;Autodesk, UiPath, ...]
    F --&gt; H[Silver: 79 organisations]
</code></pre>

<h2 id="who-supports-agentsmd-today">Who Supports AGENTS.md Today</h2>

<p>As of April 2026, over 25 tools read AGENTS.md natively<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Tool</th>
      <th style="text-align: center">Native Support</th>
      <th style="text-align: center">Auto-Loads</th>
      <th>Tool-Specific File</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Codex CLI</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>GitHub Copilot</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">Config-dependent</td>
      <td><code class="language-plaintext highlighter-rouge">copilot-instructions.md</code></td>
    </tr>
    <tr>
      <td><strong>Cursor</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td><code class="language-plaintext highlighter-rouge">.cursor/rules/</code></td>
    </tr>
    <tr>
      <td><strong>Gemini CLI / Jules</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td><code class="language-plaintext highlighter-rouge">GEMINI.md</code></td>
    </tr>
    <tr>
      <td><strong>Windsurf</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td><code class="language-plaintext highlighter-rouge">.windsurfrules</code></td>
    </tr>
    <tr>
      <td><strong>Amp</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>Devin</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>Aider</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>OpenCode</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>goose</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>JetBrains Junie</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>Zed</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>Warp</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
    <tr>
      <td><strong>Factory</strong></td>
      <td style="text-align: center">✅</td>
      <td style="text-align: center">✅</td>
      <td>—</td>
    </tr>
  </tbody>
</table>

<p>Claude Code remains the notable exception: it auto-loads <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> but does not natively read <code class="language-plaintext highlighter-rouge">AGENTS.md</code><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. A symlink or include workaround is required (see below).</p>

<h2 id="the-specification">The Specification</h2>

<p>AGENTS.md is deliberately minimal. It is a plain Markdown file with no required YAML front matter, no version field, and no schema<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. This simplicity is the point — it lowers the adoption barrier and ensures every Markdown-capable tool can parse it.</p>

<h3 id="recommended-sections">Recommended Sections</h3>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Project Overview</span>
Brief description of the codebase and its architecture.

<span class="gu">## Build and Test Commands</span>
<span class="p">-</span> <span class="sb">`npm run build`</span> — production build
<span class="p">-</span> <span class="sb">`npm test`</span> — unit tests via Vitest
<span class="p">-</span> <span class="sb">`npm run lint`</span> — ESLint + Prettier

<span class="gu">## Code Style</span>
<span class="p">-</span> TypeScript strict mode, no <span class="sb">`any`</span>
<span class="p">-</span> Prefer <span class="sb">`interface`</span> over <span class="sb">`type`</span> for object shapes
<span class="p">-</span> Use named exports

<span class="gu">## Security Boundaries</span>
<span class="p">-</span> Never commit <span class="sb">`.env`</span> files
<span class="p">-</span> All SQL must use parameterised queries
<span class="p">-</span> No shell command construction from user input

<span class="gu">## Git Workflow</span>
<span class="p">-</span> Branch naming: <span class="sb">`feat/`</span>, <span class="sb">`fix/`</span>, <span class="sb">`chore/`</span>
<span class="p">-</span> Squash merges to main
<span class="p">-</span> Conventional Commits format
</code></pre></div></div>

<h3 id="hierarchy-rules">Hierarchy Rules</h3>

<p>AGENTS.md supports nested placement in monorepos. The agent reads the file closest to the file being edited, with explicit user prompts overriding everything<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>repo-root/
├── AGENTS.md                  # Global rules
├── packages/
│   ├── api/
│   │   └── AGENTS.md          # API-specific overrides
│   └── web/
│       └── AGENTS.md          # Frontend-specific overrides
</code></pre></div></div>

<p>Codex CLI additionally supports a global <code class="language-plaintext highlighter-rouge">~/.codex/AGENTS.md</code> for personal defaults that apply across all repositories<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
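<p>The nearest-file-wins rule is straightforward to emulate in your own tooling. The following Python sketch is illustrative only (it is not drawn from any tool's actual implementation): it walks up from an edited file and collects every <code class="language-plaintext highlighter-rouge">AGENTS.md</code> between it and the repository root, most specific last:</p>

```python
from pathlib import Path

def collect_agents_files(edited_file: str, repo_root: str) -> list[Path]:
    """Return AGENTS.md files from the repo root down to the edited
    file's directory, so later (more specific) entries override
    earlier ones."""
    root = Path(repo_root).resolve()
    found = []
    directory = Path(edited_file).resolve().parent
    while True:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == root or directory == directory.parent:
            break  # stop at the repo root (or the filesystem root)
        directory = directory.parent
    return list(reversed(found))  # root-level first, nearest last
```

<p>Applying instructions in that order reproduces the override behaviour shown in the tree above: global rules first, package-level overrides on top.</p>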

<h2 id="the-cross-tool-strategy">The Cross-Tool Strategy</h2>

<p>The practical recommendation from the community is the <strong>80/20 rule</strong>: write 80% of your instructions in <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, then maintain tool-specific files only for features that require them<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<h3 id="what-goes-in-agentsmd">What Goes in AGENTS.md</h3>

<p>Everything that is tool-agnostic:</p>

<ul>
  <li>Build, test, and lint commands</li>
  <li>Code style and conventions</li>
  <li>Architecture overview</li>
  <li>Security boundaries</li>
  <li>Git workflow rules</li>
  <li>Domain vocabulary</li>
</ul>

<h3 id="what-stays-in-tool-specific-files">What Stays in Tool-Specific Files</h3>

<p>Features unique to a specific tool:</p>

<ul>
  <li><strong>CLAUDE.md</strong>: MCP server configuration, Claude-specific slash commands</li>
  <li><strong>.cursor/rules/</strong>: MDC-format files with YAML front matter for glob-based activation scoping<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></li>
  <li><strong>Codex <code class="language-plaintext highlighter-rouge">config.toml</code></strong>: Sandbox modes, approval policies, model selection, hooks — these are runtime configuration, not agent instructions</li>
</ul>

<h3 id="claude-code-workaround">Claude Code Workaround</h3>

<p>Since Claude Code does not yet read <code class="language-plaintext highlighter-rouge">AGENTS.md</code> natively, the cleanest approach is a symlink:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ln</span> <span class="nt">-s</span> AGENTS.md CLAUDE.md
</code></pre></div></div>

<p>Or, if you need Claude-specific additions, create a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> that references the shared file:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Claude Code Instructions</span>

See AGENTS.md for project conventions. Additional Claude-specific notes below.

<span class="gu">## MCP Servers</span>
<span class="p">-</span> Use the <span class="sb">`filesystem`</span> MCP server for large directory traversals
</code></pre></div></div>

<h2 id="practical-migration-consolidating-your-files">Practical Migration: Consolidating Your Files</h2>

<p>If your repository currently maintains multiple instruction files, here is the migration path:</p>

<pre><code class="language-mermaid">flowchart LR
    A[.cursorrules] --&gt; D[Extract common&lt;br/&gt;instructions]
    B[CLAUDE.md] --&gt; D
    C[copilot-instructions.md] --&gt; D
    D --&gt; E[AGENTS.md&lt;br/&gt;Single source of truth]
    E --&gt; F[Symlink or&lt;br/&gt;thin wrappers]
    F --&gt; G[CLAUDE.md&lt;br/&gt;Claude-specific only]
    F --&gt; H[.cursor/rules/&lt;br/&gt;Glob-scoped only]
</code></pre>

<ol>
  <li><strong>Audit</strong> existing files and highlight overlapping instructions</li>
  <li><strong>Extract</strong> common content into a single <code class="language-plaintext highlighter-rouge">AGENTS.md</code> at the repository root</li>
  <li><strong>Reduce</strong> tool-specific files to only their unique features</li>
  <li><strong>Symlink</strong> where possible — <code class="language-plaintext highlighter-rouge">CLAUDE.md → AGENTS.md</code> works well when content is 90%+ shared</li>
  <li><strong>Test</strong> by running each tool against the same task and comparing output quality</li>
</ol>
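<p>Step 1 (<strong>Audit</strong>) is mechanical enough to script. This hedged Python sketch, using a <code class="language-plaintext highlighter-rouge">shared_lines</code> helper of its own invention, surfaces the lines that appear in every instruction file and are therefore candidates for extraction into <code class="language-plaintext highlighter-rouge">AGENTS.md</code>:</p>

```python
from pathlib import Path

def shared_lines(paths: list[str]) -> set[str]:
    """Non-blank lines (whitespace-normalised) present in every
    instruction file: prime candidates for moving into AGENTS.md."""
    sets = []
    for p in paths:
        lines = {ln.strip() for ln in Path(p).read_text().splitlines()}
        sets.append({ln for ln in lines if ln})
    return set.intersection(*sets) if sets else set()

# Example:
# shared_lines(["CLAUDE.md", ".cursorrules",
#               ".github/copilot-instructions.md"])
```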

<h3 id="size-guidelines">Size Guidelines</h3>

<p>Keep <code class="language-plaintext highlighter-rouge">AGENTS.md</code> under 500 lines<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Beyond that, the file starts to consume a meaningful share of the context window, since it is injected into every prompt. For larger projects, use the monorepo hierarchy pattern rather than a single enormous root file. This aligns with the findings in the ETH Zurich study on context pollution, where oversized instruction files degraded agent performance<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>

<h2 id="enterprise-considerations">Enterprise Considerations</h2>

<p>For organisations adopting AGENTS.md as a cross-team standard:</p>

<ul>
  <li><strong>Version control</strong>: <code class="language-plaintext highlighter-rouge">AGENTS.md</code> should be committed and never <code class="language-plaintext highlighter-rouge">.gitignore</code>d — it is a team artefact, not a personal preference file<sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></li>
  <li><strong>CI validation</strong>: Lint your <code class="language-plaintext highlighter-rouge">AGENTS.md</code> in CI to catch formatting issues, overly long files, or missing sections. A simple <code class="language-plaintext highlighter-rouge">wc -l AGENTS.md | awk '{if ($1 &gt; 500) exit 1}'</code> check suffices</li>
  <li><strong>Signed manifests</strong>: GitHub Copilot Enterprise supports GPG-signed context files to prevent tampering<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> — ⚠️ this is an enterprise-only feature and not part of the AGENTS.md specification itself</li>
  <li><strong>Audit logging</strong>: Enterprise tools increasingly log which context files were loaded per session, useful for compliance</li>
</ul>
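<p>The line-count and missing-section checks above can be folded into one CI script. A minimal sketch, assuming a house convention for required sections (the section names below are this example's own choice, not part of the specification):</p>

```python
from pathlib import Path

MAX_LINES = 500
# Hypothetical house convention; adjust to your own template.
REQUIRED_SECTIONS = ["## Build and Test Commands", "## Code Style"]

def validate(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    text = Path(path).read_text()
    errors = []
    if len(text.splitlines()) > MAX_LINES:
        errors.append(f"{path}: exceeds {MAX_LINES} lines")
    for section in REQUIRED_SECTIONS:
        if section not in text:
            errors.append(f"{path}: missing section {section!r}")
    return errors
```

<p>Wire it into CI as a failing step whenever <code class="language-plaintext highlighter-rouge">validate("AGENTS.md")</code> returns a non-empty list.</p>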

<h2 id="what-this-means-for-codex-cli-users">What This Means for Codex CLI Users</h2>

<p>If you are already using Codex CLI with <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, you are already on the standard. The key takeaway is that your <code class="language-plaintext highlighter-rouge">AGENTS.md</code> now works across the entire ecosystem — there is no Codex-specific dialect. When a colleague opens the same repository in Cursor, Gemini CLI, or Copilot, they get the same baseline instructions.</p>

<p>The Codex-specific configuration — sandbox modes, approval policies, model selection, hooks, profiles — belongs in <code class="language-plaintext highlighter-rouge">config.toml</code> and <code class="language-plaintext highlighter-rouge">.codex/agents/</code> TOML files, not in <code class="language-plaintext highlighter-rouge">AGENTS.md</code>. This separation is clean: <code class="language-plaintext highlighter-rouge">AGENTS.md</code> is the <em>what</em> (project conventions), <code class="language-plaintext highlighter-rouge">config.toml</code> is the <em>how</em> (runtime behaviour).</p>

<h2 id="looking-ahead">Looking Ahead</h2>

<p>With 146 member organisations and adoption across 60,000+ repositories, AGENTS.md has crossed the threshold from convention to standard. The AAIF governance model — the same structure that governs Kubernetes, Node.js, and Linux itself — provides the stability that enterprise teams require before committing to a specification<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<p>The remaining gap is Claude Code’s lack of native support. Given that Anthropic co-founded AAIF, native <code class="language-plaintext highlighter-rouge">AGENTS.md</code> reading in Claude Code seems likely — but until it ships, the symlink workaround remains necessary.</p>

<p>For senior developers managing multi-tool workflows, the action is straightforward: consolidate into a single <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, keep it under 500 lines, and let the tools converge around you.</p>

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://agents.md/">AGENTS.md — Official specification site</a>, accessed April 2026. Lists 25+ supported tools and 60,000+ project adoption. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://smartscope.blog/en/generative-ai/github-copilot/github-copilot-agents-md-guide/">AGENTS.md Cross-Tool Unified Management Guide — SmartScope</a>, February 2026. Recommends the 80/20 shared-base approach and 500-line maximum. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">Linux Foundation Announces the Formation of the Agentic AI Foundation (AAIF)</a>, 9 December 2025. Co-founded by OpenAI, Anthropic, and Block with MCP, goose, and AGENTS.md as founding projects. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://www.linuxfoundation.org/press/agentic-ai-foundation-welcomes-97-new-members">Agentic AI Foundation Welcomes 97 New Members — Linux Foundation</a>, 24 February 2026. Total membership reached 146; David Nalley appointed governing board chair. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://thepromptshelf.dev/blog/agents-md-vs-claude-md/">AGENTS.md vs CLAUDE.md: A Practical Guide — The Prompt Shelf</a>, 2026. Details cross-tool behaviour differences and Claude Code’s AGENTS.md gap. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/guides/agents-md">Custom instructions with AGENTS.md — Codex CLI official documentation</a>, accessed April 2026. Documents hierarchy: global → repo root → subdirectory. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://danielvaughan.github.io/codex-resources/articles/2026-03-27-agents-md-bloat-problem/">The AGENTS.md Bloat Problem — Codex Resources</a>, 27 March 2026. ETH Zurich study findings on context pollution from oversized instruction files. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://www.msbiro.net/posts/ai-cli-standardization-guidelines/">AI CLI Standardization: From Tool Lock-in to Portability — msbiro.net</a>, 2026. Covers signed manifests, enterprise security features, and audit logging for context files. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[AGENTS.md as an Open Standard: Cross-Tool Portability Under Linux Foundation Governance]]></summary></entry><entry><title type="html">How the Codex CLI Agentic Loop Works in Detail to the Code Level</title><link href="https://codex.danielvaughan.com/2026/04/07/codex-cli-agentic-loop-internals/" rel="alternate" type="text/html" title="How the Codex CLI Agentic Loop Works in Detail to the Code Level" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codex-cli-agentic-loop-internals</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codex-cli-agentic-loop-internals/"><![CDATA[<h1 id="how-the-codex-cli-agentic-loop-works-in-detail-to-the-code-level">How the Codex CLI Agentic Loop Works in Detail to the Code Level</h1>

<hr />

<p>Every time you type a prompt into Codex CLI, a carefully orchestrated machinery of Rust async tasks, streaming API calls, tool dispatchers, and OS-level sandboxes springs into action. This article traces the complete lifecycle of a single turn through the Codex CLI codebase — from keystroke to committed code — referencing the actual crate structure, key source files, and design decisions that make it work.</p>

<h2 id="the-cargo-workspace-at-a-glance">The Cargo Workspace at a Glance</h2>

<p>Codex CLI ships as a single binary compiled from a Cargo workspace of approximately 84 member crates<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The crates that matter most for understanding the agentic loop are:</p>

<table>
  <thead>
    <tr>
      <th>Crate</th>
      <th>Responsibility</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">codex-core</code></td>
      <td>Session management, model API communication, tool orchestration</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">codex-protocol</code></td>
      <td>Shared wire types (<code class="language-plaintext highlighter-rouge">Op</code>, <code class="language-plaintext highlighter-rouge">EventMsg</code>, items)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">codex-tui</code></td>
      <td>Interactive terminal UI (Ratatui-based)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">codex-exec</code></td>
      <td>Headless non-interactive execution (<code class="language-plaintext highlighter-rouge">codex exec</code>)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">codex-cli</code></td>
      <td>Multitool dispatcher routing subcommands</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">codex-config</code></td>
      <td>Layered configuration with validation</td>
    </tr>
  </tbody>
</table>

<p>The binary entry point lives in <code class="language-plaintext highlighter-rouge">codex-cli</code>, which delegates to either <code class="language-plaintext highlighter-rouge">codex-tui</code> (interactive) or <code class="language-plaintext highlighter-rouge">codex-exec</code> (headless) after parsing arguments<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="the-submissionevent-architecture">The Submission/Event Architecture</h2>

<p>Codex decouples its user interface from the agent engine using an <strong>asynchronous submission/event queue pattern</strong><sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Two primitives define the contract:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">Codex::submit(Op)</code></strong> — clients push operations (user turns, approvals, interrupts) wrapped in <code class="language-plaintext highlighter-rouge">Submission</code> envelopes carrying unique IDs and optional W3C trace context for distributed tracing.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">Codex::next_event()</code></strong> — the engine emits <code class="language-plaintext highlighter-rouge">EventMsg</code> notifications (message deltas, tool status updates, approval requests) back to the UI.</li>
</ul>

<p>This separation means the TUI, the exec harness, and the app-server for IDE integration all consume the same event stream. The <code class="language-plaintext highlighter-rouge">submission_loop</code> runs as a dedicated Tokio task, ensuring linearised state changes whilst supporting concurrent event processing across multiple client connections<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant User as User / IDE
    participant Sub as Codex::submit()
    participant Loop as submission_loop (Tokio task)
    participant Ctx as ContextManager
    participant API as Responses API (SSE)
    participant Tools as ToolRouter
    participant Evt as Codex::next_event()

    User-&gt;&gt;Sub: Op::UserTurn(prompt)
    Sub-&gt;&gt;Loop: Submission { id, op, trace_ctx }
    Loop-&gt;&gt;Ctx: Record user input, build prompt
    Ctx-&gt;&gt;API: POST /v1/responses (streaming)
    API--&gt;&gt;Loop: SSE: response.output_text.delta
    Loop--&gt;&gt;Evt: EventMsg::TextDelta
    API--&gt;&gt;Loop: SSE: response.output_item.added (tool_call)
    Loop-&gt;&gt;Tools: Dispatch tool call
    Tools--&gt;&gt;Loop: Tool result
    Loop-&gt;&gt;Ctx: Append result to history
    Ctx-&gt;&gt;API: POST /v1/responses (continuation)
    API--&gt;&gt;Loop: SSE: response.completed
    Loop--&gt;&gt;Evt: EventMsg::TurnComplete
    Evt--&gt;&gt;User: Render final output
</code></pre>
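<p>Stripped of Tokio and the real wire types, the decoupling in the diagram reduces to two queues and a single worker. This Python sketch illustrates the pattern only (it is not the <code class="language-plaintext highlighter-rouge">codex-core</code> API), but it shows why several front-ends can share one engine:</p>

```python
import queue
import threading

class Engine:
    """Minimal submission/event engine: clients push ops onto a
    submission queue; one worker thread linearises state changes and
    emits events onto an event queue."""

    def __init__(self):
        self._subs = queue.Queue()
        self._events = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, op):
        self._subs.put(op)            # cf. Codex::submit(Op)

    def next_event(self):
        return self._events.get()     # cf. Codex::next_event()

    def _loop(self):
        while True:
            op = self._subs.get()     # one op at a time: linearised
            self._events.put(("TextDelta", f"echo: {op}"))
            self._events.put(("TurnComplete", None))
```

<p>Any client that can call <code class="language-plaintext highlighter-rouge">submit</code> and drain <code class="language-plaintext highlighter-rouge">next_event</code> (a TUI, a headless harness, an IDE server) gets identical behaviour, which is exactly the property the shared event stream provides.</p>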

<h2 id="thread-and-turn-semantics">Thread and Turn Semantics</h2>

<p>Codex models conversations as a hierarchy of <strong>Threads</strong> and <strong>Turns</strong><sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<ul>
  <li>A <strong>Thread</strong> is a persistent conversation backed by SQLite (<code class="language-plaintext highlighter-rouge">StateDB</code>). Threads survive process restarts and can be resumed, forked, archived, or rolled back.</li>
  <li>A <strong>Turn</strong> is one round-trip cycle: user input triggers model inference, which may produce tool calls whose results feed back into the model until a final assistant message appears.</li>
  <li><strong>Items</strong> are granular events within a turn — agent messages, shell output, file edits, reasoning traces.</li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">ThreadManager</code> orchestrates multiple <code class="language-plaintext highlighter-rouge">CodexThread</code> instances (a primary agent plus any sub-agents), each maintaining its own <code class="language-plaintext highlighter-rouge">ContextManager</code> for message history and token accounting<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
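<p>As a rough mental model, the hierarchy can be sketched with plain data types. The class and field names below are this sketch's own, not those of <code class="language-plaintext highlighter-rouge">codex-core</code>:</p>

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    kind: str       # e.g. "agent_message", "shell_output", "file_edit"
    payload: str

@dataclass
class Turn:
    user_input: str
    items: list[Item] = field(default_factory=list)

@dataclass
class Thread:
    thread_id: str
    turns: list[Turn] = field(default_factory=list)

    def start_turn(self, prompt: str) -> Turn:
        """Open a new round-trip cycle within this conversation."""
        turn = Turn(user_input=prompt)
        self.turns.append(turn)
        return turn
```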

<h2 id="prompt-assembly-and-the-responses-api">Prompt Assembly and the Responses API</h2>

<p>Each turn begins with the <code class="language-plaintext highlighter-rouge">ContextManager</code> assembling a prompt for the OpenAI Responses API. The prompt structure follows a strict ordering to maximise cache hits<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<ol>
  <li><strong>System message</strong> — general rules, coding standards</li>
  <li><strong>Tools</strong> — conforming to the Responses API tool schema</li>
  <li><strong>Developer instructions</strong> — from <code class="language-plaintext highlighter-rouge">config.toml</code>, <code class="language-plaintext highlighter-rouge">AGENTS.md</code>, <code class="language-plaintext highlighter-rouge">AGENTS.override.md</code>, and skill-based instructions (subject to a 32 KiB default limit)<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></li>
  <li><strong>Input sequence</strong> — the full conversation history (text, images, file inputs, tool results)</li>
</ol>

<p>Codex deliberately avoids the <code class="language-plaintext highlighter-rouge">previous_response_id</code> parameter despite the apparent inefficiency of resending the full history each time. This design choice ensures every request is <strong>stateless</strong>, enabling Zero Data Retention (ZDR) compliance for enterprise customers who reject server-side data storage<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
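<p>A simplified sketch of what statelessness implies for request assembly (the payload shape here is illustrative, not the exact Responses API schema): the stable parts lead, the history trails, and no server-side state is referenced.</p>

```python
def build_request(system_msg, tools, dev_instructions, history):
    """Assemble a self-contained request for one turn.  Stable parts
    (system message, tools, developer instructions) come first to
    maximise prefix-cache hits; the growing history goes last."""
    return {
        "instructions": system_msg,
        "tools": tools,
        "input": [
            {"role": "developer", "content": dev_instructions},
            *history,  # full conversation resent every turn
        ],
        # Deliberately no previous_response_id: the server holds no
        # conversation state, which is what makes ZDR possible.
    }
```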

<p>The API is called via one of three endpoints depending on authentication<sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Auth Method</th>
      <th>Endpoint</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ChatGPT login</td>
      <td><code class="language-plaintext highlighter-rouge">chatgpt.com/backend-api/codex/responses</code></td>
    </tr>
    <tr>
      <td>API key</td>
      <td><code class="language-plaintext highlighter-rouge">api.openai.com/v1/responses</code></td>
    </tr>
    <tr>
      <td>Local/OSS models</td>
      <td><code class="language-plaintext highlighter-rouge">localhost:11434/v1/responses</code> (with <code class="language-plaintext highlighter-rouge">--oss</code>)</td>
    </tr>
  </tbody>
</table>

<p>Responses stream back as <strong>Server-Sent Events (SSE)</strong>: <code class="language-plaintext highlighter-rouge">response.output_text.delta</code> events drive incremental UI rendering, whilst <code class="language-plaintext highlighter-rouge">response.output_item.added</code> events signal tool call requests requiring dispatch<sup id="fnref:2:4" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
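<p>As an illustration, the event handling above can be sketched in Python — the two event names match the stream described here, but the payload shapes are simplified assumptions, not the full Responses API schema:</p>

```python
import json

def parse_sse_events(raw: str):
    """Split a raw SSE stream into (event_type, parsed_data) pairs."""
    events = []
    event_type, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_type:  # blank line terminates one event
            events.append((event_type, json.loads("\n".join(data_lines))))
            event_type, data_lines = None, []
    return events

# Illustrative stream: two text deltas, then a tool-call item.
stream = (
    "event: response.output_text.delta\n"
    'data: {"delta": "Hel"}\n\n'
    "event: response.output_text.delta\n"
    'data: {"delta": "lo"}\n\n'
    "event: response.output_item.added\n"
    'data: {"item": {"type": "function_call", "name": "shell"}}\n\n'
)

text, tool_calls = "", []
for event, payload in parse_sse_events(stream):
    if event == "response.output_text.delta":
        text += payload["delta"]            # drives incremental UI rendering
    elif event == "response.output_item.added":
        tool_calls.append(payload["item"])  # signals a tool call to dispatch

print(text)  # → Hello
```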

<h2 id="tool-dispatch-the-toolrouter">Tool Dispatch: The ToolRouter</h2>

<p>When the model emits a tool call, the <code class="language-plaintext highlighter-rouge">ToolRouter</code> (in <code class="language-plaintext highlighter-rouge">codex-core</code>) classifies and dispatches it to one of three execution backends<sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<h3 id="built-in-shell-tools">Built-in Shell Tools</h3>

<p>Shell commands route through the <code class="language-plaintext highlighter-rouge">UnifiedExecProcessManager</code>, which manages PTY allocation and long-running process lifecycle. The system prompt teaches a <strong>shell-first toolkit</strong> — <code class="language-plaintext highlighter-rouge">cat</code> for reading, <code class="language-plaintext highlighter-rouge">grep</code>/<code class="language-plaintext highlighter-rouge">find</code> for searching, test runners and linters for verification — reserving file mutation for the dedicated <code class="language-plaintext highlighter-rouge">apply_patch</code> envelope<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<h3 id="the-apply_patch-system">The apply_patch System</h3>

<p>File modifications use a structured patch format rather than raw shell writes. The binary supports a special invocation mode: when <code class="language-plaintext highlighter-rouge">arg1</code> is <code class="language-plaintext highlighter-rouge">--codex-run-as-apply-patch</code>, the process acts as a virtual patch CLI<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. This ensures all file edits pass through a validated, diffable pathway rather than unconstrained shell writes.</p>
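<p>A minimal sketch of what such a patch envelope looks like, and how a validator might guard it. The grammar shown here is a simplified illustration of the sentinel-delimited format, not the complete apply_patch specification:</p>

```python
# Simplified example of a structured patch envelope (illustrative only):
PATCH = """*** Begin Patch
*** Update File: src/auth.py
@@ def login(user):
-    return check(user)
+    return check_with_jwt(user)
*** End Patch"""

def validate_envelope(patch: str) -> list[str]:
    """Reject input lacking the sentinel lines; return the touched files."""
    lines = patch.splitlines()
    if lines[0] != "*** Begin Patch" or lines[-1] != "*** End Patch":
        raise ValueError("not a valid apply_patch envelope")
    prefixes = ("*** Update File: ", "*** Add File: ", "*** Delete File: ")
    return [l.split(": ", 1)[1] for l in lines if l.startswith(prefixes)]

print(validate_envelope(PATCH))  # → ['src/auth.py']
```

Because every edit arrives in this envelope, the harness can diff, validate, and audit file mutations in a way raw shell writes would not permit.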

<h3 id="mcp-server-integration">MCP Server Integration</h3>

<p>External tools (database queries, API calls, custom integrations) are accessed via the Model Context Protocol. The <code class="language-plaintext highlighter-rouge">McpConnectionManager</code> maintains lifecycle management for MCP servers over stdio or HTTP bridges, routing tool calls through the same approval and sandbox policy as built-in tools<sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<pre><code class="language-mermaid">flowchart TD
    TC[Model emits tool_call] --&gt; TR{ToolRouter}
    TR --&gt;|Shell command| APR[Approval Gate]
    TR --&gt;|File edit| APR
    TR --&gt;|MCP tool| APR
    APR --&gt;|Approved| SB{Sandbox Policy}
    APR --&gt;|Denied| DENY[Return denial to model]
    SB --&gt;|DangerFullAccess| EXEC[Execute unrestricted]
    SB --&gt;|WorkspaceWrite| WS[Execute with write ACL]
    SB --&gt;|ReadOnly| RO[Execute read-only]
    EXEC --&gt; RES[Append result to history]
    WS --&gt; RES
    RO --&gt; RES
    RES --&gt; CTX[ContextManager continuation]
    CTX --&gt; API[Next Responses API call]
</code></pre>

<h2 id="the-approval-gate-state-machine">The Approval Gate State Machine</h2>

<p>Before any tool executes, it passes through an approval gate governed by the <code class="language-plaintext highlighter-rouge">AskForApproval</code> enum<sup id="fnref:1:8" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>Behaviour</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">UnlessTrusted</code></td>
      <td>Auto-approves safe read-only operations; prompts for writes and network access</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">OnRequest</code></td>
      <td>The model itself decides when to request user consent</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">Never</code></td>
      <td>No prompts — used in non-interactive <code class="language-plaintext highlighter-rouge">codex exec</code> modes</td>
    </tr>
  </tbody>
</table>

<p>These map to the user-facing approval modes<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>:</p>

<ul>
  <li><strong>Auto</strong> (default) — reads and workspace-scoped edits proceed; out-of-scope writes and network access require confirmation.</li>
  <li><strong>Read-only</strong> — consultative mode; all mutations require explicit approval.</li>
  <li><strong>Full Access</strong> — unrestricted; use sparingly with trusted repositories.</li>
</ul>
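<p>The gate's decision logic can be sketched as a Python analogue of the Rust enum — the variant names come from the table above, but the function signature and its inputs are hypothetical simplifications:</p>

```python
from enum import Enum

class AskForApproval(Enum):
    UNLESS_TRUSTED = "untrusted"
    ON_REQUEST = "on-request"
    NEVER = "never"

def needs_user_prompt(policy: AskForApproval, is_read_only: bool,
                      model_requested: bool = False) -> bool:
    """Hypothetical sketch: does this tool call require user consent?"""
    if policy is AskForApproval.NEVER:
        return False                # non-interactive `codex exec` modes
    if policy is AskForApproval.ON_REQUEST:
        return model_requested      # the model decides when to ask
    return not is_read_only         # UnlessTrusted: prompt for mutations

assert not needs_user_prompt(AskForApproval.UNLESS_TRUSTED, is_read_only=True)
assert needs_user_prompt(AskForApproval.UNLESS_TRUSTED, is_read_only=False)
assert not needs_user_prompt(AskForApproval.NEVER, is_read_only=False)
```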

<p>Approval state persists across session resumption via SQLite <code class="language-plaintext highlighter-rouge">StateDB</code>, so resuming a thread retains the user’s previous policy decisions<sup id="fnref:1:9" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="sandbox-lifecycle-landlock-seatbelt-and-arg0-dispatch">Sandbox Lifecycle: Landlock, Seatbelt, and arg0 Dispatch</h2>

<p>The sandbox is Codex CLI’s most distinctive architectural feature — enforcement happens at the <strong>kernel level</strong>, not the application layer<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<h3 id="platform-specific-backends">Platform-Specific Backends</h3>

<table>
  <thead>
    <tr>
      <th>Platform</th>
      <th>Mechanism</th>
      <th>Implementation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Linux</td>
      <td>Landlock LSM (+ optional Bubblewrap pipeline)</td>
      <td><code class="language-plaintext highlighter-rouge">codex-linux-sandbox</code> binary alias</td>
    </tr>
    <tr>
      <td>macOS</td>
      <td>Seatbelt sandbox profiles</td>
      <td>Confined mode via <code class="language-plaintext highlighter-rouge">sandbox-exec</code></td>
    </tr>
    <tr>
      <td>Windows</td>
      <td>Restricted token elevation</td>
      <td>Via WSL2</td>
    </tr>
  </tbody>
</table>

<h3 id="the-arg0-dispatch-pattern">The arg0 Dispatch Pattern</h3>

<p>The entry point wraps the main function in <code class="language-plaintext highlighter-rouge">arg0_dispatch_or_else()</code><sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. This function inspects the binary name at invocation time:</p>

<ul>
  <li>If invoked as <strong><code class="language-plaintext highlighter-rouge">codex-linux-sandbox</code></strong>, it immediately executes a sandboxed command using Landlock restrictions without parsing regular CLI arguments.</li>
  <li>Otherwise, it loads environment variables, patches <code class="language-plaintext highlighter-rouge">PATH</code>, and proceeds to normal CLI logic — but crucially, it passes the sandbox executable path downstream so <code class="language-plaintext highlighter-rouge">codex-core</code> can re-invoke itself with restrictions when executing tool calls.</li>
</ul>

<p>This self-referential dispatch pattern means the sandbox helper is embedded within the same binary rather than requiring a separate sidecar process<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
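<p>The dispatch pattern translates naturally into a short Python sketch — the alias name is the one documented above; the rest is an illustrative analogue of the Rust implementation:</p>

```python
import os, sys

def arg0_dispatch_or_else(main):
    """Sketch of the pattern: behave as the sandbox helper when invoked
    under the alias name, otherwise run the normal CLI entry point."""
    invoked_as = os.path.basename(sys.argv[0])
    if invoked_as == "codex-linux-sandbox":
        # Skip normal CLI argument parsing entirely.
        return run_sandboxed(sys.argv[1:])
    return main()

def run_sandboxed(argv):
    # Stand-in for applying Landlock restrictions and exec'ing the command.
    return f"sandboxed: {' '.join(argv)}"

# Simulate both invocation paths by patching argv:
sys.argv = ["codex-linux-sandbox", "ls", "-la"]
print(arg0_dispatch_or_else(lambda: "normal CLI"))  # → sandboxed: ls -la
sys.argv = ["codex"]
print(arg0_dispatch_or_else(lambda: "normal CLI"))  # → normal CLI
```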

<h3 id="sandbox-policies">Sandbox Policies</h3>

<p>Three policy levels control what the sandbox permits<sup id="fnref:1:10" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">DangerFullAccess</code></strong> — unrestricted filesystem and network access.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">WorkspaceWrite</code></strong> — write access limited to the current working directory and explicitly specified roots.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">ReadOnly</code></strong> — filesystem access restricted to reads within the allowed directory roots; no writes anywhere.</li>
</ul>

<p>Every tool call flows through a centralised execution system in the <code class="language-plaintext highlighter-rouge">ToolOrchestrator</code> that selects the appropriate sandbox based on the current approval mode and the tool’s risk classification<sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. You can test sandbox behaviour directly using <code class="language-plaintext highlighter-rouge">codex debug seatbelt</code> or <code class="language-plaintext highlighter-rouge">codex debug landlock</code><sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
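<p>A hypothetical sketch of how a write check under these policies might behave — the variant names are from the list above, but the path-containment logic is an assumption for illustration:</p>

```python
import os
from enum import Enum

class SandboxPolicy(Enum):
    DANGER_FULL_ACCESS = "danger-full-access"
    WORKSPACE_WRITE = "workspace-write"
    READ_ONLY = "read-only"

def write_allowed(policy: SandboxPolicy, path: str, workspace: str,
                  extra_roots: tuple[str, ...] = ()) -> bool:
    """Illustrative check mirroring the WorkspaceWrite semantics:
    writes only under the working directory or explicit extra roots."""
    if policy is SandboxPolicy.DANGER_FULL_ACCESS:
        return True
    if policy is SandboxPolicy.READ_ONLY:
        return False
    roots = (workspace,) + extra_roots
    target = os.path.abspath(path)
    return any(target.startswith(os.path.abspath(root) + os.sep)
               for root in roots)

assert write_allowed(SandboxPolicy.WORKSPACE_WRITE, "/repo/src/main.rs", "/repo")
assert not write_allowed(SandboxPolicy.WORKSPACE_WRITE, "/etc/passwd", "/repo")
assert not write_allowed(SandboxPolicy.READ_ONLY, "/repo/src/main.rs", "/repo")
```

The crucial difference in the real system is that these checks are enforced by the kernel (Landlock, Seatbelt), not by application code like this.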

<h2 id="context-window-management-and-compaction">Context Window Management and Compaction</h2>

<p>With GPT-5.4’s 1M token context window<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>, Codex can sustain long sessions — but history still grows, and the entire conversation is included in every request<sup id="fnref:2:5" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Two strategies keep this manageable:</p>

<h3 id="prompt-caching">Prompt Caching</h3>

<p>Codex structures prompts so that static content (system instructions, tool definitions) occupies the prefix and variable content (conversation history) appends to the end. With cache hits, sampling cost becomes <strong>linear rather than quadratic</strong><sup id="fnref:2:6" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Empirical measurements show<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Scenario</th>
      <th>Cache Hit Rate</th>
      <th>Median TTFT</th>
      <th>Cost per Request</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Stable prefixes</td>
      <td>85%</td>
      <td>953 ms</td>
      <td>$0.009</td>
    </tr>
    <tr>
      <td>Perturbed prefixes</td>
      <td>0%</td>
      <td>2,727 ms</td>
      <td>$0.033</td>
    </tr>
  </tbody>
</table>

<p>That is a <strong>65% latency reduction and 71% cost reduction</strong> from prefix consistency alone.</p>

<p>Cache misses are triggered by mid-conversation configuration changes: tool availability modifications, model switching, sandbox reconfiguration, approval mode changes, or working directory updates<sup id="fnref:2:7" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
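<p>The prefix discipline can be illustrated schematically — this is the ordering principle only, not the actual wire format:</p>

```python
def build_prompt(system: str, tools: list[str], history: list[str]) -> list[str]:
    """Cache-friendly ordering: static content first, variable content last.
    Any change to the prefix (tools, system text) invalidates the cache."""
    return [system, *sorted(tools), *history]  # sorted → deterministic prefix

turn1 = build_prompt("You are Codex.", ["shell", "apply_patch"], ["user: hi"])
turn2 = build_prompt("You are Codex.", ["shell", "apply_patch"],
                     ["user: hi", "assistant: hello", "user: fix the bug"])

def shared_prefix_len(a: list[str], b: list[str]) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

# All of turn1 is a prefix of turn2, so every earlier block is a cache hit:
assert shared_prefix_len(turn1, turn2) == len(turn1)
```

Swapping a tool mid-conversation would change the sorted prefix, drop the shared-prefix length to near zero, and force full re-processing — the "perturbed prefixes" row in the table above.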

<h3 id="automatic-compaction">Automatic Compaction</h3>

<p>Token tracking lives in <code class="language-plaintext highlighter-rouge">codex-rs/core/src/context_manager/history.rs</code>. The <code class="language-plaintext highlighter-rouge">estimate_response_item_model_visible_bytes()</code> function serialises items and applies byte-to-token heuristics, with <code class="language-plaintext highlighter-rouge">Session::recompute_token_usage()</code> in <code class="language-plaintext highlighter-rouge">codex.rs</code> calling <code class="language-plaintext highlighter-rouge">ContextManager::estimate_token_count()</code> to maintain running totals<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>.</p>
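<p>A rough Python analogue of that heuristic — the 4-bytes-per-token constant is an assumption for illustration, not the value used in codex-rs:</p>

```python
import json

BYTES_PER_TOKEN = 4  # assumed heuristic, not the real internal constant

def estimate_tokens(items: list[dict]) -> int:
    """Serialise history items and apply a byte-to-token heuristic, loosely
    mirroring estimate_response_item_model_visible_bytes()."""
    visible_bytes = sum(len(json.dumps(item).encode("utf-8")) for item in items)
    return visible_bytes // BYTES_PER_TOKEN

history = [{"role": "user", "content": "run the tests"},
           {"role": "assistant", "content": "All 42 tests passed."}]
print(estimate_tokens(history))
```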

<p>When usage exceeds <code class="language-plaintext highlighter-rouge">model_auto_compact_token_limit</code> (approximately 95% of the effective window — around 180K–244K tokens depending on the model), auto-compaction triggers<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>. The process, implemented in <code class="language-plaintext highlighter-rouge">codex-rs/core/src/compact.rs</code><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>:</p>

<ol>
  <li>The full conversation history is sent to the <code class="language-plaintext highlighter-rouge">/responses/compact</code> endpoint with a dedicated summarisation prompt.</li>
  <li>The server generates a structured summary and returns it <strong>AES-encrypted</strong><sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. The encryption keys remain server-side, preventing clients from inspecting or tampering with summaries.</li>
  <li>Write tools are <strong>blocked before compaction</strong> triggers to prevent mid-refactoring conflicts<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</li>
  <li>The session rebuilds context as: initial prompt + recent user messages (~20K tokens) + the encrypted summary blob.</li>
  <li>On subsequent requests, OpenAI’s servers decrypt the blob and inject it with a handoff prompt before feeding context to the model.</li>
</ol>

<p>The implementation includes retry logic with exponential backoff for failed compactions, and warns that “long conversations and multiple compactions can cause the model to be less accurate”<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. Users can also trigger compaction manually via the <code class="language-plaintext highlighter-rouge">/compact</code> slash command.</p>

<pre><code class="language-mermaid">flowchart LR
    A[Token count exceeds threshold] --&gt; B[Block write tools]
    B --&gt; C[Send history to /responses/compact]
    C --&gt; D[Server generates AES-encrypted summary]
    D --&gt; E[Rebuild context: prefix + recent msgs + blob]
    E --&gt; F[Resume normal operation]
    F --&gt; G[Server decrypts blob on next request]
</code></pre>

<h2 id="the-app-server-json-rpc-for-ide-integration">The App Server: JSON-RPC for IDE Integration</h2>

<p>For IDE integration (VS Code, Cursor, JetBrains), the <code class="language-plaintext highlighter-rouge">codex-api</code> crate exposes a <strong>JSON-RPC 2.0 interface over stdio (JSONL)</strong><sup id="fnref:1:11" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>. The server comprises four main components:</p>

<ol>
  <li><strong>Stdio reader</strong> — parses incoming JSON-RPC calls</li>
  <li><strong>CodexMessageProcessor</strong> — translates between wire protocol and internal types</li>
  <li><strong>Thread manager</strong> — creates, resumes, and forks threads</li>
  <li><strong>Core threads</strong> — the actual <code class="language-plaintext highlighter-rouge">CodexThread</code> instances running the agentic loop</li>
</ol>

<p>The <code class="language-plaintext highlighter-rouge">EventMsg</code> notifications from the core are translated into JSON-RPC notifications, enabling IDEs to render streaming output, display approval prompts, and show tool execution status in real time<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>.</p>
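<p>The wire format is plain JSON-RPC 2.0 with one message per line. The helper below shows the framing; the method names are hypothetical placeholders, not the actual codex-api surface:</p>

```python
import json

def rpc_request(req_id: int, method: str, params: dict) -> str:
    """One JSON-RPC 2.0 call, framed as a single JSONL line over stdio."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

def rpc_notification(method: str, params: dict) -> str:
    """Notifications carry no id — the server expects no response."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params})

# Hypothetical method names for illustration only:
line = rpc_request(1, "thread.resume", {"thread_id": "abc123"})
msg = json.loads(line)
assert msg["jsonrpc"] == "2.0" and "id" in msg
assert "id" not in json.loads(rpc_notification("event.output_delta",
                                               {"text": "hi"}))
```

Streaming output and approval prompts arrive as id-less notifications, which is why IDEs can render them incrementally without blocking on a response.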

<h2 id="session-persistence-and-rollout-files">Session Persistence and Rollout Files</h2>

<p>Every session is persisted as compressed JSONL (<code class="language-plaintext highlighter-rouge">.jsonl.zst</code>) files in <code class="language-plaintext highlighter-rouge">~/.codex/sessions/</code> organised by date<sup id="fnref:1:12" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The <code class="language-plaintext highlighter-rouge">RolloutRecorder</code> filters events based on persistence mode and writes timestamped files enabling:</p>

<ul>
  <li><strong>Resumption</strong> — replay events to restore conversation state</li>
  <li><strong>Forking</strong> — branch a conversation at any point</li>
  <li><strong>Audit trail</strong> — complete operational history for compliance</li>
</ul>

<p>Each rollout file contains session metadata and serialised event items sufficient for full reconstruction<sup id="fnref:1:13" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="error-recovery">Error Recovery</h2>

<p>When tool execution fails, the error output is appended to the conversation history and fed back to the model as a tool result. The model then reasons about the failure and decides whether to retry with a modified approach, try an alternative strategy, or report the failure to the user. This is not explicit retry logic in the orchestrator — rather, the model’s own reasoning drives recovery, consistent with the ReAct pattern<sup id="fnref:2:8" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<p>Compaction failures are the exception: <code class="language-plaintext highlighter-rouge">compact.rs</code> implements explicit retry with exponential backoff before falling back to continued operation with the uncompacted history<sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>
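<p>The backoff pattern itself is standard. A generic sketch in the spirit of that retry logic — the attempt counts and delays here are arbitrary, not those used by compact.rs:</p>

```python
import time

def retry_with_backoff(op, max_attempts: int = 4, base_delay: float = 0.01):
    """Retry op() with exponentially increasing sleeps between failures,
    re-raising the final error once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 10 ms, 20 ms, 40 ms, ...

calls = {"n": 0}
def flaky_compaction():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("compaction failed")
    return "compacted"

assert retry_with_backoff(flaky_compaction) == "compacted"
assert calls["n"] == 3  # succeeded on the third attempt
```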

<h2 id="comparative-architecture-claude-code">Comparative Architecture: Claude Code</h2>

<p>For context, Claude Code takes a fundamentally different approach to several of these concerns<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>:</p>

<ul>
  <li><strong>Sandbox</strong>: Application-layer hooks with 17 lifecycle event types (e.g., <code class="language-plaintext highlighter-rouge">PreToolUse</code> on Bash) rather than kernel-level enforcement.</li>
  <li><strong>Context</strong>: 200K token window (vs. Codex’s 1M) compensated by codebase retrieval and cascading <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> hierarchy.</li>
  <li><strong>Multi-agent</strong>: Interactive subagent spawning via Task tool with real-time synthesis, versus Codex’s fire-and-forget cloud delegation supporting up to 6 concurrent threads.</li>
</ul>

<p>Both approaches are valid — Codex optimises for security-first isolation and large-context reasoning; Claude Code optimises for flexible programmable hooks and retrieval-augmented generation.</p>

<h2 id="key-takeaways">Key Takeaways</h2>

<p>The Codex CLI agentic loop is not a simple prompt-response cycle. It is a production-grade async runtime with kernel-level sandboxing, encrypted context compaction, stateless API design for ZDR compliance, and a self-referential binary that re-invokes itself to enforce sandbox restrictions. Understanding these internals is essential for anyone building custom harnesses, debugging unexpected behaviour, or extending Codex through MCP servers and skills.</p>

<hr />

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://deepwiki.com/openai/codex">Architecture Overview — openai/codex — DeepWiki</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:1:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:1:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a> <a href="#fnref:1:9" class="reversefootnote" role="doc-backlink">&#8617;<sup>10</sup></a> <a href="#fnref:1:10" class="reversefootnote" role="doc-backlink">&#8617;<sup>11</sup></a> <a href="#fnref:1:11" class="reversefootnote" role="doc-backlink">&#8617;<sup>12</sup></a> <a href="#fnref:1:12" class="reversefootnote" role="doc-backlink">&#8617;<sup>13</sup></a> <a href="#fnref:1:13" class="reversefootnote" role="doc-backlink">&#8617;<sup>14</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.zenml.io/llmops-database/building-production-ready-ai-agents-openai-codex-cli-architecture-and-agent-loop-design">Building Production-Ready AI Agents: OpenAI Codex CLI Architecture and Agent Loop Design — ZenML</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:2:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:2:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:2:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:2:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:2:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://blakecrosley.com/guides/codex">Codex CLI: The Definitive Technical Reference — Blake Crosley</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://deepwiki.com/openai/codex/6.3-configuration-management">Sandboxing and Security Policies — openai/codex — DeepWiki</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/agent-approvals-security">Agent approvals &amp; security — Codex — OpenAI Developers</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://pierce.dev/notes/a-deep-dive-on-agent-sandboxes">A deep dive on agent sandboxes — Pierce Freeman</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://blakecrosley.com/blog/codex-vs-claude-code-2026">Codex CLI vs Claude Code in 2026: Architecture Deep Dive — Blake Crosley</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://tonylee.im/en/blog/codex-compaction-encrypted-summary-session-handover/">How Codex Solves the Compaction Problem Differently — Tony Lee</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://gist.github.com/badlogic/cd2ef65b0697c4dbe2d13fbecb0a0a5f">Context Compaction Research: Claude Code, Codex CLI, OpenCode, Amp — GitHub Gist</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://community.openai.com/t/automatically-compacting-context/1376290">Automatically compacting context — OpenAI Developer Community</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p><a href="https://openai.com/index/unlocking-the-codex-harness/">Unlocking the Codex harness: how we built the App Server — OpenAI</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[How the Codex CLI Agentic Loop Works in Detail to the Code Level]]></summary></entry><entry><title type="html">Codex CLI Competitive Position April 2026: The Road to Parity with Claude Code</title><link href="https://codex.danielvaughan.com/2026/04/07/codex-cli-competitive-position-april-2026/" rel="alternate" type="text/html" title="Codex CLI Competitive Position April 2026: The Road to Parity with Claude Code" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codex-cli-competitive-position-april-2026</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codex-cli-competitive-position-april-2026/"><![CDATA[<h1 id="codex-cli-competitive-position-april-2026-the-road-to-parity-with-claude-code">Codex CLI Competitive Position April 2026: The Road to Parity with Claude Code</h1>

<hr />

<p>The AI coding agent market has consolidated rapidly. Three products — Claude Code, GitHub Copilot, and Cursor — now control over 70% of a market worth an estimated $4 billion annually<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Codex CLI, backed by GPT-5.3-Codex and a thriving open-source community, sits firmly in Tier 1 alongside Claude Code. This article examines where Codex CLI stands in April 2026, where it leads, where it trails, and whether the parity trajectory holds.</p>

<h2 id="market-landscape-the-april-2026-tier-list">Market Landscape: The April 2026 Tier List</h2>

<p>TokenCalculator’s April 2026 ranking divides the field into three tiers<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Tier</th>
      <th>Tool</th>
      <th>Positioning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Tier 1 — Leaders</strong></td>
      <td>Claude Code (Anthropic)</td>
      <td>Best agentic reasoning, largest context window</td>
    </tr>
    <tr>
      <td> </td>
      <td>OpenAI Codex (CLI + App)</td>
      <td>Best sandbox, background agents, open-source CLI</td>
    </tr>
    <tr>
      <td><strong>Tier 2 — Strong Contenders</strong></td>
      <td>Cursor 3</td>
      <td>Best interactive IDE experience</td>
    </tr>
    <tr>
      <td> </td>
      <td>GitHub Copilot</td>
      <td>Enterprise distribution, Microsoft integration</td>
    </tr>
    <tr>
      <td><strong>Tier 3 — Falling Behind</strong></td>
      <td>Google Antigravity</td>
      <td>Promising launch, stalled roadmap</td>
    </tr>
    <tr>
      <td> </td>
      <td>Windsurf (Cognition)</td>
      <td>Niche positioning</td>
    </tr>
  </tbody>
</table>

<p>Claude Code dominates developer sentiment with a 46% “most loved” rating versus 19% for Cursor and just 9% for Copilot<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. It has captured 41% market share among professional developers, overtaking Copilot’s 38% in barely eight months since launch<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. In the agentic coding subcategory specifically, 71% of developers who regularly use AI agents use Claude Code<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<p>Codex, meanwhile, has grown to over 2 million weekly active users as of March 2026, with token throughput up fivefold since the GPT-5.3-Codex launch in February<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Enterprise adoption includes Cisco, Nvidia, Ramp, Rakuten, and Harvey<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<h2 id="benchmark-comparison-specialisation-not-supremacy">Benchmark Comparison: Specialisation, Not Supremacy</h2>

<p>The benchmarks tell a nuanced story of specialisation rather than outright dominance by either tool<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Benchmark</th>
      <th>GPT-5.3-Codex</th>
      <th>Opus 4.6 (Claude)</th>
      <th>Winner</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SWE-Bench Pro</td>
      <td>56.8%</td>
      <td>—</td>
      <td>—</td>
    </tr>
    <tr>
      <td>SWE-Bench Verified</td>
      <td>80.0% (GPT-5.2)</td>
      <td>80.8%</td>
      <td>Claude (marginal)</td>
    </tr>
    <tr>
      <td>Terminal-Bench 2.0 (model)</td>
      <td>75.1%</td>
      <td>65.4%</td>
      <td><strong>Codex</strong></td>
    </tr>
    <tr>
      <td>Terminal-Bench 2.0 (framework)</td>
      <td>77.3%</td>
      <td>69.9%</td>
      <td><strong>Codex</strong></td>
    </tr>
    <tr>
      <td>OSWorld-Verified</td>
      <td>64.7%</td>
      <td>72.7%</td>
      <td>Claude</td>
    </tr>
    <tr>
      <td>GDPval-AA (knowledge work)</td>
      <td>—</td>
      <td>+144 Elo</td>
      <td>Claude</td>
    </tr>
  </tbody>
</table>

<p>GPT-5.3-Codex leads decisively on terminal and CLI automation tasks — the bread and butter of Codex CLI’s design philosophy<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup><sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. Opus 4.6 leads on GUI automation, knowledge work, and the headline SWE-Bench Verified metric<sup id="fnref:5:2" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. The gap on SWE-Bench Verified is vanishingly small (0.8 percentage points), but Claude Code’s advantage on complex reasoning tasks remains meaningful.</p>

<p>Direct comparison is complicated by reporting differences: OpenAI publishes SWE-Bench Pro scores whilst Anthropic reports Verified scores, making like-for-like analysis difficult<sup id="fnref:5:3" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>

<h2 id="where-codex-cli-leads">Where Codex CLI Leads</h2>

<h3 id="kernel-level-sandboxing">Kernel-Level Sandboxing</h3>

<p>Codex CLI’s security model is architecturally distinct. On Linux, it uses bubblewrap with seccomp filters and Landlock LSM for filesystem isolation. On macOS, it enforces Seatbelt policies via <code class="language-plaintext highlighter-rouge">sandbox-exec</code><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. Network access is disabled by default, significantly reducing prompt injection and data exfiltration risks<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Full-auto mode with kernel sandbox — no approval gates</span>
codex <span class="nt">--full-auto</span> <span class="s2">"refactor auth module to use JWT"</span>

<span class="c"># The sandbox restricts:</span>
<span class="c"># - Network access (disabled by default)</span>
<span class="c"># - Filesystem access (workspace only)</span>
<span class="c"># - Process spawning (filtered by seccomp)</span>
</code></pre></div></div>

<p>Claude Code, by contrast, relies on application-layer hooks for security<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. For regulated industries and CI/CD pipelines, Codex CLI’s OS-enforced isolation is a genuine differentiator.</p>

<h3 id="token-efficiency">Token Efficiency</h3>

<p>GPT-5.3-Codex consumes roughly a quarter of the tokens Claude Code needs for equivalent tasks<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. At scale, this translates directly into cost savings. For the 80% of solo developers doing moderate daily work, Codex CLI at $20/month is the better value<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
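
<p>To make that concrete, here is a quick back-of-envelope sketch. The task size and monthly task count are illustrative assumptions; only the roughly-4x ratio comes from the benchmark claim:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Back-of-envelope sketch: illustrative numbers, not published pricing.
# Assume one large task costs Claude Code ~200,000 tokens; the roughly
# 4x efficiency claim puts Codex at about a quarter of that.
claude_tokens=200000
codex_tokens=$((claude_tokens / 4))
echo "codex tokens per task: $codex_tokens"     # 50000

# Across a hypothetical 100 such tasks a month, the gap compounds:
echo "tokens saved monthly:  $(( (claude_tokens - codex_tokens) * 100 ))"   # 15000000
</code></pre></div></div>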

<h3 id="background-agents-and-cloud-execution">Background Agents and Cloud Execution</h3>

<p>Codex’s background agent model — define a task, hand it off, review the branch later — is a genuine workflow innovation<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. The sandboxed cloud execution environment produces polished, PR-ready output<sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<h3 id="open-source-community">Open-Source Community</h3>

<p>Codex CLI is Apache 2.0 licensed with 67,000+ GitHub stars and 400+ contributors<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. This has spawned a healthy fork ecosystem, most notably <strong>Every Code</strong> (<code class="language-plaintext highlighter-rouge">just-every/code</code>, 3,700+ stars), which adds multi-model orchestration across OpenAI, Claude, and Gemini providers, browser integration, Auto Drive multi-agent automation, and background auto-review via ghost-commit watchers<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>.</p>

<h2 id="where-claude-code-leads">Where Claude Code Leads</h2>

<h3 id="context-window-and-multi-file-reasoning">Context Window and Multi-File Reasoning</h3>

<p>Opus 4.6 offers a 200K standard context window with a 1M-token beta, compared to GPT-5.3-Codex’s 400K standard<sup id="fnref:5:4" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. ⚠️ Effective context utilisation varies by task, and raw window size is not always the binding constraint. However, for large monorepo refactoring — where changes cascade across frontend, backend, database, and test layers — Claude Code’s ability to hold more context and reason about complex interactions gives it a measurable edge<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>

<h3 id="implicit-convention-understanding">Implicit Convention Understanding</h3>

<p>Claude Code demonstrates stronger understanding of implicit project conventions — coding styles, architectural patterns, and team-specific idioms that are not explicitly documented<sup id="fnref:2:4" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. This “naturalness” in tool usage patterns makes it feel more like a senior pair programmer and less like a script executor.</p>

<h3 id="agent-coordination">Agent Coordination</h3>

<p>Claude Code’s Agent Teams feature enables direct agent-to-agent communication for parallel task execution<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. Codex CLI supports subagents for task parallelisation but lacks equivalent cross-agent coordination<sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. For orchestrating complex, multi-step workflows that require handoffs between specialised agents, Claude Code is ahead.</p>

<h2 id="the-cursor-3-factor">The Cursor 3 Factor</h2>

<p>Cursor 3 launched on 2 April 2026 with a fundamental architectural pivot from IDE-with-AI to agent-first workspace<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>. The new Agents Window provides a centralised command hub for managing multi-step, autonomous tasks. Key capabilities include:</p>

<ul>
  <li>Parallel cloud agents for simultaneous task execution</li>
  <li>Multi-repo support with seamless local/cloud handoff</li>
  <li>Design Mode for visual development workflows</li>
  <li>Integrated browsing, plugin, and PR tooling<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup></li>
</ul>

<pre><code class="language-mermaid">graph LR
    A[Developer] --&gt; B{Primary Workflow}
    B --&gt;|Complex reasoning&lt;br/&gt;Multi-file refactors| C[Claude Code]
    B --&gt;|Autonomous batch work&lt;br/&gt;CI/CD, DevOps| D[Codex CLI]
    B --&gt;|Interactive editing&lt;br/&gt;Visual development| E[Cursor 3]
    C --&gt; F[Production Branch]
    D --&gt; F
    E --&gt; F
</code></pre>

<p>The strategic significance is that Cursor’s pivot validates the agentic model that Claude Code and Codex CLI pioneered. Cursor 3 comes as Claude Code reportedly holds 54% of the agentic coding market<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup>, suggesting Cursor is playing catch-up in this segment whilst leveraging its IDE-native advantage.</p>

<h2 id="the-parity-trajectory">The Parity Trajectory</h2>

<p>TokenCalculator’s analysis suggests Codex could pull even with Claude Code by mid-2026 if current trends continue<sup id="fnref:2:5" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Several factors support this:</p>

<ol>
  <li><strong>Model velocity</strong>: GPT-5.3-Codex is 25% faster than its predecessor with fewer tokens consumed<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. GPT-5.4 has already been announced<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup>, suggesting rapid iteration continues.</li>
  <li><strong>Adoption momentum</strong>: From 1 million downloads to 2 million weekly active users in under two months<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</li>
  <li><strong>Enterprise traction</strong>: Named enterprise deployments at Cisco, Nvidia, and others signal institutional confidence<sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</li>
  <li><strong>Open-source moat</strong>: The fork ecosystem (Every Code, Open Codex, and others) creates a gravitational pull that proprietary tools cannot replicate.</li>
</ol>

<p>Against parity, several structural advantages favour Claude Code:</p>

<ol>
  <li><strong>Reasoning depth</strong>: The GDPval-AA Elo gap (+144) reflects genuine architectural differences in reasoning capability<sup id="fnref:5:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</li>
  <li><strong>Market momentum</strong>: 41% market share and $2.5 billion ARR provide resources for rapid iteration<sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</li>
  <li><strong>Developer love</strong>: A 46% “most loved” rating creates retention that is difficult to overcome<sup id="fnref:3:4" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</li>
</ol>

<pre><code class="language-mermaid">graph TD
    A[Q1 2026: Claude Code leads] --&gt; B[Q2 2026: Projected convergence zone]
    B --&gt; C{Mid-2026 outcome}
    C --&gt;|Codex catches up| D[Parity: specialisation-based market split]
    C --&gt;|Claude maintains gap| E[Duopoly: Claude for quality, Codex for efficiency]
    C --&gt;|Cursor disrupts| F[Three-way race with IDE-native advantage]
</code></pre>

<h2 id="practical-guidance">Practical Guidance</h2>

<p>For teams choosing today, the data supports a multi-tool strategy:</p>

<table>
  <thead>
    <tr>
      <th>Workflow</th>
      <th>Recommended Tool</th>
      <th>Rationale</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Autonomous background tasks</td>
      <td>Codex CLI (<code class="language-plaintext highlighter-rouge">--full-auto</code>)</td>
      <td>Kernel sandbox, token efficiency, PR-ready output</td>
    </tr>
    <tr>
      <td>Complex multi-file refactors</td>
      <td>Claude Code</td>
      <td>Larger context, stronger cross-file reasoning</td>
    </tr>
    <tr>
      <td>Interactive development</td>
      <td>Cursor 3</td>
      <td>IDE-native experience, parallel agents</td>
    </tr>
    <tr>
      <td>CI/CD pipeline integration</td>
      <td>Codex CLI (<code class="language-plaintext highlighter-rouge">codex exec</code>)</td>
      <td>OS-level isolation, deterministic execution</td>
    </tr>
    <tr>
      <td>Enterprise with Microsoft stack</td>
      <td>GitHub Copilot</td>
      <td>Distribution, compliance, SSO integration</td>
    </tr>
  </tbody>
</table>

<p>The “best developers use both” pattern identified by multiple analysts<sup id="fnref:8:3" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> is not a hedge — it reflects genuine specialisation in the tools. Codex CLI’s Unix-philosophy approach (do one thing well, in a sandbox, with maximum efficiency) complements Claude Code’s deep-reasoning, convention-aware approach.</p>

<h2 id="what-to-watch">What to Watch</h2>

<ul>
  <li><strong>GPT-5.4’s coding benchmarks</strong>: Will the next model close the SWE-Bench Verified and OSWorld gaps?</li>
  <li><strong>Codex CLI Agent Teams equivalent</strong>: Cross-agent coordination is the most significant feature gap.</li>
  <li><strong>Every Code’s trajectory</strong>: If the fork ecosystem consolidates around multi-model orchestration, it could reshape the competitive dynamics entirely.</li>
  <li><strong>Google Antigravity</strong>: Three months of silence after a promising January launch. Either a pivot is coming or the product is being deprioritised.</li>
</ul>

<hr />

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://sevenolives.com/blog/ai-coding-agents-4-billion-market-consolidation-2026">The $4 Billion Coding Agent Market Just Consolidated — Seven Olives</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://tokencalculator.com/blog/best-ai-ide-cli-tools-april-2026-claude-code-wins">Best AI IDE &amp; CLI Tools April 2026 — TokenCalculator</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:2:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:2:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://byteiota.com/claude-code-hits-41-share-overtakes-copilots-38/">Claude Code Hits 41% Share, Overtakes Copilot’s 38% — byteiota</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:3:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://fortune.com/2026/03/04/openai-codex-growth-enterprise-ai-agents/">OpenAI sees Codex users spike to 1.6 million — Fortune</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://smartscope.blog/en/generative-ai/chatgpt/codex-vs-claude-code-2026-benchmark/">Codex CLI vs Claude Code 2026: Opus 4.6 vs GPT-5.3-Codex Compared — SmartScope</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:5:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:5:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:5:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:5:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-3-codex/">Introducing GPT-5.3-Codex — OpenAI</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/agent-approvals-security">Agent approvals &amp; security — Codex CLI OpenAI Developers</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://www.nxcode.io/resources/news/claude-code-vs-codex-cli-terminal-coding-comparison-2026">Claude Code vs Codex CLI 2026: Which Terminal AI Coding Agent Wins? — NxCode</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:8:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://github.com/just-every/code">Every Code — GitHub</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://particula.tech/blog/codex-vs-claude-code-cli-agent-comparison">Codex vs Claude Code: Which CLI Agent Wins for Your Workflow — Particula</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p><a href="https://creati.ai/ai-news/2026-04-06/cursor-3-agent-first-interface-claude-code-codex/">Cursor Launches Agent-First Cursor 3 Interface — Creati.ai</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p><a href="https://www.implicator.ai/cursor-3-shifts-to-agent-orchestration-as-claude-code-claims-54-of-coding-market/">Cursor 3 Shifts to Agent Orchestration Amid Market Pressure — Implicator</a> <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-4/">Introducing GPT-5.4 — OpenAI</a> <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Codex CLI Competitive Position April 2026: The Road to Parity with Claude Code]]></summary></entry><entry><title type="html">Codex CLI Diagnostic Toolkit: Tracing, Sandbox Testing, and the Built-In Debugging Commands</title><link href="https://codex.danielvaughan.com/2026/04/07/codex-cli-diagnostic-toolkit-tracing-sandbox-testing/" rel="alternate" type="text/html" title="Codex CLI Diagnostic Toolkit: Tracing, Sandbox Testing, and the Built-In Debugging Commands" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codex-cli-diagnostic-toolkit-tracing-sandbox-testing</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codex-cli-diagnostic-toolkit-tracing-sandbox-testing/"><![CDATA[<h1 id="codex-cli-diagnostic-toolkit-tracing-sandbox-testing-and-the-built-in-debugging-commands">Codex CLI Diagnostic Toolkit: Tracing, Sandbox Testing, and the Built-In Debugging Commands</h1>

<p>Codex CLI ships with a surprisingly deep set of diagnostic tools that most developers never discover. When an agent session stalls, a sandbox blocks a legitimate command, or a config key silently fails to take effect, knowing how to reach for <code class="language-plaintext highlighter-rouge">RUST_LOG</code>, <code class="language-plaintext highlighter-rouge">codex sandbox</code>, or <code class="language-plaintext highlighter-rouge">/debug-config</code> can save hours of guesswork. This article is a systematic reference to every built-in diagnostic surface in Codex CLI as of v0.118.0.</p>

<h2 id="the-diagnostic-surface-area">The Diagnostic Surface Area</h2>

<p>Codex CLI’s diagnostic capabilities span four layers: runtime tracing via environment variables, interactive slash commands inside the TUI, standalone CLI subcommands for offline testing, and post-session analysis via JSONL rollout files.</p>

<pre><code class="language-mermaid">graph TD
    A[Codex CLI Diagnostics] --&gt; B[Runtime Tracing]
    A --&gt; C[TUI Slash Commands]
    A --&gt; D[Standalone Subcommands]
    A --&gt; E[Post-Session Analysis]

    B --&gt; B1["RUST_LOG env var"]
    B --&gt; B2["LOG_FORMAT=json"]
    B --&gt; B3["OpenTelemetry export"]

    C --&gt; C1["/status"]
    C --&gt; C2["/debug-config"]
    C --&gt; C3["/feedback"]

    D --&gt; D1["codex sandbox"]
    D --&gt; D2["codex execpolicy check"]
    D --&gt; D3["codex debug"]
    D --&gt; D4["codex login status"]

    E --&gt; E1["JSONL rollout files"]
    E --&gt; E2["codex-tui.log"]
</code></pre>

<h2 id="runtime-tracing-with-rust_log">Runtime Tracing with RUST_LOG</h2>

<p>Since Codex CLI is built in Rust atop the standard <code class="language-plaintext highlighter-rouge">tracing</code> crate<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, the <code class="language-plaintext highlighter-rouge">RUST_LOG</code> environment variable controls verbosity at module granularity. The default level for Codex crates is <code class="language-plaintext highlighter-rouge">info</code><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<h3 id="basic-usage">Basic Usage</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Global debug logging</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>debug codex

<span class="c"># Trace-level logging (extremely verbose)</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>trace codex

<span class="c"># Debug logging in non-interactive mode</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>debug codex <span class="nb">exec</span> <span class="s2">"refactor the auth module"</span>
</code></pre></div></div>

<h3 id="module-targeted-tracing">Module-Targeted Tracing</h3>

<p>The real power lies in per-module targeting. Codex’s Rust workspace exposes several key tracing targets<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Debug the core agent loop while keeping everything else at info</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>info,codex_core<span class="o">=</span>debug codex

<span class="c"># Trace shell command execution specifically</span>
<span class="nv">RUST_LOG</span><span class="o">=</span><span class="nv">codex_exec</span><span class="o">=</span>trace,codex_core<span class="o">=</span>debug codex

<span class="c"># Debug sandbox behaviour</span>
<span class="nv">RUST_LOG</span><span class="o">=</span><span class="nv">codex_sandbox</span><span class="o">=</span>debug,codex_process_hardening<span class="o">=</span>debug codex

<span class="c"># Trace API request/response details</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>codex_core::api<span class="o">=</span>trace codex

<span class="c"># Debug MCP server connections</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>codex_core::mcp<span class="o">=</span>debug codex

<span class="c"># Trace configuration resolution</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>codex_core::config<span class="o">=</span>trace codex

<span class="c"># Trace authentication flows</span>
<span class="nv">RUST_LOG</span><span class="o">=</span>codex_core::auth<span class="o">=</span>trace codex
</code></pre></div></div>

<h3 id="structured-log-output">Structured Log Output</h3>

<p>For machine-parseable logs — useful when piping into log aggregation — set the format to JSON<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">RUST_LOG</span><span class="o">=</span>debug <span class="nv">LOG_FORMAT</span><span class="o">=</span>json codex <span class="nb">exec</span> <span class="s2">"run tests"</span> 2&gt;&amp;1 | <span class="nb">tee </span>codex-debug.log
</code></pre></div></div>

<p>The compact format is also available via <code class="language-plaintext highlighter-rouge">RUST_LOG_FORMAT=compact</code><sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
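
<p>One practical payoff of JSON output is that ordinary text tools can triage a session log. A minimal sketch (it assumes one JSON object per line with a top-level <code class="language-plaintext highlighter-rouge">level</code> field; verify the shape against your own output first):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Surface only warnings and errors from a captured debug log.
# Assumes one JSON object per line with a "level" field, e.g.
#   {"timestamp":"...","level":"ERROR","fields":{"message":"..."}}
grep -E '"level":"(WARN|ERROR)"' codex-debug.log
</code></pre></div></div>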

<h3 id="log-file-locations">Log File Locations</h3>

<p>Codex writes TUI logs to <code class="language-plaintext highlighter-rouge">~/.codex/log/codex-tui.log</code>, with automatic rotation<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. In <code class="language-plaintext highlighter-rouge">codex exec</code> mode, timestamped log files appear at <code class="language-plaintext highlighter-rouge">~/.codex/logs/codex-tui-&lt;timestamp&gt;.log</code><sup id="fnref:2:4" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. These can be safely deleted when no longer needed, but they are invaluable for post-mortem debugging.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Monitor logs in real time during a session</span>
<span class="nb">tail</span> <span class="nt">-f</span> ~/.codex/logs/codex-tui-<span class="k">*</span>.log
</code></pre></div></div>

<p>⚠️ <strong>Performance warning</strong>: Debug and trace levels can reduce throughput by 10–50%<sup id="fnref:2:5" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Reserve them for active troubleshooting, not production workflows.</p>

<h2 id="tui-slash-commands-for-live-diagnostics">TUI Slash Commands for Live Diagnostics</h2>

<p>Three slash commands provide in-session diagnostic information without leaving the TUI.</p>

<h3 id="status--session-overview">/status — Session Overview</h3>

<p>The <code class="language-plaintext highlighter-rouge">/status</code> command displays the current session configuration and token usage<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. This is your first stop when something feels off — it confirms which model is active, the current reasoning effort level, token consumption, and the effective sandbox mode.</p>

<h3 id="debug-config--configuration-layer-diagnostics">/debug-config — Configuration Layer Diagnostics</h3>

<p>When a config key appears to have no effect, <code class="language-plaintext highlighter-rouge">/debug-config</code> reveals the full configuration resolution stack<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. It prints:</p>

<ul>
  <li>Layer order (lowest to highest precedence)</li>
  <li>The effective value of each key and which layer set it</li>
  <li>Policy details: <code class="language-plaintext highlighter-rouge">allowed_approval_policies</code>, <code class="language-plaintext highlighter-rouge">allowed_sandbox_modes</code>, <code class="language-plaintext highlighter-rouge">mcp_servers</code>, <code class="language-plaintext highlighter-rouge">rules</code>, <code class="language-plaintext highlighter-rouge">enforce_residency</code>, and <code class="language-plaintext highlighter-rouge">experimental_network</code></li>
</ul>

<p>This is particularly useful in enterprise environments where <code class="language-plaintext highlighter-rouge">requirements.toml</code> may silently override your <code class="language-plaintext highlighter-rouge">config.toml</code> settings<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. If your <code class="language-plaintext highlighter-rouge">sandbox_mode = "danger-full-access"</code> is being ignored, <code class="language-plaintext highlighter-rouge">/debug-config</code> will show you that a managed policy is enforcing <code class="language-plaintext highlighter-rouge">workspace-write</code>.</p>
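
<p>A minimal illustration of the kind of conflict <code class="language-plaintext highlighter-rouge">/debug-config</code> surfaces. The managed-layer file location varies by deployment, and the fragment below is shown only to sketch the precedence, not as a verbatim policy file:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ~/.codex/config.toml (user layer, lower precedence)
sandbox_mode = "danger-full-access"

# requirements.toml (managed layer, higher precedence; path varies by deployment)
allowed_sandbox_modes = ["workspace-write"]
</code></pre></div></div>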

<h3 id="feedback--structured-bug-reports">/feedback — Structured Bug Reports</h3>

<p>The <code class="language-plaintext highlighter-rouge">/feedback</code> command collects diagnostic information and submits it directly to OpenAI’s maintainers<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. When invoked, it captures:</p>

<ul>
  <li>Request ID (essential for OpenAI support tickets)</li>
  <li>Session ID</li>
  <li>Connection status (connected/reconnecting/disconnected)</li>
  <li>Last error message</li>
  <li>Active tools count</li>
  <li>MCP server connection status</li>
</ul>

<p>Always run <code class="language-plaintext highlighter-rouge">/feedback</code> before closing a session that exhibited unexpected behaviour — the request ID is the single most useful datum when filing issues on GitHub<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<h2 id="the-codex-sandbox-subcommand">The codex sandbox Subcommand</h2>

<p>The <code class="language-plaintext highlighter-rouge">codex sandbox</code> subcommand<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> lets you test arbitrary commands under the exact sandbox enforcement that Codex applies during agent sessions, without starting one. This is indispensable when diagnosing why a build tool or test runner fails under sandboxing.</p>

<h3 id="platform-specific-syntax">Platform-Specific Syntax</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># macOS — test a command under Seatbelt enforcement</span>
codex sandbox macos <span class="nt">--</span> npm run build

<span class="c"># macOS — with full-auto permissions and denial logging</span>
codex sandbox macos <span class="nt">--full-auto</span> <span class="nt">--log-denials</span> <span class="nt">--</span> cargo <span class="nb">test</span>

<span class="c"># Linux — test under Landlock/bubblewrap enforcement</span>
codex sandbox linux <span class="nt">--</span> pytest tests/

<span class="c"># Linux — full-auto mode (workspace-write equivalent)</span>
codex sandbox linux <span class="nt">--full-auto</span> <span class="nt">--</span> make <span class="nb">install</span>

<span class="c"># Windows — test under restricted token enforcement</span>
codex sandbox windows <span class="nt">--full-auto</span> <span class="nt">--</span> dotnet <span class="nb">test</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">--log-denials</code> flag on macOS is particularly valuable: it prints every Seatbelt denial to stderr, showing exactly which filesystem path or network operation was blocked<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<h3 id="legacy-aliases">Legacy Aliases</h3>

<p>The older <code class="language-plaintext highlighter-rouge">codex debug seatbelt</code> and <code class="language-plaintext highlighter-rouge">codex debug landlock</code> commands still work as aliases<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># These are equivalent:</span>
codex sandbox macos <span class="nt">--</span> <span class="nb">ls</span> /etc
codex debug seatbelt <span class="nt">--</span> <span class="nb">ls</span> /etc
</code></pre></div></div>

<h3 id="practical-use-diagnosing-build-failures">Practical Use: Diagnosing Build Failures</h3>

<p>A common scenario: your Rust project builds fine outside Codex but fails under the agent’s sandbox. Use <code class="language-plaintext highlighter-rouge">codex sandbox</code> to isolate the issue:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Step 1: Test the build under sandbox</span>
codex sandbox linux <span class="nt">--</span> cargo build 2&gt;&amp;1 | <span class="nb">grep</span> <span class="nt">-i</span> denied

<span class="c"># Step 2: If failures appear, try with full-auto (workspace-write)</span>
codex sandbox linux <span class="nt">--full-auto</span> <span class="nt">--</span> cargo build

<span class="c"># Step 3: If it still fails, the likely culprit is network access</span>
<span class="c"># (e.g., crates.io downloads blocked by sandbox)</span>
</code></pre></div></div>

<p>This workflow avoids the cost of starting a full agent session just to debug sandbox restrictions.</p>

<h3 id="platform-implementation-details">Platform Implementation Details</h3>

<p>On macOS 12+, <code class="language-plaintext highlighter-rouge">codex sandbox</code> invokes Apple’s Seatbelt framework via <code class="language-plaintext highlighter-rouge">/usr/bin/sandbox-exec</code> with a runtime-generated profile controlling filesystem and network access<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. On Linux, the sandbox uses a dual-mode pipeline: Landlock LSM by default, or bubblewrap (vendored in <code class="language-plaintext highlighter-rouge">codex-rs/vendor/bubblewrap/</code>) when enabled via <code class="language-plaintext highlighter-rouge">features.use_linux_sandbox_bwrap = true</code><sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. The bubblewrap path provides stronger isolation through PID namespace separation (<code class="language-plaintext highlighter-rouge">--unshare-pid</code>), network namespace isolation (<code class="language-plaintext highlighter-rouge">--unshare-net</code>), and seccomp filters<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<pre><code class="language-mermaid">flowchart LR
    subgraph macOS
        A[codex sandbox macos] --&gt; B[sandbox-exec]
        B --&gt; C[Seatbelt profile]
        C --&gt; D[Command runs isolated]
    end

    subgraph Linux
        E[codex sandbox linux] --&gt; F{bwrap enabled?}
        F --&gt;|Yes| G[bubblewrap]
        F --&gt;|No| H[Landlock + seccomp]
        G --&gt; I[Namespace isolation]
        H --&gt; I
        I --&gt; J[Command runs isolated]
    end
</code></pre>
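<p>Enabling the bubblewrap backend is a one-line configuration change using the feature flag named above (written here as a <code class="language-plaintext highlighter-rouge">[features]</code> table, the TOML equivalent of the dotted-key form):</p>

```toml
# ~/.codex/config.toml — opt in to the bubblewrap sandbox backend on Linux
[features]
use_linux_sandbox_bwrap = true
```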

<h2 id="the-codex-execpolicy-check-subcommand">The codex execpolicy check Subcommand</h2>

<p>Before deploying Starlark <code class="language-plaintext highlighter-rouge">.rules</code> files, validate them offline with <code class="language-plaintext highlighter-rouge">codex execpolicy check</code><sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. This subcommand evaluates one or more rule files against a proposed command and reports the decision without executing anything.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Test a command against your rules</span>
codex execpolicy check <span class="se">\</span>
  <span class="nt">--pretty</span> <span class="se">\</span>
  <span class="nt">--rules</span> ~/.codex/rules/default.rules <span class="se">\</span>
  <span class="nt">--</span> gh <span class="nb">pr </span>view 7888 <span class="nt">--json</span> title,body,comments
</code></pre></div></div>

<p>The output shows:</p>

<ul>
  <li><strong>Effective decision</strong>: the strictest severity across all matched rules (<code class="language-plaintext highlighter-rouge">forbidden</code> &gt; <code class="language-plaintext highlighter-rouge">prompt</code> &gt; <code class="language-plaintext highlighter-rouge">allow</code>)<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup></li>
  <li><strong>matchedRules</strong>: every rule whose prefix matched, with the exact <code class="language-plaintext highlighter-rouge">matchedPrefix</code> shown<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup></li>
</ul>
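<p>The severity ordering is simple enough to sketch. The shell function below reproduces it for illustration only; the real evaluation happens inside the execpolicy engine:</p>

```bash
# Minimal sketch of the documented precedence (forbidden > prompt > allow):
# rank each decision and keep the strictest across all matched rules.
strictest() {
  local best=0 d rank
  for d in "$@"; do
    case "$d" in
      forbidden) rank=2 ;;
      prompt)    rank=1 ;;
      *)         rank=0 ;;   # treat anything else as "allow"
    esac
    if (( rank > best )); then best=$rank; fi
  done
  case "$best" in
    2) echo forbidden ;;
    1) echo prompt ;;
    *) echo allow ;;
  esac
}

strictest allow prompt   # prints "prompt"
```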

<p>You can combine multiple rule files:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex execpolicy check <span class="se">\</span>
  <span class="nt">--pretty</span> <span class="se">\</span>
  <span class="nt">--rules</span> ~/.codex/rules/default.rules <span class="se">\</span>
  <span class="nt">--rules</span> .codex/rules/project.rules <span class="se">\</span>
  <span class="nt">--</span> <span class="nb">rm</span> <span class="nt">-rf</span> node_modules
</code></pre></div></div>

<h3 id="unit-tests-in-rules-files">Unit Tests in Rules Files</h3>

<p>The <code class="language-plaintext highlighter-rouge">match</code> and <code class="language-plaintext highlighter-rouge">not_match</code> fields in <code class="language-plaintext highlighter-rouge">prefix_rule()</code> function as inline unit tests<sup id="fnref:8:3" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. Codex validates these examples when it loads your rules — if a <code class="language-plaintext highlighter-rouge">match</code> example does not trigger the rule, or a <code class="language-plaintext highlighter-rouge">not_match</code> example does, loading fails. Always populate these fields:</p>

<pre><code class="language-python">prefix_rule(
    pattern = "rm -rf",
    decision = "forbidden",
    match = ["rm -rf /", "rm -rf node_modules"],
    not_match = ["rm file.txt", "rmdir empty"]
)
</code></pre>


<h2 id="the-codex-debug-subcommand">The codex debug Subcommand</h2>

<p>The <code class="language-plaintext highlighter-rouge">codex debug</code> command is the entry point for lower-level debugging utilities<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># List available debug subcommands</span>
codex debug <span class="nt">--help</span>

<span class="c"># Test the V2 app-server protocol with a single message</span>
codex debug app-server send-message-v2 <span class="s2">"Hello, world"</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">send-message-v2</code> subcommand initialises the app-server, starts a thread, sends a single user message, and streams all server notifications back to the terminal<sup id="fnref:7:2" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. This is useful for verifying that the app-server protocol is functioning correctly without starting the full TUI.</p>

<h2 id="authentication-diagnostics">Authentication Diagnostics</h2>

<p>When sessions fail to start with authentication errors, two commands help isolate the issue:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Check current auth state without triggering a login flow</span>
codex login status

<span class="c"># Inspect the auth token file directly</span>
<span class="nb">cat</span> ~/.codex/auth.json | jq <span class="s1">'.expires_at'</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">codex login status</code> command reports whether you are authenticated, the method used (browser OAuth, device code, or API key), and whether the token is valid<sup id="fnref:7:3" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. A common failure pattern is a corrupted or expired <code class="language-plaintext highlighter-rouge">auth.json</code> file — the fix is to run <code class="language-plaintext highlighter-rouge">codex logout</code> followed by <code class="language-plaintext highlighter-rouge">codex login</code><sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
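<p>For scripting, the same expiry check can be automated. The snippet below is a sketch that assumes <code class="language-plaintext highlighter-rouge">expires_at</code> is stored as an ISO-8601 UTC timestamp (the field queried above); it uses a stand-in file so it is self-contained, so point it at <code class="language-plaintext highlighter-rouge">~/.codex/auth.json</code> in practice:</p>

```bash
# Stand-in for ~/.codex/auth.json, for illustration only
auth=$(mktemp)
printf '%s\n' '{"expires_at":"2020-01-01T00:00:00Z"}' > "$auth"

# Extract the expires_at field without jq, then compare against now.
expires=$(sed -n 's/.*"expires_at":"\([^"]*\)".*/\1/p' "$auth")
now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# ISO-8601 UTC timestamps compare correctly as plain strings
if [[ "$expires" < "$now" ]]; then
  echo "token expired at $expires"
fi
rm -f "$auth"
```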

<h2 id="opentelemetry-integration">OpenTelemetry Integration</h2>

<p>For production observability beyond ad-hoc tracing, Codex CLI supports OpenTelemetry export via the <code class="language-plaintext highlighter-rouge">[otel]</code> config section<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[otel]</span>
<span class="py">enabled</span> <span class="p">=</span> <span class="kc">true</span>
<span class="py">endpoint</span> <span class="p">=</span> <span class="s">"http://localhost:4317"</span>
<span class="py">sampling_ratio</span> <span class="p">=</span> <span class="mf">1.0</span>
<span class="py">service_name</span> <span class="p">=</span> <span class="s">"codex-cli"</span>
</code></pre></div></div>

<p>This exports spans covering API calls, tool invocations, and sandbox operations to any OTLP-compatible backend (Jaeger, Grafana Tempo, SigNoz)<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>. Environment variables <code class="language-plaintext highlighter-rouge">OTEL_EXPORTER_OTLP_ENDPOINT</code> and <code class="language-plaintext highlighter-rouge">OTEL_SERVICE_NAME</code> also work<sup id="fnref:2:6" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
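<p>The environment-variable route (using the two variable names cited above) is useful in CI jobs where editing <code class="language-plaintext highlighter-rouge">config.toml</code> is impractical:</p>

```bash
# Env-var equivalents of the endpoint and service_name keys in [otel]
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_SERVICE_NAME="codex-cli"
```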

<p>⚠️ Note: <code class="language-plaintext highlighter-rouge">codex exec</code> does not yet export OTel metrics, and <code class="language-plaintext highlighter-rouge">codex mcp-server</code> mode has no telemetry support as of v0.118.0<sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>.</p>

<h2 id="post-session-analysis-with-jsonl-rollout-files">Post-Session Analysis with JSONL Rollout Files</h2>

<p>Every Codex session writes a JSONL rollout file to <code class="language-plaintext highlighter-rouge">~/.codex/sessions/</code><sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. These files contain <code class="language-plaintext highlighter-rouge">RolloutItem</code> events (SessionMeta, UserMessage, ResponseItem, EventMsg, ApprovalDecision) and are invaluable for understanding what happened during a session that went wrong.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Find the latest session rollout</span>
<span class="nb">ls</span> <span class="nt">-t</span> ~/.codex/sessions/<span class="k">*</span>.jsonl | <span class="nb">head</span> <span class="nt">-1</span>

<span class="c"># Count tool calls in a session</span>
<span class="nb">cat</span> ~/.codex/sessions/&lt;session&gt;.jsonl | <span class="se">\</span>
  jq <span class="s1">'select(.type == "ResponseItem") | .item.type'</span> | <span class="se">\</span>
  <span class="nb">sort</span> | <span class="nb">uniq</span> <span class="nt">-c</span> | <span class="nb">sort</span> <span class="nt">-rn</span>

<span class="c"># Extract all approval decisions</span>
<span class="nb">cat</span> ~/.codex/sessions/&lt;session&gt;.jsonl | <span class="se">\</span>
  jq <span class="s1">'select(.type == "ApprovalDecision")'</span>
</code></pre></div></div>

<p>The community <code class="language-plaintext highlighter-rouge">codex-replay</code> tool renders these JSONL files as browsable HTML, and the <code class="language-plaintext highlighter-rouge">ccusage</code> project provides daily and monthly cost reports parsed from rollout token counters<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>

<h2 id="a-diagnostic-workflow-checklist">A Diagnostic Workflow Checklist</h2>

<p>When something goes wrong, work through this sequence:</p>

<ol>
  <li><strong>Check config</strong>: Run <code class="language-plaintext highlighter-rouge">/debug-config</code> to verify your settings are taking effect</li>
  <li><strong>Check auth</strong>: Run <code class="language-plaintext highlighter-rouge">codex login status</code> to rule out credential issues</li>
  <li><strong>Check sandbox</strong>: Use <code class="language-plaintext highlighter-rouge">codex sandbox &lt;platform&gt; -- &lt;command&gt;</code> to test commands in isolation</li>
  <li><strong>Check rules</strong>: Use <code class="language-plaintext highlighter-rouge">codex execpolicy check --pretty --rules &lt;file&gt; -- &lt;command&gt;</code> to validate execution policies</li>
  <li><strong>Enable tracing</strong>: Restart with <code class="language-plaintext highlighter-rouge">RUST_LOG=debug codex</code> and monitor <code class="language-plaintext highlighter-rouge">~/.codex/log/codex-tui.log</code></li>
  <li><strong>Review the rollout</strong>: Inspect the JSONL session file for the failed session</li>
  <li><strong>File a report</strong>: Run <code class="language-plaintext highlighter-rouge">/feedback</code> to capture diagnostic context before closing</li>
</ol>

<p>This top-down approach moves from cheap (no restart required) to expensive (restart with tracing), minimising disruption to your workflow.</p>

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://github.com/openai/codex/blob/main/codex-rs/README.md">codex-rs README — OpenAI Codex GitHub repository</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.mintlify.com/openai/codex/advanced/tracing">Tracing &amp; Verbose Logging — Codex CLI Advanced Documentation</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:2:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:2:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:2:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://smartscope.blog/en/generative-ai/chatgpt/codex-cli-diagnostic-logs-deep-dive/">Codex CLI Logs: Location, Debug Flags &amp; 401 Error Fix — SmartScope</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/cli/slash-commands">Slash Commands in Codex CLI — OpenAI Developers</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/config-reference">Configuration Reference — Codex OpenAI Developers</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://www.mintlify.com/openai/codex/architecture/sandboxing">Sandboxing Architecture — Codex CLI Documentation</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/cli/reference">Command Line Options — Codex CLI Reference — OpenAI Developers</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:7:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:7:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://github.com/openai/codex/blob/main/codex-rs/execpolicy/README.md">Execution Policy (execpolicy) README — OpenAI Codex GitHub</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:8:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://danielvaughan.github.io/codex-resources/articles/2026-03-28-codex-cli-opentelemetry-observability/">Codex CLI OpenTelemetry: Observability and Metrics in Production — Codex Resources</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://danielvaughan.github.io/codex-resources/articles/2026-03-30-codex-cli-session-analytics-jsonl-rollout/">Codex CLI Session Analytics: Mining the JSONL Rollout Format — Codex Resources</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Codex CLI Diagnostic Toolkit: Tracing, Sandbox Testing, and the Built-In Debugging Commands]]></summary></entry><entry><title type="html">How to Be a Codex CLI Forward Deployed Engineer</title><link href="https://codex.danielvaughan.com/2026/04/07/codex-cli-forward-deployed-engineer/" rel="alternate" type="text/html" title="How to Be a Codex CLI Forward Deployed Engineer" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codex-cli-forward-deployed-engineer</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codex-cli-forward-deployed-engineer/"><![CDATA[<h1 id="how-to-be-a-codex-cli-forward-deployed-engineer">How to Be a Codex CLI Forward Deployed Engineer</h1>

<p>The forward deployed engineer (FDE) has become the most sought-after role in AI-native companies. Job postings for the position grew 800–1,000% through 2025<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, and in 2026, organisations like OpenAI, Anthropic, and Palantir continue to hire aggressively<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. For engineers who have mastered Codex CLI, the FDE path offers a natural career escalation — combining deep tool expertise with client-facing delivery in high-stakes enterprise environments.</p>

<p>This article examines what it means to specialise as a Codex CLI FDE: the workflows, the technical stack, and the career mechanics.</p>

<h2 id="what-a-forward-deployed-engineer-actually-does">What a Forward Deployed Engineer Actually Does</h2>

<p>An FDE embeds directly with enterprise customers to ship custom, production-grade solutions. OpenAI’s own FDE job descriptions specify that candidates will “lead complex end-to-end deployments of frontier models in production alongside OpenAI’s most strategic customers”<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Unlike a traditional software engineer building product features behind a PM layer, an FDE owns the full lifecycle:</p>

<pre><code class="language-mermaid">flowchart LR
    A[Discovery &amp; Scoping] --&gt; B[Rapid Prototyping]
    B --&gt; C[Production Hardening]
    C --&gt; D[Deployment &amp; Rollout]
    D --&gt; E[Feedback to Product]
    E --&gt;|Next engagement| A
</code></pre>

<p>The critical distinction is <strong>ownership scope</strong>. A solutions architect designs and demos pre-sale. A core engineer ships features for all users. An FDE builds and deploys the final solution for a specific customer, post-sale, and remains accountable for its production stability<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="the-codex-cli-fde-technical-stack">The Codex CLI FDE Technical Stack</h2>

<p>An FDE specialising in Codex CLI needs to operate across three layers: the CLI itself, the harness/integration layer, and the enterprise infrastructure layer.</p>

<h3 id="layer-1-codex-cli-mastery">Layer 1: Codex CLI Mastery</h3>

<p>At minimum, an FDE must be fluent in the full configuration surface. Codex CLI’s <code class="language-plaintext highlighter-rouge">config.toml</code> hierarchy — global (<code class="language-plaintext highlighter-rouge">~/.codex/config.toml</code>), project-scoped (<code class="language-plaintext highlighter-rouge">.codex/config.toml</code>), and enterprise-managed (<code class="language-plaintext highlighter-rouge">requirements.toml</code>) — is the foundation of every deployment<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<p>Key configuration areas an FDE works with daily:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Custom model provider for client's LLM proxy</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.1"</span>
<span class="py">model_provider</span> <span class="p">=</span> <span class="s">"client-proxy"</span>

<span class="nn">[model_providers.client-proxy]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"Client LLM Gateway"</span>
<span class="py">base_url</span> <span class="p">=</span> <span class="s">"https://llm-proxy.client.internal"</span>
<span class="py">env_key</span> <span class="p">=</span> <span class="s">"CLIENT_API_KEY"</span>
<span class="py">wire_api</span> <span class="p">=</span> <span class="s">"responses"</span>

<span class="c"># Enterprise sandbox policy</span>
<span class="py">sandbox_mode</span> <span class="p">=</span> <span class="s">"workspace-write"</span>

<span class="nn">[sandbox_workspace_write]</span>
<span class="py">writable_roots</span> <span class="p">=</span> <span class="nn">["/home/developer/project"]</span>
<span class="py">network_access</span> <span class="p">=</span> <span class="kc">true</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">approval_policy</code> system — with granular controls for <code class="language-plaintext highlighter-rouge">sandbox_approval</code>, <code class="language-plaintext highlighter-rouge">mcp_elicitations</code>, <code class="language-plaintext highlighter-rouge">skill_approval</code>, and <code class="language-plaintext highlighter-rouge">request_permissions</code> — lets FDEs tune the autonomy level to match each client’s security posture<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Enterprise clients behind TLS-inspecting proxies require custom CA certificate configuration via <code class="language-plaintext highlighter-rouge">SSL_CERT_FILE</code> and related environment variables<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>

<h3 id="layer-2-custom-harness-construction">Layer 2: Custom Harness Construction</h3>

<p>The Codex app server exposes a JSON-RPC protocol that lets external tools drive the same agent loop used by the CLI and VS Code extension<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. For FDEs, this is the integration point where Codex becomes part of a client’s existing toolchain.</p>

<p>Common harness patterns an FDE builds:</p>

<pre><code class="language-mermaid">flowchart TB
    subgraph Client Infrastructure
        CI[CI/CD Pipeline]
        IDE[IDE Extension]
        WEB[Internal Web App]
    end
    subgraph Codex Harness
        AS[App Server - JSON-RPC]
        AL[Agent Loop]
        SB[Sandbox]
    end
    subgraph Models
        API[Responses API]
        PROXY[Client LLM Proxy]
    end
    CI --&gt; AS
    IDE --&gt; AS
    WEB --&gt; AS
    AS --&gt; AL
    AL --&gt; SB
    AL --&gt; API
    AL --&gt; PROXY
</code></pre>

<p>The Python SDK enables programmatic access for embedding Codex into automation workflows, CI pipelines, and custom tooling<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. An FDE building a client integration typically wires the app server into the client’s deployment pipeline, configures model routing through their LLM proxy, and sets up the hooks system for audit logging.</p>

<h3 id="layer-3-enterprise-infrastructure">Layer 3: Enterprise Infrastructure</h3>

<p>This is where 80% of the actual FDE work happens<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Getting a demo working is straightforward; navigating corporate SSO, network policies, compliance requirements, and production credentials is the real challenge.</p>

<p>Enterprise deployment concerns an FDE handles:</p>

<table>
  <thead>
    <tr>
      <th>Concern</th>
      <th>Codex CLI Mechanism</th>
      <th>FDE Responsibility</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Authentication</td>
      <td>ChatGPT device-code sign-in or API key auth<sup id="fnref:5:2" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></td>
      <td>Integrate with client IdP, configure <code class="language-plaintext highlighter-rouge">forced_login_method</code> and <code class="language-plaintext highlighter-rouge">forced_chatgpt_workspace_id</code><sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
    </tr>
    <tr>
      <td>Network security</td>
      <td>Configurable domain allowlists/denylists, SOCKS5 proxy support<sup id="fnref:4:3" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>Map client firewall rules to <code class="language-plaintext highlighter-rouge">allowed_domains</code>, configure egress policies</td>
    </tr>
    <tr>
      <td>Audit &amp; compliance</td>
      <td>Hooks system, OpenTelemetry export<sup id="fnref:4:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>Wire into client SIEM, configure <code class="language-plaintext highlighter-rouge">otel</code> exporters with TLS certs</td>
    </tr>
    <tr>
      <td>Cost management</td>
      <td>Pay-as-you-go Codex seats for Business/Enterprise<sup id="fnref:5:3" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></td>
      <td>Model token budgets, <code class="language-plaintext highlighter-rouge">model_reasoning_effort</code> tuning</td>
    </tr>
    <tr>
      <td>Device management</td>
      <td><code class="language-plaintext highlighter-rouge">requirements.toml</code> for managed machines<sup id="fnref:4:5" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>Work with client MDM to distribute configuration profiles</td>
    </tr>
  </tbody>
</table>

<p>The <code class="language-plaintext highlighter-rouge">requirements.toml</code> mechanism is particularly important — it lets an organisation enforce constraints such as disallowing <code class="language-plaintext highlighter-rouge">approval_policy = "never"</code> or <code class="language-plaintext highlighter-rouge">sandbox_mode = "danger-full-access"</code>, ensuring that individual developers cannot bypass security policies<sup id="fnref:4:6" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
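<p>As a sketch of what such a constraint file might express (the key names below are illustrative assumptions, not the documented <code class="language-plaintext highlighter-rouge">requirements.toml</code> schema; the policy and sandbox values are the ones named in this article):</p>

```toml
# Illustrative only — key names are assumptions. The intent mirrors the
# constraints described above: exclude approval_policy = "never" and
# sandbox_mode = "danger-full-access" from what developers can select.
allowed_approval_policies = ["untrusted", "on-request", "on-failure"]
allowed_sandbox_modes = ["read-only", "workspace-write"]
```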

<h2 id="a-day-in-the-life">A Day in the Life</h2>

<p>A typical FDE engagement follows a compressed timeline. Where a traditional project might take quarters, an FDE ships in weeks.</p>

<p><strong>Week 1 — Discovery</strong>: Embed with the client engineering team. Map their existing development workflow. Identify where Codex CLI slots in — code generation, test authoring, migration automation, documentation. Set up a proof-of-concept with their model provider and network configuration.</p>

<p><strong>Week 2 — Prototype</strong>: Build an AGENTS.md constitution tailored to their codebase conventions. Configure domain-expert agents in <code class="language-plaintext highlighter-rouge">.codex/agents/</code> for their specific stack. Wire the app server into their CI pipeline for automated code review or test generation. Demo to stakeholders.</p>

<p><strong>Week 3–4 — Production hardening</strong>: Lock down sandbox policies via <code class="language-plaintext highlighter-rouge">requirements.toml</code>. Configure OpenTelemetry export to their observability stack. Set up the hooks system for compliance audit trails. Load-test the app server under realistic concurrency. Train their team on prompt engineering patterns.</p>

<p><strong>Ongoing — Feedback loop</strong>: Channel field insights back to the core product team. Identify feature gaps that affect multiple enterprise clients. Propose configuration additions or SDK improvements.</p>

<h2 id="skills-beyond-the-terminal">Skills Beyond the Terminal</h2>

<p>OpenAI’s FDE postings require 7+ years of full-stack engineering experience, with customer-facing experience “highly desirable”<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. The role demands travel — up to 50% for the NYC position<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Total compensation at OpenAI and Anthropic ranges from $350K to $550K at mid-to-senior levels<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<p>The skills profile is T-shaped<sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<ul>
  <li><strong>Vertical depth</strong>: Codex CLI internals, Responses API, model behaviour, sandbox architecture, TOML configuration surface</li>
  <li><strong>Horizontal breadth</strong>: Customer empathy, problem decomposition in ambiguous environments, rapid prototyping under pressure, product sense for identifying patterns across clients</li>
</ul>

<p>Technical interviewing for FDE roles typically includes a decomposition case study — receiving an ambiguous real-world problem and structuring a solution iteratively, not just solving a LeetCode problem<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Palantir pioneered this format, and it has become industry-standard for FDE hiring<sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="from-codex-user-to-fde-the-career-path">From Codex User to FDE: The Career Path</h2>

<p>The strongest FDE candidates come from backgrounds that combine building and deploying<sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<ol>
  <li><strong>Early-stage startup engineers</strong> — accustomed to wearing multiple hats and shipping under pressure</li>
  <li><strong>Solutions architects who build PoCs</strong> — already comfortable in client-facing technical contexts</li>
  <li><strong>Platform/DevOps engineers</strong> — experienced with the infrastructure layer that consumes most FDE time</li>
  <li><strong>Power users of AI coding tools</strong> — deep familiarity with Codex CLI, Claude Code, or similar agentic tools</li>
</ol>

<p>The progression typically runs: power user → internal champion (rolling out Codex CLI within your own organisation) → FDE candidate. Building a portfolio of custom harness integrations, AGENTS.md configurations, and enterprise deployment case studies is the most direct path<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>

<h2 id="the-integration-wall">The Integration Wall</h2>

<p>The FDE role exists because of what the industry calls the “integration wall”<sup id="fnref:1:8" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> — the gap between a powerful platform and an enterprise-ready deployment. Codex CLI is a sophisticated tool with a deep configuration surface, but every enterprise has unique network policies, compliance requirements, model provider preferences, and development workflows.</p>

<p>No amount of documentation closes that gap entirely. Someone has to sit with the client, understand their constraints, and build the bridge. That someone is the FDE.</p>

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://hashnode.com/blog/a-complete-2026-guide-to-the-forward-deployed-engineer">Tech’s secret weapon: The complete 2026 guide to the forward deployed engineer</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:1:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:1:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.ai-daily.news/articles/forward-deployed-engineers-ais-key-role-in-2026">Forward-Deployed Engineers: AI’s Key Role in 2026 — AI Daily</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://openai.com/careers/forward-deployed-engineer-(fde)-nyc-new-york-city/">Forward Deployed Engineer (FDE) - NYC — OpenAI Careers</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/config-reference">Configuration Reference — Codex CLI, OpenAI Developers</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:4:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:4:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:4:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:4:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://www.augmentcode.com/learn/openai-codex-cli-enterprise">OpenAI Codex CLI ships v0.116.0 with enterprise features — Augment Code</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:5:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:5:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://openai.com/index/unlocking-the-codex-harness/">Unlocking the Codex harness: how we built the App Server — OpenAI</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://www.rocketlane.com/blogs/forward-deployed-engineer">Forward Deployed Engineer (FDE): The Essential 2026 Guide — Rocketlane</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[How to Be a Codex CLI Forward Deployed Engineer]]></summary></entry><entry><title type="html">Codex CLI on GitLab: Duo Agent Platform, CI/CD Pipelines, and MCP Integration</title><link href="https://codex.danielvaughan.com/2026/04/07/codex-cli-gitlab-integration-duo-agent-platform/" rel="alternate" type="text/html" title="Codex CLI on GitLab: Duo Agent Platform, CI/CD Pipelines, and MCP Integration" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codex-cli-gitlab-integration-duo-agent-platform</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codex-cli-gitlab-integration-duo-agent-platform/"><![CDATA[<h1 id="codex-cli-on-gitlab-duo-agent-platform-cicd-pipelines-and-mcp-integration">Codex CLI on GitLab: Duo Agent Platform, CI/CD Pipelines, and MCP Integration</h1>

<hr />

<p>While Codex CLI’s GitHub integration has received extensive coverage — from <code class="language-plaintext highlighter-rouge">openai/codex-action</code> to issue assignment via Copilot — GitLab teams have been building their own integration story. That story now has three distinct layers: the <strong>Duo Agent Platform</strong> for mention-driven automation, <strong>CI/CD pipeline jobs</strong> using <code class="language-plaintext highlighter-rouge">codex exec</code> for structured analysis, and <strong>MCP server connections</strong> for real-time repository access. This article covers all three, with production-ready configuration for each.</p>

<h2 id="the-three-integration-layers">The Three Integration Layers</h2>

<p>Before diving into configuration, it helps to understand where each layer fits in a GitLab workflow.</p>

<pre><code class="language-mermaid">graph TD
    A["Developer Action"] --&gt; B{"Integration Layer"}
    B --&gt;|"@codex mention in MR/issue"| C["Duo Agent Platform&lt;br/&gt;External Agent"]
    B --&gt;|"Pipeline trigger on MR"| D["CI/CD Job&lt;br/&gt;codex exec --full-auto"]
    B --&gt;|"Local development"| E["MCP Server&lt;br/&gt;GitLab API access"]

    C --&gt; F["Codex reads repo context&lt;br/&gt;+ CODEX.md rules"]
    D --&gt; G["Structured JSON/Markdown&lt;br/&gt;output as artifacts"]
    E --&gt; H["Issue/MR/branch tools&lt;br/&gt;in Codex session"]

    F --&gt; I["Inline comment or&lt;br/&gt;draft MR created"]
    G --&gt; J["CodeClimate report in&lt;br/&gt;MR widget"]
    H --&gt; K["Agent-driven GitLab&lt;br/&gt;operations"]
</code></pre>

<p>Each layer serves a different need: Duo for ad-hoc delegation, CI/CD for systematic analysis on every merge request, and MCP for interactive agent sessions that need GitLab API access.</p>

<h2 id="layer-1-duo-agent-platform--external-agents">Layer 1: Duo Agent Platform — External Agents</h2>

<p>GitLab’s Duo Agent Platform reached general availability on 15 January 2026<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, bringing first-class support for external AI agents — including Codex CLI — directly into the GitLab workflow. Premium and Ultimate customers on GitLab 18.8+ (both SaaS and self-managed) can enable the Codex agent through the AI Catalog<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<h3 id="how-it-works">How It Works</h3>

<p>When a developer mentions <code class="language-plaintext highlighter-rouge">@codex</code> (or the configured service account) in an issue comment or merge request discussion, GitLab triggers the external agent<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. The agent:</p>

<ol>
  <li>Reads the repository tree and surrounding context</li>
  <li>Loads project-specific rules from <code class="language-plaintext highlighter-rouge">CODEX.md</code> at the repository root</li>
  <li>Decides whether code changes, review feedback, or clarification is needed</li>
  <li>Responds inline with either a ready-to-merge change or a comment</li>
</ol>

<p>The trigger mechanisms are<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Trigger</th>
      <th>Where</th>
      <th>What Happens</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Mention</strong></td>
      <td>Issue or MR comment</td>
      <td>Agent analyses context and responds</td>
    </tr>
    <tr>
      <td><strong>Assignment</strong></td>
      <td>Issue or MR assignee</td>
      <td>Agent works the issue autonomously</td>
    </tr>
    <tr>
      <td><strong>Reviewer assignment</strong></td>
      <td>MR reviewer</td>
      <td>Agent performs code review</td>
    </tr>
  </tbody>
</table>

<h3 id="configuration">Configuration</h3>

<p>The Codex agent uses GitLab-managed credentials through the AI Gateway, so there is no separate <code class="language-plaintext highlighter-rouge">OPENAI_API_KEY</code> to configure<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Administrators add the agent via <strong>Settings → AI Catalog → GitLab-managed external agents → Add to AI Catalog</strong><sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>

<p>For self-managed instances, the external agent configuration requires the gateway token injection<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># External agent configuration (admin-level)</span>
<span class="na">injectGatewayToken</span><span class="pi">:</span> <span class="no">true</span>
</code></pre></div></div>

<p>This automatically provides <code class="language-plaintext highlighter-rouge">AI_FLOW_AI_GATEWAY_TOKEN</code> and <code class="language-plaintext highlighter-rouge">AI_FLOW_AI_GATEWAY_HEADERS</code> environment variables to the agent runtime<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
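<p>If you are building your own agent wrapper, those injected variables can be consumed directly. A minimal sketch — note that the encoding of <code class="language-plaintext highlighter-rouge">AI_FLOW_AI_GATEWAY_HEADERS</code> is defined by GitLab and not publicly specified; this assumes the conventional JSON object of header name/value pairs:</p>

```python
import json
import os

def gateway_headers() -> dict:
    """Build HTTP headers for AI Gateway calls from the injected variables.

    Assumes AI_FLOW_AI_GATEWAY_HEADERS holds a JSON object (an assumption,
    not documented behaviour) and falls back to a bearer token header.
    """
    headers = json.loads(os.environ.get("AI_FLOW_AI_GATEWAY_HEADERS", "{}"))
    token = os.environ.get("AI_FLOW_AI_GATEWAY_TOKEN")
    if token:
        headers.setdefault("Authorization", f"Bearer {token}")
    return headers

# Simulated injection, as the GitLab runtime would do it:
os.environ["AI_FLOW_AI_GATEWAY_TOKEN"] = "tok"
print(gateway_headers())  # → {'Authorization': 'Bearer tok'}
```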

<h3 id="codexmd-the-project-rules-file">CODEX.md: The Project Rules File</h3>

<p>All project-specific rules — style, testing, security policies — come from <code class="language-plaintext highlighter-rouge">CODEX.md</code> at the repository root<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. This is distinct from <code class="language-plaintext highlighter-rouge">AGENTS.md</code> used by the CLI directly; GitLab’s integration reads <code class="language-plaintext highlighter-rouge">CODEX.md</code> specifically. A minimal example:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Project Rules</span>

<span class="gu">## Code Style</span>
<span class="p">-</span> Use TypeScript strict mode
<span class="p">-</span> All functions must have JSDoc comments
<span class="p">-</span> Prefer <span class="sb">`const`</span> over <span class="sb">`let`</span>

<span class="gu">## Testing</span>
<span class="p">-</span> Every new function needs a unit test
<span class="p">-</span> Run <span class="sb">`npm test`</span> before proposing changes
<span class="p">-</span> Minimum 80% branch coverage

<span class="gu">## Security</span>
<span class="p">-</span> Never commit secrets or API keys
<span class="p">-</span> Use parameterised queries for all database access
<span class="p">-</span> Validate all user input at the controller boundary
</code></pre></div></div>

<h3 id="current-limitations">Current Limitations</h3>

<p>The Duo Agent Platform integration is still maturing. As of April 2026, the <code class="language-plaintext highlighter-rouge">@codex</code> mention workflow runs Codex in the background and responds asynchronously — there is no interactive steering<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. The agent creates merge requests linked back to the originating issue but cannot yet trigger downstream pipelines automatically<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. ⚠️ The exact latency and token limits for Duo-triggered Codex sessions are not publicly documented.</p>

<h2 id="layer-2-cicd-pipeline-integration-with-codex-exec">Layer 2: CI/CD Pipeline Integration with codex exec</h2>

<p>For systematic, repeatable analysis on every merge request, embedding <code class="language-plaintext highlighter-rouge">codex exec</code> directly into <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code> is the more mature approach. The official OpenAI Cookbook published a comprehensive guide to this pattern in March 2026<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<h3 id="code-quality-reports">Code Quality Reports</h3>

<p>The core pattern runs <code class="language-plaintext highlighter-rouge">codex exec --full-auto</code> with a structured prompt that generates GitLab-compliant CodeClimate JSON. The output appears directly in the merge request widget alongside native GitLab code quality results.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">stages</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">codex</span>

<span class="na">default</span><span class="pi">:</span>
  <span class="na">image</span><span class="pi">:</span> <span class="s">node:24</span>

<span class="na">codex_review</span><span class="pi">:</span>
  <span class="na">stage</span><span class="pi">:</span> <span class="s">codex</span>
  <span class="na">variables</span><span class="pi">:</span>
    <span class="na">CODEX_QA_PATH</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gl-code-quality-report.json"</span>
    <span class="na">CODEX_RAW_LOG</span><span class="pi">:</span> <span class="s2">"</span><span class="s">artifacts/codex-raw.log"</span>
  <span class="na">rules</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">if</span><span class="pi">:</span> <span class="s1">'</span><span class="s">$CI_PIPELINE_SOURCE</span><span class="nv"> </span><span class="s">==</span><span class="nv"> </span><span class="s">"merge_request_event"'</span>
      <span class="na">when</span><span class="pi">:</span> <span class="s">on_success</span>
    <span class="pi">-</span> <span class="na">when</span><span class="pi">:</span> <span class="s">never</span>
  <span class="na">script</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">npm -g i @openai/codex@latest</span>
    <span class="pi">-</span> <span class="s">FILE_LIST="$(git ls-files | sed 's/^/- /')"</span>
    <span class="pi">-</span> <span class="pi">|</span>
      <span class="s">codex exec --full-auto "Review this repository and output a GitLab Code Quality report in CodeClimate JSON format.</span>
      <span class="s">OUTPUT MUST BE A SINGLE JSON ARRAY between markers:</span>
      <span class="s">=== BEGIN_CODE_QUALITY_JSON ===</span>
      <span class="s">&lt;JSON ARRAY&gt;</span>
      <span class="s">=== END_CODE_QUALITY_JSON ===</span>
      <span class="s">Each issue: description, check_name, fingerprint, severity, location with path and lines.begin.</span>
      <span class="s">Only report issues in: ${FILE_LIST}" \</span>
        <span class="s">| tee "${CODEX_RAW_LOG}" &gt;/dev/null</span>
    <span class="pi">-</span> <span class="pi">|</span>
      <span class="s">sed -E 's/\x1B\[[0-9;]*[A-Za-z]//g' "${CODEX_RAW_LOG}" \</span>
        <span class="s">| awk '/BEGIN_CODE_QUALITY_JSON/{grab=1;next}/END_CODE_QUALITY_JSON/{grab=0}grab' \</span>
        <span class="s">&gt; "${CODEX_QA_PATH}"</span>
    <span class="pi">-</span> <span class="s1">'</span><span class="s">node</span><span class="nv"> </span><span class="s">-e</span><span class="nv"> </span><span class="s">"JSON.parse(require(\"fs\").readFileSync(\"${CODEX_QA_PATH}\",\"utf8\"))"</span><span class="nv"> </span><span class="s">||</span><span class="nv"> </span><span class="s">echo</span><span class="nv"> </span><span class="s">"[]"</span><span class="nv"> </span><span class="s">&gt;</span><span class="nv"> </span><span class="s">"${CODEX_QA_PATH}"'</span>
  <span class="na">artifacts</span><span class="pi">:</span>
    <span class="na">reports</span><span class="pi">:</span>
      <span class="na">codequality</span><span class="pi">:</span> <span class="s">gl-code-quality-report.json</span>
    <span class="na">paths</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">artifacts/</span>
    <span class="na">expire_in</span><span class="pi">:</span> <span class="s">14 days</span>
</code></pre></div></div>

<p>The marker-based extraction pattern (<code class="language-plaintext highlighter-rouge">=== BEGIN_... ===</code> / <code class="language-plaintext highlighter-rouge">=== END_... ===</code>) is critical for reliability<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. LLM output is inherently variable; the markers give the pipeline a deterministic extraction boundary. The ANSI escape stripping (<code class="language-plaintext highlighter-rouge">sed -E 's/\x1B\[[0-9;]*[A-Za-z]//g'</code>) handles terminal colour codes that <code class="language-plaintext highlighter-rouge">codex exec</code> may emit<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
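<p>The same extraction is easy to reproduce off-pipeline when debugging a failed job. A minimal Python sketch of the <code class="language-plaintext highlighter-rouge">sed</code>/<code class="language-plaintext highlighter-rouge">awk</code> step above (the marker names match the prompt; the regex is the standard ANSI CSI escape pattern):</p>

```python
import re

# Matches ANSI colour/control sequences such as \x1B[32m
ANSI_RE = re.compile(r"\x1B\[[0-9;]*[A-Za-z]")

def extract_report(raw_log: str) -> str:
    """Return only the text between the BEGIN/END markers, ANSI codes removed."""
    clean = ANSI_RE.sub("", raw_log)
    grab, captured = False, []
    for line in clean.splitlines():
        if "BEGIN_CODE_QUALITY_JSON" in line:
            grab = True
            continue
        if "END_CODE_QUALITY_JSON" in line:
            grab = False
            continue
        if grab:
            captured.append(line)
    return "\n".join(captured)

raw = ("model chatter...\n"
       "=== BEGIN_CODE_QUALITY_JSON ===\n"
       "\x1b[32m[]\x1b[0m\n"
       "=== END_CODE_QUALITY_JSON ===\n")
print(extract_report(raw))  # → []
```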

<h3 id="security-remediation-pipeline">Security Remediation Pipeline</h3>

<p>The cookbook’s second pattern is more ambitious: a two-stage pipeline where Codex first triages SAST findings, then generates validated patches for high/critical vulnerabilities<sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<pre><code class="language-mermaid">graph LR
    A["GitLab SAST Scanner"] --&gt;|"gl-sast-report.json"| B["codex_recommendations&lt;br/&gt;Stage 1: Triage"]
    B --&gt;|"security_priority.md"| C["Human Review"]
    A --&gt;|"gl-sast-report.json"| D["codex_resolution&lt;br/&gt;Stage 2: Patch Gen"]
    D --&gt;|"codex_patches/*.patch"| E["git apply --check&lt;br/&gt;Validation"]
    E --&gt;|"Valid patches"| F["Merge Request&lt;br/&gt;with fixes"]
</code></pre>

<p>The remediation stage iterates over each high/critical vulnerability, constructs a per-finding prompt, and validates the generated diff with <code class="language-plaintext highlighter-rouge">git apply --check</code> before storing it as a <code class="language-plaintext highlighter-rouge">.patch</code> artefact<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. Invalid patches are discarded automatically — only clean-applying fixes survive.</p>

<p>Key design decisions in this pattern:</p>

<ul>
  <li><strong>Severity whitelisting</strong>: Only <code class="language-plaintext highlighter-rouge">high</code> and <code class="language-plaintext highlighter-rouge">critical</code> findings trigger remediation, avoiding wasted tokens on informational findings<sup id="fnref:6:5" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></li>
  <li><strong>Per-vulnerability isolation</strong>: Each finding gets its own <code class="language-plaintext highlighter-rouge">codex exec</code> invocation, preventing cross-contamination between fixes<sup id="fnref:6:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></li>
  <li><strong>Unified diff validation</strong>: <code class="language-plaintext highlighter-rouge">git apply --check</code> runs before any patch is stored, ensuring no broken diffs reach reviewers<sup id="fnref:6:7" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></li>
</ul>
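<p>The severity whitelisting step can be sketched in a few lines. This assumes the standard GitLab SAST report shape — findings under a <code class="language-plaintext highlighter-rouge">vulnerabilities</code> array, each with a <code class="language-plaintext highlighter-rouge">severity</code> field — and is an illustration of the filter, not the cookbook's exact script:</p>

```python
import json

REMEDIATE = {"critical", "high"}  # severity whitelist

def select_findings(report_json: str) -> list:
    """Pick only the findings worth a per-vulnerability codex exec invocation."""
    report = json.loads(report_json)
    return [v for v in report.get("vulnerabilities", [])
            if v.get("severity", "").lower() in REMEDIATE]

sample = json.dumps({"vulnerabilities": [
    {"id": "a", "severity": "Critical", "name": "SQL injection"},
    {"id": "b", "severity": "Info", "name": "Verbose error page"},
]})
for finding in select_findings(sample):
    # Each surviving finding would get its own isolated codex exec call,
    # and the resulting diff would be validated with `git apply --check`.
    print(finding["id"], finding["name"])  # → a SQL injection
```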

<h3 id="authentication-in-cicd">Authentication in CI/CD</h3>

<p>Authentication uses masked CI/CD variables. Store <code class="language-plaintext highlighter-rouge">OPENAI_API_KEY</code> as a protected, masked variable in <strong>Settings → CI/CD → Variables</strong><sup id="fnref:6:8" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. For self-managed instances using Azure OpenAI instead, configure the <code class="language-plaintext highlighter-rouge">CODEX_MODEL</code> and endpoint variables accordingly.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">variables</span><span class="pi">:</span>
  <span class="na">OPENAI_API_KEY</span><span class="pi">:</span> <span class="s">$OPENAI_API_KEY</span>
  <span class="na">CODEX_MODEL</span><span class="pi">:</span> <span class="s2">"</span><span class="s">gpt-5.4"</span>  <span class="c1"># or your preferred model</span>
</code></pre></div></div>

<h3 id="cost-control">Cost Control</h3>

<p>Each <code class="language-plaintext highlighter-rouge">codex exec</code> invocation in CI/CD consumes API tokens. For cost management:</p>

<ul>
  <li>Use <code class="language-plaintext highlighter-rouge">gpt-5.4-mini</code> for triage/quality jobs and reserve <code class="language-plaintext highlighter-rouge">gpt-5.4</code> for remediation<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup></li>
  <li>Set <code class="language-plaintext highlighter-rouge">--model</code> explicitly in the <code class="language-plaintext highlighter-rouge">codex exec</code> command to avoid inheriting a more expensive default</li>
  <li>Monitor token usage via the <code class="language-plaintext highlighter-rouge">postTaskComplete</code> hook pattern or OpenTelemetry<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup></li>
</ul>
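<p>One way to keep the cheap and expensive models from being mixed up across jobs is Codex's profile mechanism in <code class="language-plaintext highlighter-rouge">config.toml</code>. The profile names here are illustrative, not a documented convention:</p>

```toml
# ~/.codex/config.toml — hypothetical profile names
[profiles.ci-triage]
model = "gpt-5.4-mini"   # cheap model for quality/triage jobs

[profiles.ci-remediation]
model = "gpt-5.4"        # reserved for patch generation
```

<p>The pipeline job then selects a profile explicitly, e.g. <code class="language-plaintext highlighter-rouge">codex exec --profile ci-triage …</code>, so no job silently inherits a costlier default.</p>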

<h2 id="layer-3-gitlab-mcp-server-integration">Layer 3: GitLab MCP Server Integration</h2>

<p>For interactive Codex sessions that need to read issues, manage merge requests, or create branches on GitLab, the MCP integration provides structured API access.</p>

<h3 id="gitlabs-native-mcp-server">GitLab’s Native MCP Server</h3>

<p>GitLab ships its own MCP server<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup> that exposes repository, issue, merge request, and pipeline tools. Configure it in your Codex <code class="language-plaintext highlighter-rouge">config.toml</code>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[mcp_servers.gitlab]</span>
<span class="py">url</span> <span class="p">=</span> <span class="s">"https://gitlab.example.com/api/v4/mcp"</span>
</code></pre></div></div>

<p>Or add it directly via the CLI<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex mcp add <span class="nt">--url</span> <span class="s2">"https://gitlab.example.com/api/v4/mcp"</span>
</code></pre></div></div>

<h3 id="composios-gitlab-mcp">Composio’s GitLab MCP</h3>

<p>For teams wanting a managed MCP endpoint that bundles GitLab alongside other services, Composio provides a Tool Router that dynamically loads GitLab tools based on the task<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[mcp_servers.composio]</span>
<span class="py">url</span> <span class="p">=</span> <span class="s">"https://connect.composio.dev/mcp"</span>
<span class="nn">http_headers</span> <span class="o">=</span> <span class="p">{</span> <span class="py">"x-api-key"</span> <span class="p">=</span> <span class="s">"${COMPOSIO_API_KEY}"</span> <span class="p">}</span>
</code></pre></div></div>

<p>This gives Codex access to GitLab operations — creating projects, managing issues, handling branches, and triggering pipelines — through a single MCP endpoint that also supports other integrations<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>

<h3 id="practical-mcp-use-case-issue-triage">Practical MCP Use Case: Issue Triage</h3>

<p>With the GitLab MCP server configured, you can run an issue triage workflow locally:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex <span class="nb">exec</span> <span class="nt">--full-auto</span> <span class="se">\</span>
  <span class="s2">"Read the open issues labelled 'needs-triage' in this project. </span><span class="se">\</span><span class="s2">
   For each, add a priority label (P1/P2/P3) based on severity </span><span class="se">\</span><span class="s2">
   and add a comment summarising the issue and suggested next steps."</span>
</code></pre></div></div>

<p>The MCP server handles the GitLab API calls — listing issues, adding labels, posting comments — while Codex handles the reasoning and decision-making.</p>

<h2 id="choosing-the-right-layer">Choosing the Right Layer</h2>

<table>
  <thead>
    <tr>
      <th>Criterion</th>
      <th>Duo Agent Platform</th>
      <th>CI/CD Pipeline</th>
      <th>MCP Server</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Trigger</strong></td>
      <td><code class="language-plaintext highlighter-rouge">@codex</code> mention</td>
      <td>MR/pipeline event</td>
      <td>Manual/scripted</td>
    </tr>
    <tr>
      <td><strong>Output</strong></td>
      <td>Inline comments, draft MRs</td>
      <td>Artefacts, reports</td>
      <td>GitLab API operations</td>
    </tr>
    <tr>
      <td><strong>Authentication</strong></td>
      <td>GitLab-managed</td>
      <td>API key variable</td>
      <td>API key + token</td>
    </tr>
    <tr>
      <td><strong>Cost visibility</strong></td>
      <td>Bundled in GitLab Credits<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></td>
      <td>Direct API billing</td>
      <td>Direct API billing</td>
    </tr>
    <tr>
      <td><strong>Best for</strong></td>
      <td>Ad-hoc delegation</td>
      <td>Systematic analysis</td>
      <td>Interactive workflows</td>
    </tr>
    <tr>
      <td><strong>Maturity</strong></td>
      <td>GA (Jan 2026)</td>
      <td>Production-ready</td>
      <td>Stable</td>
    </tr>
  </tbody>
</table>

<p>For most teams, the recommended approach is: <strong>Duo for ad-hoc requests</strong>, <strong>CI/CD for every-MR analysis</strong>, and <strong>MCP for local development workflows</strong> that need GitLab context.</p>

<h2 id="enterprise-considerations">Enterprise Considerations</h2>

<h3 id="self-managed-deployment">Self-Managed Deployment</h3>

<p>Self-managed GitLab instances (18.8+) can enable external agents through the AI Catalog<sup id="fnref:2:3" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. The key requirement is network connectivity to the AI Gateway — or, for air-gapped environments, routing through Azure OpenAI endpoints configured as custom model providers in the Codex <code class="language-plaintext highlighter-rouge">config.toml</code><sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>.</p>

<h3 id="audit-trail">Audit Trail</h3>

<p>All three integration layers produce audit evidence:</p>

<ul>
  <li><strong>Duo</strong>: GitLab tracks agent interactions as system events</li>
  <li><strong>CI/CD</strong>: <code class="language-plaintext highlighter-rouge">codex exec</code> produces JSONL rollout files stored as pipeline artefacts</li>
  <li><strong>MCP</strong>: Standard MCP request/response logging via <code class="language-plaintext highlighter-rouge">RUST_LOG=codex_core::mcp=debug</code></li>
</ul>
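<p>The JSONL rollout artefacts lend themselves to quick audit summaries. The exact rollout schema is not publicly specified; this sketch assumes only the JSONL convention of one JSON object per line with a <code class="language-plaintext highlighter-rouge">type</code> field:</p>

```python
import json
from collections import Counter

def summarise_rollout(jsonl_text: str) -> Counter:
    """Count event types in a JSONL session log for a quick audit summary."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        counts[event.get("type", "unknown")] += 1
    return counts

sample = '{"type": "command"}\n{"type": "command"}\n{"type": "response"}\n'
print(summarise_rollout(sample))  # → Counter({'command': 2, 'response': 1})
```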

<h3 id="gitlab-vs-github-integration-comparison">GitLab vs GitHub: Integration Comparison</h3>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>GitHub (codex-action)</th>
      <th>GitLab (CI/CD + Duo)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Native agent</strong></td>
      <td>Copilot issue assignment</td>
      <td>Duo Agent Platform</td>
    </tr>
    <tr>
      <td><strong>CI/CD</strong></td>
      <td><code class="language-plaintext highlighter-rouge">openai/codex-action</code></td>
      <td><code class="language-plaintext highlighter-rouge">codex exec</code> in <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code></td>
    </tr>
    <tr>
      <td><strong>Code quality</strong></td>
      <td>PR checks</td>
      <td>CodeClimate artefact in MR widget</td>
    </tr>
    <tr>
      <td><strong>Security</strong></td>
      <td>Dependabot + Codex Security</td>
      <td>SAST + Codex remediation pipeline</td>
    </tr>
    <tr>
      <td><strong>MCP</strong></td>
      <td>GitHub MCP server</td>
      <td>GitLab MCP server</td>
    </tr>
  </tbody>
</table>

<p>The GitLab integration requires more manual configuration than GitHub’s first-party action, but offers equivalent capabilities once set up. The CI/CD pipeline approach is particularly powerful because GitLab’s artefact system natively understands CodeClimate JSON, making Codex quality findings appear in the same MR widget as native GitLab scanners<sup id="fnref:6:9" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>GitLab Inc., “GitLab Announces the General Availability of GitLab Duo Agent Platform,” 15 January 2026. <a href="https://ir.gitlab.com/news/news-details/2026/GitLab-Announces-the-General-Availability-of-GitLab-Duo-Agent-Platform/default.aspx">https://ir.gitlab.com/news/news-details/2026/GitLab-Announces-the-General-Availability-of-GitLab-Duo-Agent-Platform/default.aspx</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>GitLab Docs, “External agents.” <a href="https://docs.gitlab.com/user/duo_agent_platform/agents/external/">https://docs.gitlab.com/user/duo_agent_platform/agents/external/</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:2:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>GitLab Docs, “External agent configuration examples.” <a href="https://docs.gitlab.com/user/duo_agent_platform/agents/external_examples/">https://docs.gitlab.com/user/duo_agent_platform/agents/external_examples/</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>GitLab Docs, “AI Catalog.” <a href="https://docs.gitlab.com/user/duo_agent_platform/ai_catalog/">https://docs.gitlab.com/user/duo_agent_platform/ai_catalog/</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>GitLab.org, “Product Requirements — Claude Code and OpenAI Codex CLI Integration for GitLab CI/CD (#557820).” <a href="https://gitlab.com/gitlab-org/gitlab/-/issues/557820">https://gitlab.com/gitlab-org/gitlab/-/issues/557820</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>OpenAI Cookbook, “Automating Code Quality and Security Fixes with Codex CLI on GitLab.” <a href="https://developers.openai.com/cookbook/examples/codex/secure_quality_gitlab">https://developers.openai.com/cookbook/examples/codex/secure_quality_gitlab</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:6:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:6:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:6:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:6:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a> <a href="#fnref:6:9" class="reversefootnote" role="doc-backlink">&#8617;<sup>10</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>OpenAI Developers, “Models.” <a href="https://developers.openai.com/api/docs/models">https://developers.openai.com/api/docs/models</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>OpenAI Developers, “Codex CLI Features.” <a href="https://developers.openai.com/codex/cli/features">https://developers.openai.com/codex/cli/features</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>GitLab Docs, “GitLab MCP server.” <a href="https://docs.gitlab.com/user/gitlab_duo/model_context_protocol/mcp_server/">https://docs.gitlab.com/user/gitlab_duo/model_context_protocol/mcp_server/</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>Composio, “How to integrate Gitlab MCP with Codex.” <a href="https://composio.dev/toolkits/gitlab/framework/codex">https://composio.dev/toolkits/gitlab/framework/codex</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>OpenAI Developers, “Codex Configuration Reference.” <a href="https://developers.openai.com/codex/config-reference">https://developers.openai.com/codex/config-reference</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Codex CLI on GitLab: Duo Agent Platform, CI/CD Pipelines, and MCP Integration]]></summary></entry><entry><title type="html">Codex CLI Model Lifecycle: Navigating Deprecations, Migrations, and the GPT-5.x Transition</title><link href="https://codex.danielvaughan.com/2026/04/07/codex-cli-model-lifecycle-deprecations-migrations/" rel="alternate" type="text/html" title="Codex CLI Model Lifecycle: Navigating Deprecations, Migrations, and the GPT-5.x Transition" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codex-cli-model-lifecycle-deprecations-migrations</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codex-cli-model-lifecycle-deprecations-migrations/"><![CDATA[<h1 id="codex-cli-model-lifecycle-navigating-deprecations-migrations-and-the-gpt-5x-transition">Codex CLI Model Lifecycle: Navigating Deprecations, Migrations, and the GPT-5.x Transition</h1>

<hr />

<p>OpenAI’s model release cadence has accelerated dramatically. In the seven months since the original GPT-5-Codex launched in September 2025, we have seen five major Codex-optimised model generations — and one sweeping deprecation wave, with a second already scheduled.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> If you maintain Codex CLI configurations across teams, CI pipelines, or custom harnesses, the churn is real. This article maps the full model timeline, explains the deprecation mechanics, and provides a practical migration playbook for the April 2026 landscape.</p>

<h2 id="the-codex-model-timeline">The Codex Model Timeline</h2>

<p>The following timeline captures every Codex-optimised model release and its current status.</p>

<pre><code class="language-mermaid">gantt
    title Codex Model Lifecycle (Sep 2025 – Jun 2026)
    dateFormat YYYY-MM-DD
    axisFormat %b %Y

    section Flagship
    GPT-5-Codex         :done,    gpt5c,   2025-09-23, 2026-04-01
    GPT-5.1-Codex       :done,    gpt51c,  2025-11-19, 2026-04-01
    GPT-5.2-Codex       :active,  gpt52c,  2025-12-18, 2026-06-05
    GPT-5.3-Codex       :active,  gpt53c,  2026-02-05, 2026-10-01
    GPT-5.4 (unified)   :active,  gpt54,   2026-03-05, 2026-10-01

    section Specialist
    GPT-5.1-Codex-Max   :done,    gpt51m,  2025-11-19, 2026-04-01
    GPT-5.3-Codex-Spark :active,  spark,   2026-02-12, 2026-10-01

    section Mini / Nano
    GPT-5-Codex-Mini     :done,   gpt5cm,  2025-09-23, 2026-04-01
    GPT-5.1-Codex-Mini   :done,   gpt51cm, 2025-11-19, 2026-04-01
    GPT-5.4-mini         :active, gpt54m,  2026-03-17, 2026-10-01
    GPT-5.4-nano         :active, gpt54n,  2026-03-17, 2026-10-01
</code></pre>

<h3 id="key-dates">Key dates</h3>

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Released</th>
      <th>Deprecated</th>
      <th>Replacement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>GPT-5-Codex</td>
      <td>23 Sep 2025<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></td>
      <td>1 Apr 2026<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td>gpt-5.3-codex</td>
    </tr>
    <tr>
      <td>GPT-5.1-Codex</td>
      <td>19 Nov 2025<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>1 Apr 2026<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td>gpt-5.3-codex</td>
    </tr>
    <tr>
      <td>GPT-5.1-Codex-Max</td>
      <td>19 Nov 2025<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>1 Apr 2026<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td>gpt-5.3-codex</td>
    </tr>
    <tr>
      <td>GPT-5.1-Codex-Mini</td>
      <td>19 Nov 2025<sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></td>
      <td>1 Apr 2026<sup id="fnref:3:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td>gpt-5.4-mini</td>
    </tr>
    <tr>
      <td>GPT-5-Codex-Mini</td>
      <td>23 Sep 2025<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></td>
      <td>1 Apr 2026<sup id="fnref:3:4" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></td>
      <td>gpt-5.4-mini</td>
    </tr>
    <tr>
      <td>GPT-5.2-Codex</td>
      <td>18 Dec 2025<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></td>
      <td>5 Jun 2026<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></td>
      <td>gpt-5.3-codex</td>
    </tr>
    <tr>
      <td>GPT-5.3-Codex</td>
      <td>5 Feb 2026<sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></td>
      <td>Current</td>
      <td>—</td>
    </tr>
    <tr>
      <td>GPT-5.3-Codex-Spark</td>
      <td>12 Feb 2026<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup></td>
      <td>Research preview</td>
      <td>—</td>
    </tr>
    <tr>
      <td>GPT-5.4</td>
      <td>5 Mar 2026<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup></td>
      <td>Current (recommended)</td>
      <td>—</td>
    </tr>
    <tr>
      <td>GPT-5.4-mini</td>
      <td>17 Mar 2026<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></td>
      <td>Current</td>
      <td>—</td>
    </tr>
    <tr>
      <td>GPT-5.4-nano</td>
      <td>17 Mar 2026<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></td>
      <td>Current</td>
      <td>—</td>
    </tr>
  </tbody>
</table>

<p>The April 1 deprecation wiped out the entire GPT-5.0 and GPT-5.1 Codex family in a single sweep.<sup id="fnref:3:5" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> The next deprecation wave — GPT-5.2-Codex on 5 June 2026 — is less than two months away.<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></p>

<h2 id="what-happens-when-a-model-is-deprecated">What Happens When a Model Is Deprecated</h2>

<p>When OpenAI deprecates a Codex model, the behaviour depends on your access method:</p>

<ol>
  <li>
    <p><strong>ChatGPT-authenticated users</strong> (the default for Codex CLI): the model silently disappears from the picker. If your <code class="language-plaintext highlighter-rouge">config.toml</code> still references it, Codex falls back to the current default model.<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup></p>
  </li>
  <li>
    <p><strong>API key users</strong>: requests to a deprecated model return an error. There is no automatic fallback — your pipeline breaks.</p>
  </li>
  <li>
    <p><strong>GitHub Copilot users</strong>: deprecated models are removed from all Copilot experiences including Chat, inline edits, and agent modes. Enterprise administrators must enable replacement models through Copilot settings policies.<sup id="fnref:3:6" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
  </li>
  <li>
    <p><strong>Azure OpenAI / Microsoft Foundry</strong>: Azure maintains its own retirement schedule which may lag behind or precede OpenAI’s by several weeks.<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup></p>
  </li>
</ol>
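<p>For API-key pipelines, where a retired model means a hard failure, it can help to probe availability up front. The sketch below uses the public <code>GET /v1/models</code> listing endpoint; the crude <code>grep</code>-based JSON check and the function names are illustrative, not part of Codex CLI.</p>

```shell
#!/bin/sh
# Fail fast when a pinned model has been retired, rather than
# discovering it mid-pipeline. Expects OPENAI_API_KEY in the environment.

# Succeed if the model ID appears in a /v1/models JSON body read from stdin.
model_available() {
  grep -q "\"id\": *\"$1\"" -
}

# Query the live endpoint and check the pinned model.
check_model() {
  curl -fsS https://api.openai.com/v1/models \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    | model_available "$1" \
    || { echo "model $1 is not available; migrate before CI breaks" >&2; return 1; }
}
```

<p>Running something like <code>check_model gpt-5.2-codex</code> as an early CI step stops the pipeline with a clear message instead of an opaque mid-run API error.</p>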

<h2 id="the-configtoml-migration-mechanism">The config.toml Migration Mechanism</h2>

<p>Codex CLI includes a built-in migration map for model names. When a deprecated model is referenced in configuration, Codex can recognise the old name and suggest or apply a replacement.<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup></p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ~/.codex/config.toml — before migration</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.1-codex"</span>
</code></pre></div></div>

<p>After the April 1 deprecation, this configuration will either fall back to the default or fail, depending on your authentication method. The fix is straightforward:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ~/.codex/config.toml — after migration</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>
</code></pre></div></div>
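<p>The replacement column from the table above can also be applied mechanically. This is a sketch, not a built-in Codex CLI command: the function rewrites the retired model IDs to their replacements with <code>sed</code>, keeping a <code>.bak</code> copy of each edited file.</p>

```shell
#!/bin/sh
# Rewrite retired model IDs to their replacements (per the table above),
# leaving a .bak backup next to the edited file.
migrate_models() {
  sed -i.bak \
    -e 's/"gpt-5-codex"/"gpt-5.3-codex"/g' \
    -e 's/"gpt-5.1-codex"/"gpt-5.3-codex"/g' \
    -e 's/"gpt-5.1-codex-max"/"gpt-5.3-codex"/g' \
    -e 's/"gpt-5-codex-mini"/"gpt-5.4-mini"/g' \
    -e 's/"gpt-5.1-codex-mini"/"gpt-5.4-mini"/g' \
    "$1"
}

# Only touch the file if it exists.
if [ -f "$HOME/.codex/config.toml" ]; then
  migrate_models "$HOME/.codex/config.toml"
fi
```

<p>Because each pattern includes the closing quote, <code>"gpt-5.1-codex"</code> does not accidentally match inside <code>"gpt-5.1-codex-max"</code>, so rule order does not matter.</p>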

<h3 id="the-recommended-model-stack-april-2026">The recommended model stack (April 2026)</h3>

<p>For most workflows, OpenAI now recommends <code class="language-plaintext highlighter-rouge">gpt-5.4</code> as the default.<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup> Here is the current recommended stack:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ~/.codex/config.toml</span>

<span class="c"># Primary model — GPT-5.4 unifies coding + reasoning + computer use</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>

<span class="c"># Review model — match or exceed your primary</span>
<span class="py">review_model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>

<span class="c"># Reasoning effort — adjust per task complexity</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"high"</span>
<span class="py">plan_mode_reasoning_effort</span> <span class="p">=</span> <span class="s">"xhigh"</span>
</code></pre></div></div>

<h2 id="profile-based-model-management">Profile-Based Model Management</h2>

<p>The profiles system (experimental, March 2026) is the cleanest way to manage multiple model configurations and prepare for deprecation waves.<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup> Define profiles that isolate model choices, so a single deprecation requires only one line change per affected profile.</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ~/.codex/config.toml</span>

<span class="c"># Default profile</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>

<span class="nn">[profiles.fast]</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4-mini"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"low"</span>

<span class="nn">[profiles.deep]</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"xhigh"</span>
<span class="py">plan_mode_reasoning_effort</span> <span class="p">=</span> <span class="s">"xhigh"</span>

<span class="nn">[profiles.spark]</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.3-codex-spark"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"medium"</span>

<span class="nn">[profiles.legacy-52]</span>
<span class="c"># ⚠️ Retiring 5 June 2026 — migrate to gpt-5.3-codex or gpt-5.4</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.2-codex"</span>
</code></pre></div></div>

<p>Switch profiles on the command line:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Quick task with mini</span>
codex <span class="nt">--profile</span> fast <span class="s2">"add error handling to parse_config"</span>

<span class="c"># Deep architectural review</span>
codex <span class="nt">--profile</span> deep <span class="s2">"review the authentication module for security issues"</span>

<span class="c"># Real-time iteration with Spark</span>
codex <span class="nt">--profile</span> spark <span class="s2">"refactor this function step by step"</span>
</code></pre></div></div>

<h2 id="the-gpt-54-unification">The GPT-5.4 Unification</h2>

<p>GPT-5.4, released 5 March 2026, represents a significant architectural shift.<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> It is the first mainline reasoning model to incorporate the frontier coding capabilities previously exclusive to the Codex-specific model line. In practical terms:</p>

<ul>
  <li><strong>GPT-5.3-Codex</strong> remains the best pure coding model, scoring highest on SWE-bench Verified<sup id="fnref:2:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></li>
  <li><strong>GPT-5.4</strong> matches or exceeds GPT-5.3-Codex on coding while adding native computer use (75% OSWorld), stronger reasoning, and 1M token extended context<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup><sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">14</a></sup></li>
  <li><strong>GPT-5.4-mini</strong> delivers 54.4% on SWE-Bench Pro at 30% of the credit consumption of the flagship — purpose-built for subagents<sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></li>
</ul>

<pre><code class="language-mermaid">flowchart TD
    A[Task arrives] --&gt; B{Task complexity?}
    B --&gt;|Simple fix / subagent| C[gpt-5.4-mini]
    B --&gt;|Standard development| D[gpt-5.4]
    B --&gt;|Pure coding, max accuracy| E[gpt-5.3-codex]
    B --&gt;|Real-time iteration| F[gpt-5.3-codex-spark]

    C --&gt; G{Cost sensitive?}
    G --&gt;|Yes| H[gpt-5.4-nano]
    G --&gt;|No| C

    D --&gt; I[Default recommendation]
    E --&gt; J[Legacy Codex-line — still current]
    F --&gt; K[Pro subscribers only]

    style I fill:#2d6,stroke:#333,color:#fff
    style J fill:#26d,stroke:#333,color:#fff
    style K fill:#d62,stroke:#333,color:#fff
</code></pre>

<p>The question on many developers’ minds — raised publicly by Simon Willison — is whether the Codex model line will merge entirely into the mainline GPT series.<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">15</a></sup> The introduction of <code class="language-plaintext highlighter-rouge">gpt-5-codex</code> and <code class="language-plaintext highlighter-rouge">gpt-5-codex-mini</code> as unified model identifiers in late March 2026 suggests the answer is yes.<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">16</a></sup></p>

<h2 id="subagent-model-configuration-for-multi-agent-workflows">Subagent Model Configuration for Multi-Agent Workflows</h2>

<p>Deprecations hit hardest in multi-agent configurations where different agents may reference different models. With the April 2026 changes, audit every agent TOML file in <code class="language-plaintext highlighter-rouge">.codex/agents/</code>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .codex/agents/reviewer.toml — BEFORE (broken after April 1)</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.1-codex-max"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"xhigh"</span>

<span class="c"># .codex/agents/reviewer.toml — AFTER</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"xhigh"</span>
</code></pre></div></div>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .codex/agents/worker.toml — BEFORE (broken after April 1)</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.1-codex-mini"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"medium"</span>

<span class="c"># .codex/agents/worker.toml — AFTER</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4-mini"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"medium"</span>
</code></pre></div></div>

<p>For the <code class="language-plaintext highlighter-rouge">[agents]</code> section controlling subagent defaults:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[agents]</span>
<span class="py">max_threads</span> <span class="p">=</span> <span class="mi">4</span>
<span class="py">max_depth</span> <span class="p">=</span> <span class="mi">2</span>
<span class="c"># Subagent model — use mini for cost efficiency</span>
<span class="c"># Previously gpt-5.1-codex-mini, now:</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4-mini"</span>
</code></pre></div></div>
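<p>A quick way to audit the agent files in bulk is a small shell loop. This is a sketch: the directory layout follows the convention above, and the pattern matches only the retired GPT-5.0 and GPT-5.1 Codex IDs, not the still-current <code>gpt-5.3-codex</code>.</p>

```shell
#!/bin/sh
# List every agent definition still pinned to a retired Codex model.
# Matches gpt-5-codex* and gpt-5.1-codex* but not gpt-5.3-codex or gpt-5.4.
audit_agents() {
  dir="${1:-.codex/agents}"
  found=0
  for f in "$dir"/*.toml; do
    [ -e "$f" ] || continue
    if grep -nE '"gpt-5(\.1)?-codex' "$f"; then
      echo "retired model reference in $f" >&2
      found=1
    fi
  done
  return $found
}
```

<p>The function returns non-zero when any agent file needs migrating, so it can gate a CI job or a pre-commit hook.</p>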

<h2 id="cicd-pipeline-migration">CI/CD Pipeline Migration</h2>

<p>Pipelines using <code class="language-plaintext highlighter-rouge">codex exec</code> with explicit model flags are the most fragile. A deprecated model causes an immediate hard failure in CI.</p>

<h3 id="defensive-pattern-environment-variable-indirection">Defensive pattern: environment variable indirection</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .github/workflows/codex-review.yml</span>
<span class="nb">env</span>:
  CODEX_MODEL: <span class="s2">"gpt-5.4"</span>
  CODEX_SUBAGENT_MODEL: <span class="s2">"gpt-5.4-mini"</span>

steps:
  - name: Run Codex review
    run: |
      codex <span class="nb">exec</span> <span class="se">\</span>
        <span class="nt">-c</span> <span class="nv">model</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CODEX_MODEL</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
        <span class="nt">-c</span> <span class="nv">review_model</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CODEX_MODEL</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
        <span class="s2">"Review all changed files for security issues"</span>
</code></pre></div></div>

<p>When the next deprecation arrives, update a single environment variable rather than hunting through workflow files.</p>

<h3 id="defensive-pattern-profile-based-ci">Defensive pattern: profile-based CI</h3>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .codex/config.toml (committed to repo)</span>
<span class="nn">[profiles.ci]</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"high"</span>
<span class="py">approval_policy</span> <span class="p">=</span> <span class="s">"full-auto"</span>
<span class="py">sandbox_mode</span> <span class="p">=</span> <span class="s">"locked-network"</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex <span class="nb">exec</span> <span class="nt">--profile</span> ci <span class="s2">"run the test suite and fix failures"</span>
</code></pre></div></div>

<h2 id="the-june-2026-deprecation-preparing-now">The June 2026 Deprecation: Preparing Now</h2>

<p>GPT-5.2-Codex retires on 5 June 2026.<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> If you or your team still reference <code class="language-plaintext highlighter-rouge">gpt-5.2-codex</code> anywhere, here is a migration checklist:</p>

<ol>
  <li><strong>Audit all config files</strong>: <code class="language-plaintext highlighter-rouge">grep -rn "gpt-5.2" ~/.codex/ .codex/</code> (the recursive search already covers <code class="language-plaintext highlighter-rouge">.codex/agents/</code>)</li>
  <li><strong>Check CI/CD</strong>: search workflow files for hardcoded model strings</li>
  <li><strong>Update AGENTS.md</strong>: if any agent instructions reference specific model names, update them</li>
  <li><strong>Test with the replacement</strong>: switch to <code class="language-plaintext highlighter-rouge">gpt-5.3-codex</code> or <code class="language-plaintext highlighter-rouge">gpt-5.4</code> and verify your workflows produce equivalent output</li>
  <li><strong>Update custom harnesses</strong>: any code using the Responses API with explicit model parameters needs updating</li>
  <li><strong>Notify the team</strong>: if you use project-scoped <code class="language-plaintext highlighter-rouge">.codex/config.toml</code>, push the model change as a PR</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Quick audit across a monorepo (.codex/ already covers .codex/agents/)</span>
<span class="nb">grep</span> <span class="nt">-rnE</span> <span class="s2">"gpt-5</span><span class="se">\.</span><span class="s2">[12]"</span> <span class="se">\</span>
  ~/.codex/config.toml <span class="se">\</span>
  .codex/ <span class="se">\</span>
  .github/workflows/ <span class="se">\</span>
  2&gt;/dev/null
</code></pre></div></div>

<h2 id="azure-openai-considerations">Azure OpenAI Considerations</h2>

<p>Azure OpenAI maintains a separate retirement schedule through Microsoft Foundry.<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup> Key differences:</p>

<ul>
  <li>Azure deployments use deployment names, not model IDs — a deprecation requires redeploying, not just changing a string</li>
  <li>Azure retirements may lag behind OpenAI’s by weeks</li>
  <li>The <code class="language-plaintext highlighter-rouge">api-version</code> query parameter in your <code class="language-plaintext highlighter-rouge">[model_providers]</code> block must match the deployment’s API version</li>
</ul>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[model_providers.azure]</span>
<span class="py">base_url</span> <span class="p">=</span> <span class="s">"https://your-resource.openai.azure.com/openai"</span>
<span class="py">wire_api</span> <span class="p">=</span> <span class="s">"responses"</span>

<span class="nn">[model_providers.azure.query_params]</span>
<span class="py">api-version</span> <span class="p">=</span> <span class="s">"2026-03-01-preview"</span>
</code></pre></div></div>

<p>⚠️ Azure Entra ID token authentication with Codex CLI has a known limitation (issue #13241) — static API keys remain more reliable for automated workflows.</p>

<h2 id="best-practices-for-model-lifecycle-management">Best Practices for Model Lifecycle Management</h2>

<ol>
  <li><strong>Never hardcode model names in scripts</strong> — use config.toml profiles or environment variables</li>
  <li><strong>Pin to the recommended model</strong> (<code class="language-plaintext highlighter-rouge">gpt-5.4</code>) unless you have a specific reason not to</li>
  <li><strong>Subscribe to the changelog</strong> at <a href="https://developers.openai.com/codex/changelog">developers.openai.com/codex/changelog</a> and the <a href="https://github.blog/changelog/">GitHub Changelog</a> for deprecation notices</li>
  <li><strong>Test model changes in a branch</strong> before rolling out to the team</li>
  <li><strong>Use the <code class="language-plaintext highlighter-rouge">codex exec</code> structured output</strong> (<code class="language-plaintext highlighter-rouge">--output-schema</code>) to detect regressions when switching models</li>
  <li><strong>Keep subagent models one tier below the primary</strong> — <code class="language-plaintext highlighter-rouge">gpt-5.4-mini</code> for subagents, <code class="language-plaintext highlighter-rouge">gpt-5.4</code> for the orchestrator</li>
  <li><strong>Set calendar reminders</strong> for announced deprecation dates — the June 5 GPT-5.2 retirement is next</li>
</ol>

<h2 id="what-is-next">What Is Next</h2>

<p>The model identifier consolidation — with <code class="language-plaintext highlighter-rouge">gpt-5-codex</code> and <code class="language-plaintext highlighter-rouge">gpt-5-codex-mini</code> appearing as unified aliases in late March<sup id="fnref:16:1" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">16</a></sup> — suggests OpenAI may move toward rolling model identifiers that always point to the latest Codex-optimised model. If this happens, explicit version pinning would become opt-in rather than the default, significantly reducing deprecation churn.</p>

<p>Until then, treat model lifecycle management as a first-class operational concern. The seven-month pattern is clear: new Codex models arrive every 6–10 weeks, and old ones retire within roughly four to six months. Plan accordingly.</p>

<hr />

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://developers.openai.com/api/docs/models/gpt-5-codex">GPT-5-Codex Model documentation</a> — OpenAI API reference, September 2025 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-3-codex/">Introducing GPT-5.3-Codex</a> — OpenAI blog, 5 February 2026 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:2:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://github.blog/changelog/2026-04-03-gpt-5-1-codex-gpt-5-1-codex-max-and-gpt-5-1-codex-mini-deprecated/">GPT-5.1 Codex, GPT-5.1-Codex-Max, and GPT-5.1-Codex-Mini deprecated</a> — GitHub Changelog, 3 April 2026 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:3:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:3:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:3:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:3:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://openai.com/index/gpt-5-1-codex-max/">Building more with GPT-5.1-Codex-Max</a> — OpenAI blog, 19 November 2025 <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-2-codex/">Introducing GPT-5.2-Codex</a> — OpenAI blog, 18 December 2025 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://openai.com/index/retiring-gpt-4o-and-older-models/">Retiring GPT-4o and older models</a> — OpenAI blog, February 2026; GPT-5.2 Thinking retires 5 June 2026 <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/changelog">Codex CLI Changelog — Codex-Spark research preview</a> — OpenAI Developers, February 2026 <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-4/">Introducing GPT-5.4</a> — OpenAI blog, 5 March 2026 <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-4-mini-and-nano/">Introducing GPT-5.4 mini and nano</a> — OpenAI blog, 17 March 2026 <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/models">Codex Models documentation</a> — OpenAI Developers, current <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p><a href="https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/model-retirements">Azure OpenAI Model Retirements</a> — Microsoft Learn, current <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/config-reference">Codex Configuration Reference</a> — OpenAI Developers, current <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/config-sample">Codex Sample Configuration</a> — OpenAI Developers, current <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:14" role="doc-endnote">
      <p><a href="https://www.nxcode.io/resources/news/gpt-5-4-complete-guide-features-pricing-models-2026">GPT-5.4 Complete Guide 2026</a> — NxCode, March 2026 <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:15" role="doc-endnote">
      <p><a href="https://openai.com/index/introducing-gpt-5-4/">GPT-5.4 discussion — Simon Willison’s question on Codex model line merger</a> — referenced in community discussion, March 2026 <a href="#fnref:15" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:16" role="doc-endnote">
      <p><a href="https://danielvaughan.com/codex-resources/articles/2026-03-30-gpt-5-codex-new-flagship-model-guide/">gpt-5-codex: The New Codex Flagship and What It Means for Your Workflow</a> — Daniel Vaughan / Codex Resources, 30 March 2026 <a href="#fnref:16" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:16:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Codex CLI Model Lifecycle: Navigating Deprecations, Migrations, and the GPT-5.x Transition]]></summary></entry><entry><title type="html">Codified Context: The Three-Tier Knowledge Architecture for AI Coding Agents</title><link href="https://codex.danielvaughan.com/2026/04/07/codified-context-three-tier-knowledge-architecture/" rel="alternate" type="text/html" title="Codified Context: The Three-Tier Knowledge Architecture for AI Coding Agents" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/codified-context-three-tier-knowledge-architecture</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/codified-context-three-tier-knowledge-architecture/"><![CDATA[<h1 id="codified-context-the-three-tier-knowledge-architecture-for-ai-coding-agents">Codified Context: The Three-Tier Knowledge Architecture for AI Coding Agents</h1>

<hr />

<p>Dumping everything into a single <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file works until it doesn’t. At some point—typically around 20,000 lines of code—you hit the context wall: the constitution grows unwieldy, the agent forgets domain nuances, and you find yourself re-explaining the same architectural constraints every session. Aristidis Vasilopoulos’s February 2026 paper, <em>Codified Context: Infrastructure for AI Agents in a Complex Codebase</em> <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, offers a rigorous, empirically validated alternative: a three-tier knowledge architecture that maps cleanly onto Codex CLI’s existing primitives.</p>

<p>This article unpacks the paper’s findings, maps them to Codex CLI’s current feature set, and provides concrete implementation patterns.</p>

<h2 id="the-three-tier-model">The Three-Tier Model</h2>

<p>The core insight is straightforward: not all context is equal. Some knowledge must be present in every session (hot memory), some is needed only for specific task types (warm memory, specialist knowledge), and some is referenced rarely but must be queryable on demand (cold memory). The paper formalises this into three tiers <sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<pre><code class="language-mermaid">graph TD
    A[Human Prompt] --&gt; B{Session Start}
    B --&gt; C[Tier 1: Constitution&lt;br/&gt;Always Loaded]
    C --&gt; D{Task Classification}
    D --&gt; E[Tier 2: Specialist Agent&lt;br/&gt;Invoked Per Task]
    D --&gt; F[Tier 2: Another Specialist&lt;br/&gt;Invoked Per Task]
    E --&gt; G{Need Reference Data?}
    F --&gt; G
    G --&gt;|Yes| H[Tier 3: MCP Knowledge Server&lt;br/&gt;Queried On Demand]
    G --&gt;|No| I[Execute Task]
    H --&gt; I
</code></pre>

<table>
  <thead>
    <tr>
      <th>Tier</th>
      <th>Role</th>
      <th>Codex CLI Mapping</th>
      <th>Files</th>
      <th>Lines</th>
      <th>% of Codebase</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>T1</td>
      <td>Constitution (Hot Memory)</td>
      <td><code class="language-plaintext highlighter-rouge">AGENTS.md</code></td>
      <td>1</td>
      <td>~660</td>
      <td>0.6%</td>
    </tr>
    <tr>
      <td>T2</td>
      <td>Specialist Agents (Warm)</td>
      <td><code class="language-plaintext highlighter-rouge">.codex/agents/*.toml</code></td>
      <td>19</td>
      <td>~9,300</td>
      <td>8.6%</td>
    </tr>
    <tr>
      <td>T3</td>
      <td>Knowledge Base (Cold Memory)</td>
      <td>MCP knowledge servers</td>
      <td>34</td>
      <td>~16,250</td>
      <td>15.0%</td>
    </tr>
    <tr>
      <td><strong>Total</strong></td>
      <td> </td>
      <td> </td>
      <td><strong>54</strong></td>
      <td><strong>~26,200</strong></td>
      <td><strong>24.2%</strong></td>
    </tr>
  </tbody>
</table>

<p>The metrics come from a real 108,000-line C# distributed system tracked across 283 development sessions and 2,801 human prompts <sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Crucially, Vasilopoulos warns that the 24.2% context infrastructure ratio reflects this project’s complexity and domain—it is not a universal target <sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="tier-1-the-constitution-agentsmd">Tier 1: The Constitution (AGENTS.md)</h2>

<p>The constitution is the only file that loads into every session. It defines non-negotiable rules: coding standards, architectural boundaries, forbidden patterns, and the trigger table that routes tasks to specialist agents.</p>

<p>In Codex CLI, this maps directly to <code class="language-plaintext highlighter-rouge">AGENTS.md</code> <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Since February 2026, AGENTS.md has been an open standard under the Linux Foundation’s Agentic AI Foundation, readable by Codex, Cursor, Copilot, Amp, Windsurf, and Gemini CLI <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Codex loads AGENTS.md from both <code class="language-plaintext highlighter-rouge">~/.codex/</code> (global) and per-directory (repo-scoped), with closer files taking precedence <sup id="fnref:2:1" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
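
<p>That layering rule can be made concrete with a short sketch. This is an illustration of the documented precedence, not Codex’s actual loader; the <code class="language-plaintext highlighter-rouge">repo_root</code> parameter is a simplifying assumption standing in for wherever the real discovery stops:</p>

```python
from pathlib import Path

def agents_md_chain(cwd, repo_root, home):
    """Return AGENTS.md paths in load order: the global file first, then
    each directory from the repo root down to cwd. Later entries are
    closer to the working directory and take precedence on conflict."""
    chain = []
    global_file = Path(home) / ".codex" / "AGENTS.md"
    if global_file.is_file():
        chain.append(global_file)
    cwd_path = Path(cwd).resolve()
    root = Path(repo_root).resolve()
    # Directories from the repo root down to cwd; closer files load later.
    lineage = [d for d in [*reversed(cwd_path.parents), cwd_path]
               if d == root or root in d.parents]
    for directory in lineage:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            chain.append(candidate)
    return chain
```

<p>Reading the chain in order and letting later files override earlier ones reproduces the “closest file wins” behaviour described above.</p>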

<p>A well-structured constitution for a tiered architecture includes the trigger table directly:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># AGENTS.md</span>

<span class="gu">## Routing Rules</span>

When the task involves <span class="gs">**network protocols or sync logic**</span>, delegate to
the <span class="sb">`network-protocol-designer`</span> agent before making changes.

When the task involves <span class="gs">**coordinates, camera, or spatial transforms**</span>,
delegate to the <span class="sb">`coordinate-wizard`</span> agent.

After any structural change, invoke the <span class="sb">`code-reviewer-game-dev`</span> agent
for review.

<span class="gu">## Architectural Boundaries</span>
<span class="p">
-</span> ECS components MUST NOT hold references to MonoBehaviours
<span class="p">-</span> Network messages MUST be defined in the shared assembly
<span class="p">-</span> All coordinate transforms go through CoordinateService
</code></pre></div></div>

<p>Research by Santos et al. found that well-structured AGENTS.md files correlate with a 29% reduction in median runtime and 17% reduction in output token consumption <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<h2 id="tier-2-specialist-agents">Tier 2: Specialist Agents</h2>

<p>This is where the paper’s approach diverges from the “one massive context file” pattern. Rather than cramming domain knowledge into the constitution, each specialist area gets its own agent definition with focused expertise.</p>

<p>In Codex CLI, custom agents live in <code class="language-plaintext highlighter-rouge">.codex/agents/</code> (project-scoped) or <code class="language-plaintext highlighter-rouge">~/.codex/agents/</code> (personal) as TOML files <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. Subagents and custom agents reached GA on 16 March 2026 <sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .codex/agents/network-protocol-designer.toml</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"network-protocol-designer"</span>
<span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>
<span class="py">model_reasoning_effort</span> <span class="p">=</span> <span class="s">"high"</span>
<span class="py">sandbox_mode</span> <span class="p">=</span> <span class="s">"read-only"</span>

<span class="nn">[instructions]</span>
<span class="py">content</span> <span class="p">=</span> <span class="s">"""
You are the network protocol specialist for ProjectX.
Key constraints:
- All messages use the NetworkMessage base class
- Serialisation uses MessagePack, never JSON
- Maximum message size: 512 bytes
- Tick rate: 20Hz server, 60Hz client interpolation
- See specs/network-protocol-v3.md for the full wire format
"""</span>
</code></pre></div></div>

<p>The paper’s 108K-line project used 19 specialist agents. Across 757 classifiable agent invocations, 432 (57%) went to project-specific specialists rather than built-in tool agents <sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The most frequently invoked were the code reviewer (154 invocations) and the network-protocol-designer (85 invocations) <sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h3 id="the-trigger-table-pattern">The Trigger Table Pattern</h3>

<p>The paper formalises task routing through a trigger table—a mapping from signals in the human prompt to the appropriate specialist <sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Trigger Phase</th>
      <th>Signal</th>
      <th>Agent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Pre-change</td>
      <td>Network, sync</td>
      <td>network-protocol-designer</td>
    </tr>
    <tr>
      <td>Pre-change</td>
      <td>Coordinates, camera</td>
      <td>coordinate-wizard</td>
    </tr>
    <tr>
      <td>Pre-change</td>
      <td>Abilities end-to-end</td>
      <td>ability-designer</td>
    </tr>
    <tr>
      <td>Post-change</td>
      <td>Architecture, design</td>
      <td>systems-designer</td>
    </tr>
    <tr>
      <td>Post-change</td>
      <td>ECS or network files</td>
      <td>code-reviewer-game-dev</td>
    </tr>
  </tbody>
</table>

<p>In practice, you encode this in your AGENTS.md (Tier 1) and rely on the model to follow the routing. Note that Codex CLI does not currently auto-spawn custom subagents—explicit delegation prompts are required <sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. There is an open issue (#14161) regarding <code class="language-plaintext highlighter-rouge">[[skills.config]]</code> in agent TOML being ignored for sub-agents <sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>
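
<p>The routing behaviour the constitution asks the model to follow is simple enough to sketch deterministically. The snippet below is an illustrative reimplementation of the trigger table above, not part of Codex CLI; the signal keywords and agent names are taken directly from the table:</p>

```python
# Trigger table: (phase, signal keywords) -> specialist agent.
TRIGGERS = [
    ("pre-change", {"network", "sync"}, "network-protocol-designer"),
    ("pre-change", {"coordinates", "camera"}, "coordinate-wizard"),
    ("pre-change", {"abilities"}, "ability-designer"),
    ("post-change", {"architecture", "design"}, "systems-designer"),
    ("post-change", {"ecs", "network"}, "code-reviewer-game-dev"),
]

def route(prompt, phase):
    """Return the specialist agents whose signals appear in the prompt."""
    words = set(prompt.lower().split())
    return [agent for p, signals, agent in TRIGGERS
            if p == phase and signals & words]
```

<p>In the real workflow the model performs this matching itself from the prose rules in Tier 1; the point of the sketch is that the routing is mechanical, which is why terse trigger tables work.</p>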

<h2 id="tier-3-mcp-knowledge-servers">Tier 3: MCP Knowledge Servers</h2>

<p>Cold memory—specification documents, API references, wire format definitions—lives behind MCP (Model Context Protocol) servers. These are queried on demand rather than loaded into every session, keeping the base context window lean.</p>

<p>Codex CLI treats MCP as a first-class citizen <sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. Configuration lives in <code class="language-plaintext highlighter-rouge">.codex/config.toml</code>:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># .codex/config.toml</span>
<span class="nn">[mcp_servers.knowledge-base]</span>
<span class="py">type</span> <span class="p">=</span> <span class="s">"stdio"</span>
<span class="py">command</span> <span class="p">=</span> <span class="s">"node"</span>
<span class="py">args</span> <span class="p">=</span> <span class="nn">["./mcp-servers/knowledge-retriever/index.js"]</span>

<span class="nn">[mcp_servers.specs-server]</span>
<span class="py">type</span> <span class="p">=</span> <span class="s">"http"</span>
<span class="py">url</span> <span class="p">=</span> <span class="s">"http://localhost:3001/mcp"</span>
</code></pre></div></div>

<p>MCP servers are managed via <code class="language-plaintext highlighter-rouge">codex mcp add</code>, <code class="language-plaintext highlighter-rouge">codex mcp list</code>, and <code class="language-plaintext highlighter-rouge">codex mcp login</code> <sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. Servers launch automatically when a session starts and support both STDIO and streaming HTTP transports <sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p>

<p>The paper’s companion repository provides a reference MCP retrieval server that exposes two key tools <sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">find_relevant_context(task)</code> — returns matching specification fragments</li>
  <li><code class="language-plaintext highlighter-rouge">suggest_agent(task)</code> — recommends the appropriate Tier 2 specialist</li>
</ul>
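
<p>The reference server’s internals are not reproduced in the paper, but the shape of both tools can be approximated with plain keyword overlap. The fragment corpus and scoring below are hypothetical stand-ins, not the companion repository’s code:</p>

```python
def find_relevant_context(task, fragments):
    """Rank specification fragments by keyword overlap with the task.
    `fragments` maps a source name (e.g. a spec file) to its text."""
    task_words = set(task.lower().split())
    scored = []
    for name, text in fragments.items():
        overlap = len(task_words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)]

def suggest_agent(task, trigger_table):
    """Recommend the Tier 2 specialist whose signals best match the task.
    `trigger_table` maps agent names to their signal keyword sets."""
    task_words = set(task.lower().split())
    best = max(trigger_table.items(),
               key=lambda item: len(item[1] & task_words))
    return best[0] if best[1] & task_words else "general"
```

<p>A production server would use embeddings or a proper index rather than word overlap, but the contract is the same: small, targeted fragments in, so the base context window stays lean.</p>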

<pre><code class="language-mermaid">sequenceDiagram
    participant H as Human
    participant C as Codex CLI
    participant A as AGENTS.md (T1)
    participant S as Specialist Agent (T2)
    participant M as MCP Server (T3)

    H-&gt;&gt;C: "Refactor network handshake"
    C-&gt;&gt;A: Load constitution
    A--&gt;&gt;C: Route to network-protocol-designer
    C-&gt;&gt;S: Spawn specialist agent
    S-&gt;&gt;M: find_relevant_context("network handshake")
    M--&gt;&gt;S: specs/network-protocol-v3.md (relevant sections)
    S--&gt;&gt;C: Proposed changes
    C-&gt;&gt;A: Route to code-reviewer-game-dev
    Note over C: Post-change review trigger
</code></pre>

<h2 id="practical-implementation">Practical Implementation</h2>

<h3 id="bootstrapping-the-architecture">Bootstrapping the Architecture</h3>

<p>The companion repository includes three factory agents for bootstrapping the tier infrastructure in an existing project <sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>:</p>

<ol>
  <li><strong>Constitution Generator</strong> — analyses the codebase and drafts an initial AGENTS.md</li>
  <li><strong>Agent Extractor</strong> — identifies domain clusters and generates specialist TOML files</li>
  <li><strong>Knowledge Indexer</strong> — catalogues specification documents for MCP serving</li>
</ol>

<h3 id="maintenance-budget">Maintenance Budget</h3>

<p>The paper reports a maintenance overhead of approximately 1–2 hours per week: twice-weekly review passes of 30–45 minutes each <sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Meta-infrastructure prompts—those specifically about building and maintaining the knowledge architecture itself—accounted for just 4.3% of substantive prompts <sup id="fnref:1:8" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h3 id="prompt-efficiency">Prompt Efficiency</h3>

<p>A striking finding: over 80% of human prompts in the study were 100 words or fewer <sup id="fnref:1:9" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The tiered architecture front-loads context so thoroughly that terse prompts suffice. This aligns with the broader principle that good context engineering reduces prompt engineering effort.</p>

<h2 id="connecting-to-the-agentic-pod-pattern">Connecting to the Agentic Pod Pattern</h2>

<p>The three-tier model maps naturally onto the emerging agentic pod architecture, where multiple AI agents collaborate on a shared codebase:</p>

<pre><code class="language-mermaid">graph LR
    subgraph Pod
        O[Orchestrator] --&gt; A1[Specialist: Architecture]
        O --&gt; A2[Specialist: Testing]
        O --&gt; A3[Specialist: Security]
        O --&gt; A4[Specialist: Domain Expert]
    end
    A1 &amp; A2 &amp; A3 &amp; A4 --&gt; KB[MCP Knowledge Servers]
    O --&gt; CONST[AGENTS.md Constitution]
</code></pre>

<p>Each pod member is effectively a Tier 2 specialist, the shared constitution (Tier 1) ensures consistency, and MCP servers (Tier 3) provide the shared reference library. The Codex CLI subagent system supports spawning specialists in parallel and collecting results <sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>, making this pattern directly implementable today.</p>
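
<p>The fan-out/collect step of the pod pattern reduces to a small orchestration skeleton. In the sketch below, <code class="language-plaintext highlighter-rouge">run_specialist</code> is a placeholder for however you actually spawn a subagent (for example a <code class="language-plaintext highlighter-rouge">codex exec</code> subprocess); it is not a Codex CLI API:</p>

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(task, specialists, run_specialist):
    """Send the same task to every pod specialist in parallel and
    collect their results. `run_specialist(name, task)` is a stand-in
    for spawning a subagent and returning its output."""
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        futures = {name: pool.submit(run_specialist, name, task)
                   for name in specialists}
        # Block until every specialist reports back.
        return {name: future.result() for name, future in futures.items()}
```

<p>The orchestrator then reconciles the per-specialist results against the constitution before committing any change.</p>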

<h2 id="current-model-considerations">Current Model Considerations</h2>

<p>When configuring Tier 2 agents, note the current model landscape. As of April 2026, <strong>GPT-5.4</strong> is the recommended default model, combining coding, reasoning, and native computer use <sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. The GPT-5.1-Codex family was deprecated on 3 April 2026 <sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>. GPT-5.3-Codex and GPT-5.2-Codex remain available for specific use cases <sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. Authentication now primarily uses “Sign in with ChatGPT” rather than API keys <sup id="fnref:10:2" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</p>

<h2 id="key-takeaways">Key Takeaways</h2>

<ol>
  <li><strong>Separate hot, warm, and cold context</strong> — not everything belongs in AGENTS.md</li>
  <li><strong>The trigger table is the glue</strong> — encode routing rules in Tier 1, domain knowledge in Tier 2</li>
  <li><strong>MCP servers keep the context window lean</strong> — query specifications on demand, don’t pre-load them</li>
  <li><strong>57% specialist usage</strong> validates the investment in domain-specific agents <sup id="fnref:1:10" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></li>
  <li><strong>1–2 hours per week</strong> is a realistic maintenance budget for a complex project <sup id="fnref:1:11" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></li>
  <li><strong>24.2% is not a target</strong> — measure your own ratio and adjust to your project’s needs</li>
</ol>

<p>The paper’s companion repository at <a href="https://github.com/arisvas4/codified-context-infrastructure">github.com/arisvas4/codified-context-infrastructure</a> provides a complete reference implementation <sup id="fnref:9:2" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>.</p>

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Vasilopoulos, A. (2026). “Codified Context: Infrastructure for AI Agents in a Complex Codebase.” arXiv:2602.20478v1. <a href="https://arxiv.org/abs/2602.20478">https://arxiv.org/abs/2602.20478</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:1:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a> <a href="#fnref:1:8" class="reversefootnote" role="doc-backlink">&#8617;<sup>9</sup></a> <a href="#fnref:1:9" class="reversefootnote" role="doc-backlink">&#8617;<sup>10</sup></a> <a href="#fnref:1:10" class="reversefootnote" role="doc-backlink">&#8617;<sup>11</sup></a> <a href="#fnref:1:11" class="reversefootnote" role="doc-backlink">&#8617;<sup>12</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>OpenAI. “Codex CLI — AGENTS.md Guide.” <a href="https://developers.openai.com/codex/guides/agents-md">https://developers.openai.com/codex/guides/agents-md</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:2:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Linux Foundation Agentic AI Foundation. AGENTS.md open standard. Referenced in: “AGENTS.md: The Open Standard for Cross-Tool AI Agent Portability.” <a href="https://developers.openai.com/codex/guides/agents-md">https://developers.openai.com/codex/guides/agents-md</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Santos, R. et al. Referenced in Substack analysis: “Scaling your coding agent’s context beyond a single AGENTS.md-file.” <a href="https://ursula8sciform.substack.com/p/scaling-your-coding-agents-context">https://ursula8sciform.substack.com/p/scaling-your-coding-agents-context</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>OpenAI. “Codex CLI — Subagents.” <a href="https://developers.openai.com/codex/subagents">https://developers.openai.com/codex/subagents</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Simon Willison. “Codex Subagents.” 16 March 2026. <a href="https://simonwillison.net/2026/Mar/16/codex-subagents/">https://simonwillison.net/2026/Mar/16/codex-subagents/</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>GitHub issue #14161 — <code class="language-plaintext highlighter-rouge">[[skills.config]]</code> in agent TOML ignored for sub-agents. <a href="https://github.com/openai/codex/issues/14161">https://github.com/openai/codex/issues/14161</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>OpenAI. “Codex CLI — MCP Integration.” <a href="https://developers.openai.com/codex/mcp">https://developers.openai.com/codex/mcp</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>Vasilopoulos, A. Codified Context Infrastructure — companion repository. <a href="https://github.com/arisvas4/codified-context-infrastructure">https://github.com/arisvas4/codified-context-infrastructure</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:9:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>OpenAI. “Codex CLI — Models.” <a href="https://developers.openai.com/codex/models">https://developers.openai.com/codex/models</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:10:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>GitHub Blog. “GPT-5.1-Codex, GPT-5.1-Codex-Max, and GPT-5.1-Codex-Mini deprecated.” 3 April 2026. <a href="https://github.blog/changelog/2026-04-03-gpt-5-1-codex-gpt-5-1-codex-max-and-gpt-5-1-codex-mini-deprecated/">https://github.blog/changelog/2026-04-03-gpt-5-1-codex-gpt-5-1-codex-max-and-gpt-5-1-codex-mini-deprecated/</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Codified Context: The Three-Tier Knowledge Architecture for AI Coding Agents]]></summary></entry><entry><title type="html">Automating the Cross-Model Review Loop: Three Levels from SKILL.md to Multi-AI Pipeline</title><link href="https://codex.danielvaughan.com/2026/04/07/cross-model-review-loop-automation/" rel="alternate" type="text/html" title="Automating the Cross-Model Review Loop: Three Levels from SKILL.md to Multi-AI Pipeline" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/cross-model-review-loop-automation</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/cross-model-review-loop-automation/"><![CDATA[<h1 id="automating-the-cross-model-review-loop-three-levels-from-skillmd-to-multi-ai-pipeline">Automating the Cross-Model Review Loop: Three Levels from SKILL.md to Multi-AI Pipeline</h1>

<hr />

<p>The cross-model review pattern — where one AI writes code and a structurally different AI reviews it — has become a core quality practice in agentic development. Claude Code and Codex CLI have different training distributions and different blind spots, making their disagreements genuinely informative<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. As of late March 2026, the ecosystem offers three distinct automation tiers, each trading setup complexity for hands-off operation. This article walks through all three, with concrete configuration and the security caveats you need to understand before deploying them.</p>

<h2 id="why-cross-model-review-works">Why Cross-Model Review Works</h2>

<p>Single-model review suffers from sycophancy bias: the same system that wrote the code tends to approve it<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. Cross-provider review sidesteps this because Claude and GPT-5.x have fundamentally different failure modes. When both models flag the same issue, confidence is high. When only one flags it, that disagreement is the signal worth investigating — the “two doctors, same patient” heuristic<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
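
<p>The heuristic can be stated as a tiny triage function: findings both reviewers agree on are high-confidence, and findings only one flags are the ones worth a human look. A minimal sketch, assuming findings have already been normalised to comparable strings (real deduplication is fuzzier):</p>

```python
def triage(claude_findings, codex_findings):
    """Partition findings from two independent reviewers.
    Agreement implies high confidence; disagreement is the signal
    worth human investigation."""
    return {
        "high_confidence": claude_findings & codex_findings,
        "investigate": claude_findings ^ codex_findings,  # symmetric difference
    }
```
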

<p>The standard execution path uses <code class="language-plaintext highlighter-rouge">codex exec</code> in non-interactive mode with a read-only sandbox, ensuring the reviewer cannot modify the codebase it is assessing<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex <span class="nb">exec</span> <span class="nt">-m</span> gpt-5.3-codex <span class="nt">-s</span> read-only <span class="s2">"Review the following diff for bugs, security issues, and style violations: </span><span class="si">$(</span>git diff HEAD~1<span class="si">)</span><span class="s2">"</span>
</code></pre></div></div>

<h2 id="level-1-skillmd--manual-trigger-minimal-setup">Level 1: SKILL.md — Manual Trigger, Minimal Setup</h2>

<p>A SKILL.md file is a single Markdown document placed under <code class="language-plaintext highlighter-rouge">.claude/skills/</code> that any LLM agent can parse<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. This is the lowest-friction entry point: no plugins, no hooks, no external dependencies beyond a working <code class="language-plaintext highlighter-rouge">codex</code> binary.</p>

<h3 id="directory-structure">Directory Structure</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.claude/
  skills/
    codex-review/
      SKILL.md
</code></pre></div></div>

<h3 id="the-review-loop">The Review Loop</h3>

<p>The SKILL.md defines a <code class="language-plaintext highlighter-rouge">/codex-review</code> slash command that executes a sequential fix loop:</p>

<pre><code class="language-mermaid">flowchart TD
    A["/codex-review invoked"] --&gt; B["Export current plan/diff"]
    B --&gt; C["codex exec read-only review"]
    C --&gt; D{"Verdict?"}
    D --&gt;|PASS| E["Review complete"]
    D --&gt;|CONCERNS| F["Claude addresses findings"]
    F --&gt; G{"Round &lt; 5?"}
    G --&gt;|Yes| C
    G --&gt;|No| H["Escalate to human"]
</code></pre>

<p>Each round uses a UUID-bound session ID for concurrency safety, and the review runs under <code class="language-plaintext highlighter-rouge">--sandbox read-only</code> to enforce immutability<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The key <code class="language-plaintext highlighter-rouge">codex exec</code> invocations:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Initial review</span>
codex <span class="nb">exec</span> <span class="nt">-m</span> gpt-5.3-codex <span class="nt">-s</span> read-only <span class="se">\</span>
  <span class="s2">"Review this plan against the codebase. Respond PASS or CONCERNS with details."</span>

<span class="c"># Re-review after fixes (resume session for context continuity)</span>
codex <span class="nb">exec </span>resume &lt;session-id&gt; <span class="se">\</span>
  <span class="s2">"Re-review the updated plan. Previous concerns were: ..."</span>
</code></pre></div></div>

<h3 id="level-15-fresh-session-audit">Level 1.5: Fresh-Session Audit</h3>

<p>A refinement worth adopting early: after the fix loop converges, spawn a fresh Codex session for a final audit<sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. This eliminates context bias from the iterative conversation and catches systemic issues the loop might have normalised. The audit uses a distinct verdict format — <code class="language-plaintext highlighter-rouge">AUDIT: PASS</code> or <code class="language-plaintext highlighter-rouge">AUDIT: CONCERNS</code> — to differentiate it from loop rounds.</p>
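<p>In shell terms, the distinct marker makes the audit outcome easy to branch on. The helper below is an illustrative sketch, not part of any plugin; only the <code class="language-plaintext highlighter-rouge">AUDIT:</code> verdict format comes from the convention above:</p>

```shell
# Extract the final AUDIT verdict from a reviewer transcript so the
# orchestrating script can branch on it. How the transcript is produced
# is assumed; only the "AUDIT: PASS" / "AUDIT: CONCERNS" format is fixed.
audit_verdict() {
  printf '%s\n' "$1" | grep -o 'AUDIT: [A-Z]*' | tail -n 1
}

transcript="Reviewed 14 files. No systemic issues found. AUDIT: PASS"
if [ "$(audit_verdict "$transcript")" = "AUDIT: PASS" ]; then
  echo "audit clean"
else
  echo "escalate to human"
fi
```

<p>Matching on the <code class="language-plaintext highlighter-rouge">AUDIT:</code> prefix ensures loop-round verdicts (<code class="language-plaintext highlighter-rouge">PASS</code>/<code class="language-plaintext highlighter-rouge">CONCERNS</code> without the prefix) are never mistaken for the final audit result.</p>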

<p><strong>When to use Level 1:</strong> Solo developers or small teams wanting to validate the cross-model approach before investing in automation infrastructure. Setup time is under five minutes.</p>

<h2 id="level-2-stop-hook-plugins--automatic-trigger">Level 2: Stop Hook Plugins — Automatic Trigger</h2>

<p>Level 2 eliminates the manual <code class="language-plaintext highlighter-rouge">/codex-review</code> invocation by hooking into Codex CLI’s lifecycle system. When Claude Code attempts to complete a turn, a Stop hook intercepts the exit and triggers a Codex review automatically<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<h3 id="how-codex-hooks-work">How Codex Hooks Work</h3>

<p>Hooks are defined in <code class="language-plaintext highlighter-rouge">hooks.json</code> at user level (<code class="language-plaintext highlighter-rouge">~/.codex/hooks.json</code>) or repository level (<code class="language-plaintext highlighter-rouge">&lt;repo&gt;/.codex/hooks.json</code>)<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. The Stop hook fires at conversation turn completion:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"hooks"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"Stop"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
      </span><span class="p">{</span><span class="w">
        </span><span class="nl">"hooks"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
          </span><span class="p">{</span><span class="w">
            </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"command"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"command"</span><span class="p">:</span><span class="w"> </span><span class="s2">".claude-plugin/hooks/stop-hook.sh"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"statusMessage"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Running cross-model review..."</span><span class="p">,</span><span class="w">
            </span><span class="nl">"timeout"</span><span class="p">:</span><span class="w"> </span><span class="mi">900</span><span class="w">
          </span><span class="p">}</span><span class="w">
        </span><span class="p">]</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">]</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The hook communicates its decision via exit codes<sup id="fnref:5:1" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>:</p>

<ul>
  <li><strong>Exit 0</strong> with JSON <code class="language-plaintext highlighter-rouge">{"decision": "block", "reason": "..."}</code> — blocks the stop, feeds the reason back as a continuation prompt</li>
  <li><strong>Exit 0</strong> without blocking JSON — permits the stop</li>
  <li><strong>Exit 2</strong> — blocks; reads reason from stderr</li>
</ul>
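<p>A sketch of how a hook script might implement that contract. This is illustrative shell, not code from either plugin; the JSON shape is taken directly from the list above:</p>

```shell
# Map a review verdict onto the Stop-hook protocol: emit the blocking
# JSON payload for CONCERNS, or emit nothing (permitting the stop) for
# PASS. Exit status 0 is used in both cases, per the first two bullets.
decide_stop() {
  verdict="$1"
  reason="$2"
  if [ "$verdict" = "CONCERNS" ]; then
    printf '{"decision": "block", "reason": "%s"}' "$reason"
  fi
  return 0
}

decide_stop CONCERNS "unvalidated input in request handler"
# prints: {"decision": "block", "reason": "unvalidated input in request handler"}
```

<p>A production hook would also need to JSON-escape the reason string; a bare <code class="language-plaintext highlighter-rouge">printf</code> does not.</p>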

<h3 id="option-a-codex-plugin-cc-official">Option A: codex-plugin-cc (Official)</h3>

<p>OpenAI released <code class="language-plaintext highlighter-rouge">codex-plugin-cc</code> on 30 March 2026<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>, providing a single-command review gate:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install</span>
/plugin marketplace add openai/codex-plugin-cc
/plugin <span class="nb">install </span>codex@openai-codex
/codex:setup

<span class="c"># Enable automatic review gate</span>
/codex:setup <span class="nt">--enable-review-gate</span>
</code></pre></div></div>

<p>When enabled, every Claude Code turn completion triggers a targeted Codex review. If issues are found, the stop is blocked and Claude addresses the findings before the turn can end<sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. The plugin also exposes manual commands:</p>

<table>
  <thead>
    <tr>
      <th>Command</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/codex:review --base main</code></td>
      <td>Diff review against a branch</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/codex:adversarial-review</code></td>
      <td>Devil’s advocate design challenge</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/codex:rescue --background</code></td>
      <td>Delegate a task to Codex asynchronously</td>
    </tr>
  </tbody>
</table>

<p>⚠️ <strong>Cost warning:</strong> The review gate can create long-running loops that rapidly consume usage limits. OpenAI’s own documentation recommends enabling it only under human supervision<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<h3 id="option-b-claude-review-loop-community">Option B: claude-review-loop (Community)</h3>

<p>The <code class="language-plaintext highlighter-rouge">claude-review-loop</code> plugin by Hamel Husain takes a more opinionated approach, spawning up to four parallel Codex sub-agents based on project type<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Sub-Agent</th>
      <th>Trigger</th>
      <th>Focus</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Diff Review</td>
      <td>Always</td>
      <td>Code quality, tests, OWASP Top 10</td>
    </tr>
    <tr>
      <td>Holistic Review</td>
      <td>Always</td>
      <td>Architecture, documentation</td>
    </tr>
    <tr>
      <td>Next.js Review</td>
      <td><code class="language-plaintext highlighter-rouge">next.config.*</code> present</td>
      <td>App Router, Server Components, caching</td>
    </tr>
    <tr>
      <td>UX Review</td>
      <td>Frontend code detected</td>
      <td>Browser E2E via agent-browser, accessibility</td>
    </tr>
  </tbody>
</table>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install</span>
/plugin marketplace add hamelsmu/claude-review-loop
/plugin <span class="nb">install </span>review-loop@hamel-review
</code></pre></div></div>

<p>Codex deduplicates findings across agents and writes consolidated output to <code class="language-plaintext highlighter-rouge">reviews/review-&lt;id&gt;.md</code><sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. State is tracked in <code class="language-plaintext highlighter-rouge">.claude/review-loop.local.md</code> (gitignored).</p>

<h3 id="security-the-bypass-sandbox-default">Security: The bypass-sandbox Default</h3>

<p>Both community plugins default to <code class="language-plaintext highlighter-rouge">--dangerously-bypass-approvals-and-sandbox</code> for Codex execution<sup id="fnref:7:2" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. This spares the review agents approval prompts and file-system restrictions, but it also means Codex runs without any sandbox constraints. Override this with:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">REVIEW_LOOP_CODEX_FLAGS</span><span class="o">=</span><span class="s2">"--sandbox read-only"</span>
</code></pre></div></div>

<p>For <code class="language-plaintext highlighter-rouge">codex-plugin-cc</code>, the official plugin uses the Codex app server, which applies its own sandbox policy, making this less of a concern<sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>

<h3 id="preventing-infinite-loops">Preventing Infinite Loops</h3>

<p>A critical implementation detail: your stop hook must check a <code class="language-plaintext highlighter-rouge">stop_hook_active</code> flag before spawning another review<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Without this guard, the review’s own completion triggers another stop hook, creating an infinite loop:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="nv">STATE_FILE</span><span class="o">=</span><span class="s2">".claude/review-loop.local.md"</span>
<span class="k">if </span><span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"stop_hook_active: true"</span> <span class="s2">"</span><span class="nv">$STATE_FILE</span><span class="s2">"</span> 2&gt;/dev/null<span class="p">;</span> <span class="k">then
  </span><span class="nb">exit </span>0  <span class="c"># Permit stop — we're already in a review cycle</span>
<span class="k">fi</span>
</code></pre></div></div>
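<p>The counterpart, again a sketch rather than plugin code, is to set the flag before spawning the reviewer and clear it once the cycle finishes, so the guard above has something to read:</p>

```shell
# Write or clear the re-entrancy flag that the stop-hook guard checks.
# File path and key name follow the claude-review-loop state file; the
# truncating ">" redirect is sketch-only (real state holds more fields).
STATE_FILE=".claude/review-loop.local.md"

set_review_flag() {
  mkdir -p "$(dirname "$STATE_FILE")"
  printf 'stop_hook_active: %s\n' "$1" > "$STATE_FILE"
}

set_review_flag true       # entering the review cycle
# ... spawn the Codex review here ...
set_review_flag false      # cycle done; the next stop may trigger review
```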

<h2 id="level-3-multi-ai-pipeline-governance">Level 3: Multi-AI Pipeline Governance</h2>

<p>Level 3 moves beyond a single reviewer to orchestrated multi-model pipelines where different AI systems handle distinct quality dimensions.</p>

<h3 id="claude-codex-sequential-review-chain">claude-codex: Sequential Review Chain</h3>

<p>The <code class="language-plaintext highlighter-rouge">claude-codex</code> plugin (Z-M-Huang) implements a three-reviewer pipeline<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>:</p>

<pre><code class="language-mermaid">flowchart LR
    A["Implementation\n(Claude Sonnet)"] --&gt; B["Review 1\n(Claude Sonnet)"]
    B --&gt; C["Review 2\n(Claude Opus)"]
    C --&gt; D["Final Gate\n(Codex CLI)"]
    D --&gt;|Pass| E["Approved"]
    D --&gt;|Fail| F["Fix + Re-review"]
</code></pre>

<p>Each reviewer independently validates against OWASP Top 10 vulnerabilities<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. The pipeline enforces sequential dependencies via <code class="language-plaintext highlighter-rouge">blockedBy</code> constraints — Review 2 cannot start until Review 1 approves. If any reviewer requests changes, a fix task and re-review are automatically created.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Feature development with full pipeline</span>
/claude-codex:multi-ai Add rate limiting to the authentication endpoint

<span class="c"># Bug fix with dual root-cause analysis</span>
/claude-codex:bug-fix Session tokens not invalidated on password change
</code></pre></div></div>

<p>Configuration controls iteration limits<sup id="fnref:8:2" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>:</p>

<ul>
  <li>Plan review loop: 10 iterations maximum</li>
  <li>Code review loop: 15 iterations maximum</li>
  <li>Auto-resolve attempts: 3 retries before pausing for human input</li>
</ul>

<p>⚠️ Note: This repository was archived on 22 February 2026; development continues at <code class="language-plaintext highlighter-rouge">Z-M-Huang/vcp/plugins/dev-buddy</code><sup id="fnref:8:3" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p>

<h3 id="github-agent-hq-platform-level-integration">GitHub Agent HQ: Platform-Level Integration</h3>

<p>GitHub’s Agent HQ, in public preview since February 2026, achieves platform-level cross-model integration<sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. From a single issue, you can launch Copilot, Claude Code, and Codex agents simultaneously, comparing their outputs. This requires Copilot Pro+ or Enterprise licensing.</p>

<h3 id="mapping-to-agentic-pod-roles">Mapping to Agentic Pod Roles</h3>

<p>The three levels map naturally to agentic pod structures<sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Level</th>
      <th>Pod Role Equivalent</th>
      <th>Team Size</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Level 1 (SKILL.md)</td>
      <td>Solo developer self-review</td>
      <td>1–2</td>
    </tr>
    <tr>
      <td>Level 2 (Stop Hook)</td>
      <td>Quality Engineer in the loop</td>
      <td>3–8</td>
    </tr>
    <tr>
      <td>Level 3 (Pipeline)</td>
      <td>Full pod with dedicated QA</td>
      <td>8+</td>
    </tr>
  </tbody>
</table>

<h2 id="choosing-your-level">Choosing Your Level</h2>

<pre><code class="language-mermaid">flowchart TD
    A["Starting cross-model review?"] --&gt; B{"Team size?"}
    B --&gt;|"Solo / pair"| C["Level 1: SKILL.md\n5 min setup"]
    B --&gt;|"Small team"| D{"Want automatic triggers?"}
    D --&gt;|Yes| E["Level 2: Stop Hook\ncodex-plugin-cc or\nclaude-review-loop"]
    D --&gt;|No| C
    B --&gt;|"Large team / enterprise"| F["Level 3: Pipeline\nclaude-codex or\nGitHub Agent HQ"]
    E --&gt; G{"Need multi-reviewer?"}
    G --&gt;|Yes| F
    G --&gt;|No| E
</code></pre>

<p>Start with Level 1 to validate that cross-model review catches real issues in your codebase. Promote to Level 2 when you find yourself routinely forgetting to invoke the review. Graduate to Level 3 when your team needs formalised quality gates with audit trails.</p>

<h2 id="practical-recommendations">Practical Recommendations</h2>

<ol>
  <li><strong>Always enforce read-only sandbox</strong> for review agents. A reviewer that can modify code is a reviewer that can mask its own findings.</li>
  <li><strong>Set explicit timeouts.</strong> The default 900-second timeout for stop hooks is generous; most reviews complete in under 60 seconds. Reduce to 120 seconds to fail fast on stuck sessions.</li>
  <li><strong>Monitor token consumption.</strong> Level 2 and 3 multiply your API usage significantly. Use <code class="language-plaintext highlighter-rouge">--model gpt-5.4-mini</code> for routine reviews and reserve full models for adversarial passes<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</li>
  <li><strong>Git-ignore review state files.</strong> Both <code class="language-plaintext highlighter-rouge">.claude/review-loop.local.md</code> and <code class="language-plaintext highlighter-rouge">.task/</code> directories contain transient state that should not enter version control.</li>
  <li><strong>Pin your reviewer model.</strong> Use explicit model identifiers in configuration rather than aliases to avoid unexpected behaviour when model defaults change.</li>
</ol>
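<p>Recommendation 2 maps directly onto the <code class="language-plaintext highlighter-rouge">hooks.json</code> schema shown in Level 2; only the <code class="language-plaintext highlighter-rouge">timeout</code> value changes:</p>

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".claude-plugin/hooks/stop-hook.sh",
            "statusMessage": "Running cross-model review...",
            "timeout": 120
          }
        ]
      }
    ]
  }
}
```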

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>SmartScope, “Automating the Claude Code × Codex Review Loop — Three Levels,” March 2026. <a href="https://smartscope.blog/en/blog/claude-code-codex-review-loop-automation-2026/">https://smartscope.blog/en/blog/claude-code-codex-review-loop-automation-2026/</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:1:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a> <a href="#fnref:1:5" class="reversefootnote" role="doc-backlink">&#8617;<sup>6</sup></a> <a href="#fnref:1:6" class="reversefootnote" role="doc-backlink">&#8617;<sup>7</sup></a> <a href="#fnref:1:7" class="reversefootnote" role="doc-backlink">&#8617;<sup>8</sup></a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>MindStudio, “What Is the OpenAI Codex Plugin for Claude Code? How Cross-Provider AI Review Works,” 2026. <a href="https://www.mindstudio.ai/blog/openai-codex-plugin-claude-code-cross-provider-review">https://www.mindstudio.ai/blog/openai-codex-plugin-claude-code-cross-provider-review</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>OpenAI, “Agent approvals &amp; security – Codex,” 2026. <a href="https://developers.openai.com/codex/agent-approvals-security">https://developers.openai.com/codex/agent-approvals-security</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>OpenAI, “Introducing Codex Plugin for Claude Code,” OpenAI Developer Community, March 2026. <a href="https://community.openai.com/t/introducing-codex-plugin-for-claude-code/1378186">https://community.openai.com/t/introducing-codex-plugin-for-claude-code/1378186</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>OpenAI, “Hooks – Codex,” OpenAI Developers, 2026. <a href="https://developers.openai.com/codex/hooks">https://developers.openai.com/codex/hooks</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:5:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>OpenAI, “codex-plugin-cc,” GitHub, March 2026. <a href="https://github.com/openai/codex-plugin-cc">https://github.com/openai/codex-plugin-cc</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Hamel Husain, “claude-review-loop,” GitHub, 2026. <a href="https://github.com/hamelsmu/claude-review-loop">https://github.com/hamelsmu/claude-review-loop</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:7:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>Z-M-Huang, “claude-codex: Multi-AI orchestration plugin,” GitHub, 2026. <a href="https://github.com/Z-M-Huang/claude-codex">https://github.com/Z-M-Huang/claude-codex</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:8:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:8:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Automating the Cross-Model Review Loop: Three Levels from SKILL.md to Multi-AI Pipeline]]></summary></entry><entry><title type="html">Learning Plan for Becoming a Codex CLI Expert</title><link href="https://codex.danielvaughan.com/2026/04/07/learning-plan-becoming-codex-cli-expert/" rel="alternate" type="text/html" title="Learning Plan for Becoming a Codex CLI Expert" /><published>2026-04-07T00:00:00+01:00</published><updated>2026-04-07T00:00:00+01:00</updated><id>https://codex.danielvaughan.com/2026/04/07/learning-plan-becoming-codex-cli-expert</id><content type="html" xml:base="https://codex.danielvaughan.com/2026/04/07/learning-plan-becoming-codex-cli-expert/"><![CDATA[<h1 id="learning-plan-for-becoming-a-codex-cli-expert">Learning Plan for Becoming a Codex CLI Expert</h1>

<hr />

<p>Codex CLI has grown from a prototype terminal assistant into a full agentic coding platform — sub-agents, skills, MCP integrations, worktrees, cloud tasks, and an enterprise governance model<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. The surface area is large enough that a structured learning plan pays for itself quickly. This guide maps a four-phase path from first install to production-grade orchestration, with concrete exercises and milestones at each level.</p>

<h2 id="phase-1--foundations-week-12">Phase 1 — Foundations (Week 1–2)</h2>

<p>The goal is a working installation, confident navigation of the TUI, and an intuitive feel for the approval model.</p>

<h3 id="11-installation-and-authentication">1.1 Installation and Authentication</h3>

<p>Install via npm, or use the native Windows installer, which reached full feature parity in March 2026<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>npm <span class="nb">install</span> <span class="nt">-g</span> @openai/codex
codex login          <span class="c"># OAuth or API key</span>
codex <span class="nt">--version</span>      <span class="c"># confirm 0.118.x or later</span>
</code></pre></div></div>

<p>Verify your default model. As of April 2026 the recommended default is <code class="language-plaintext highlighter-rouge">gpt-5.4</code>, which combines the coding strength of <code class="language-plaintext highlighter-rouge">gpt-5.3-codex</code> with stronger reasoning and native computer use<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<h3 id="12-the-approval-model">1.2 The Approval Model</h3>

<p>Codex CLI’s security posture rests on three approval modes<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th>Mode</th>
      <th>File edits</th>
      <th>Shell commands</th>
      <th>Network</th>
      <th>Best for</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">suggest</code> (default)</td>
      <td>Approval required</td>
      <td>Approval required</td>
      <td>Blocked</td>
      <td>Learning, auditing</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">auto-edit</code></td>
      <td>Auto-applied</td>
      <td>Approval required</td>
      <td>Blocked</td>
      <td>Day-to-day development</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">full-auto</code></td>
      <td>Auto-applied</td>
      <td>Auto-executed</td>
      <td>Available</td>
      <td>CI/CD, automation</td>
    </tr>
  </tbody>
</table>

<p>Switch at launch or mid-session:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex <span class="nt">--approval-mode</span> auto-edit
<span class="c"># or inside the TUI:</span>
/permissions
</code></pre></div></div>

<p>The sandbox layer underneath (<code class="language-plaintext highlighter-rouge">read-only</code>, <code class="language-plaintext highlighter-rouge">workspace-write</code>, <code class="language-plaintext highlighter-rouge">danger-full-access</code>) is orthogonal to approval mode<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>. Understanding both dimensions is the first genuine milestone.</p>

<h3 id="13-first-exercises">1.3 First Exercises</h3>

<ol>
  <li><strong>Explain a file</strong> — open a repository you know well, run <code class="language-plaintext highlighter-rouge">codex</code> in <code class="language-plaintext highlighter-rouge">suggest</code> mode, and ask it to explain a complex module. Observe how it reads files.</li>
  <li><strong>Fix a bug</strong> — switch to <code class="language-plaintext highlighter-rouge">auto-edit</code>, paste a stack trace, and let Codex propose a patch. Review the diff before accepting.</li>
  <li><strong>Run tests</strong> — use <code class="language-plaintext highlighter-rouge">/permissions</code> to switch to <code class="language-plaintext highlighter-rouge">full-auto</code> inside the session and ask Codex to run the test suite and fix any failures.</li>
</ol>

<pre><code class="language-mermaid">flowchart LR
    A[suggest] --&gt;|"/permissions"| B[auto-edit]
    B --&gt;|"/permissions"| C[full-auto]
    C --&gt;|"/permissions"| A
    style A fill:#e8f5e9
    style B fill:#fff3e0
    style C fill:#ffebee
</code></pre>

<p><strong>Milestone:</strong> You can install Codex, authenticate, switch between approval modes, and explain the sandbox/approval matrix to a colleague.</p>

<hr />

<h2 id="phase-2--configuration-and-context-week-34">Phase 2 — Configuration and Context (Week 3–4)</h2>

<p>The goal is to make Codex consistently useful by giving it durable project knowledge and personalised defaults.</p>

<h3 id="21-configtoml">2.1 config.toml</h3>

<p>Codex reads <code class="language-plaintext highlighter-rouge">~/.codex/config.toml</code> for persistent settings<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. A sensible starter:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">model</span> <span class="p">=</span> <span class="s">"gpt-5.4"</span>
<span class="py">approval_mode</span> <span class="p">=</span> <span class="s">"auto-edit"</span>

<span class="nn">[history]</span>
<span class="py">persistence</span> <span class="p">=</span> <span class="s">"across-sessions"</span>

<span class="py">project_doc_max_bytes</span> <span class="p">=</span> <span class="mi">65536</span>
<span class="py">project_doc_fallback_filenames</span> <span class="p">=</span> <span class="p">[</span><span class="s">"TEAM_GUIDE.md"</span><span class="p">,</span> <span class="s">".agents.md"</span><span class="p">]</span>
</code></pre></div></div>

<p>Profiles let you maintain separate configurations per client or project:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex <span class="nt">--profile</span> enterprise-client
</code></pre></div></div>
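<p>A profile is a named section in <code class="language-plaintext highlighter-rouge">config.toml</code> whose values override the top-level defaults when selected. The section layout below is a reasonable sketch rather than documented schema; check the configuration reference before relying on it:</p>

```toml
# Hypothetical profile for client work: stricter review posture
[profiles.enterprise-client]
model = "gpt-5.4"
approval_mode = "suggest"
```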

<h3 id="22-agentsmd--your-constitution">2.2 AGENTS.md — Your Constitution</h3>

<p>AGENTS.md is Codex’s instruction discovery system<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. It follows a three-tier hierarchy:</p>

<ol>
  <li><strong>Global</strong> — <code class="language-plaintext highlighter-rouge">~/.codex/AGENTS.md</code> (or <code class="language-plaintext highlighter-rouge">AGENTS.override.md</code> for highest priority)</li>
  <li><strong>Repository root</strong> — checked into version control with the team</li>
  <li><strong>Subdirectory</strong> — progressively more specific guidance, concatenated from root downward</li>
</ol>

<p>Files are merged until <code class="language-plaintext highlighter-rouge">project_doc_max_bytes</code> (32 KiB by default) is reached<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>. A minimal project-level example:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># AGENTS.md</span>

<span class="gu">## Language &amp; Style</span>
<span class="p">-</span> TypeScript with strict mode; no <span class="sb">`any`</span> types
<span class="p">-</span> Prefer <span class="sb">`pnpm`</span> over <span class="sb">`npm`</span>
<span class="p">-</span> British English in comments and documentation

<span class="gu">## Testing</span>
<span class="p">-</span> Every public function needs a unit test
<span class="p">-</span> Use Vitest, not Jest

<span class="gu">## Restrictions</span>
<span class="p">-</span> Never modify <span class="sb">`package-lock.json`</span> directly
<span class="p">-</span> Do not install new dependencies without asking
</code></pre></div></div>

<p>Verify what loaded:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex <span class="nt">--ask-for-approval</span> never <span class="s2">"Summarise the current instructions."</span>
</code></pre></div></div>

<h3 id="23-exercise-build-your-agentsmd-stack">2.3 Exercise: Build Your AGENTS.md Stack</h3>

<ol>
  <li>Create a global <code class="language-plaintext highlighter-rouge">~/.codex/AGENTS.md</code> with your personal coding preferences.</li>
  <li>Add a repository-level <code class="language-plaintext highlighter-rouge">AGENTS.md</code> with project conventions.</li>
  <li>Add a subdirectory <code class="language-plaintext highlighter-rouge">AGENTS.override.md</code> in a module that has stricter rules (e.g. no external network calls in a security module).</li>
  <li>Run the verification command and confirm all three layers appear.</li>
</ol>

<p><strong>Milestone:</strong> You have a <code class="language-plaintext highlighter-rouge">config.toml</code> with sensible defaults, a layered AGENTS.md stack, and can explain the merge order.</p>

<hr />

<h2 id="phase-3--intermediate-patterns-week-58">Phase 3 — Intermediate Patterns (Week 5–8)</h2>

<h3 id="31-mcp-integration">3.1 MCP Integration</h3>

<p>Model Context Protocol connects Codex to external tools and data sources<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. Two transport types are supported:</p>

<p><strong>STDIO</strong> — local processes, configured via CLI or config.toml:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex mcp add context7 <span class="nt">--</span> npx <span class="nt">-y</span> @upstash/context7-mcp
</code></pre></div></div>

<p><strong>Streaming HTTP</strong> — remote servers with bearer token authentication:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[mcp_servers.docs-server]</span>
<span class="py">url</span> <span class="p">=</span> <span class="s">"https://docs.internal.co/mcp"</span>
<span class="py">bearer_token_env_var</span> <span class="p">=</span> <span class="s">"DOCS_MCP_TOKEN"</span>
<span class="py">tool_timeout_sec</span> <span class="p">=</span> <span class="mi">30</span>
</code></pre></div></div>

<p>Use <code class="language-plaintext highlighter-rouge">/mcp</code> in the TUI to inspect active servers. Use <code class="language-plaintext highlighter-rouge">enabled_tools</code> and <code class="language-plaintext highlighter-rouge">disabled_tools</code> to control which tools from a server are exposed<sup id="fnref:8:1" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p>
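<p>Tool filtering can be sketched in <code class="language-plaintext highlighter-rouge">config.toml</code> alongside the server entry. The tool names below are hypothetical examples — check <code class="language-plaintext highlighter-rouge">/mcp</code> for the names your server actually exposes.</p>

```toml
# Sketch only: expose a subset of a server's tools.
# "search_docs" and "fetch_page" are hypothetical tool names.
[mcp_servers.docs-server]
url = "https://docs.internal.co/mcp"
bearer_token_env_var = "DOCS_MCP_TOKEN"
enabled_tools = ["search_docs", "fetch_page"]
```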

<p>For OAuth-enabled servers:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codex mcp login docs-server
</code></pre></div></div>

<h3 id="32-skills">3.2 Skills</h3>

<p>A skill packages instructions, resources, and optional scripts so Codex can follow a workflow reliably<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>. The minimum structure:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.agents/skills/lint-fix/
├── SKILL.md
└── agents/
    └── openai.yaml   # optional: UI metadata, tool deps
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">SKILL.md</code> front matter:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">lint-fix</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Fix all ESLint errors in staged files</span>
<span class="nn">---</span>

<span class="p">1.</span> Run <span class="sb">`npx eslint --fix $(git diff --cached --name-only)`</span>
<span class="p">2.</span> Stage the fixed files
<span class="p">3.</span> Report remaining unfixable errors
</code></pre></div></div>
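<p>The minimal layout can also be created by hand — a sketch reproducing the example above (<code class="language-plaintext highlighter-rouge">$skill-creator</code> is the supported scaffolding route; this just shows the file shape):</p>

```shell
# Create the minimal skill layout: a directory under .agents/skills/
# containing a SKILL.md with name/description front matter
mkdir -p .agents/skills/lint-fix/agents
cat > .agents/skills/lint-fix/SKILL.md <<'EOF'
---
name: lint-fix
description: Fix all ESLint errors in staged files
---

1. Run `npx eslint --fix $(git diff --cached --name-only)`
2. Stage the fixed files
3. Report remaining unfixable errors
EOF
```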

<p>Skills are discovered from four scopes: repository (<code class="language-plaintext highlighter-rouge">.agents/skills/</code>), user (<code class="language-plaintext highlighter-rouge">$HOME/.agents/skills</code>), admin (<code class="language-plaintext highlighter-rouge">/etc/codex/skills</code>), and built-in<sup id="fnref:9:1" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>. Use <code class="language-plaintext highlighter-rouge">$skill-creator</code> to scaffold new skills interactively.</p>

<p>Invoke explicitly with <code class="language-plaintext highlighter-rouge">/skills</code> or <code class="language-plaintext highlighter-rouge">$skill-name</code>, or let Codex match implicitly based on task description.</p>

<h3 id="33-model-selection-strategy">3.3 Model Selection Strategy</h3>

<p>Not every task needs <code class="language-plaintext highlighter-rouge">gpt-5.4</code>. A practical model allocation<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>:</p>

<pre><code class="language-mermaid">flowchart TD
    T[Task arrives] --&gt; Q{Complexity?}
    Q --&gt;|High: architecture, refactoring| A["gpt-5.4"]
    Q --&gt;|Medium: feature implementation| B["gpt-5.3-codex"]
    Q --&gt;|Low: search, formatting, docs| C["gpt-5.4-mini"]
    A --&gt; R[Review output]
    B --&gt; R
    C --&gt; R
</code></pre>

<p>Switch mid-session with <code class="language-plaintext highlighter-rouge">/model</code> — no restart needed<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>

<h3 id="34-exercises">3.4 Exercises</h3>

<ol>
  <li><strong>MCP</strong> — connect a documentation MCP server and ask Codex to answer questions using it.</li>
  <li><strong>Skills</strong> — create a skill that runs your team’s code review checklist and packages results into a PR comment.</li>
  <li><strong>Model switching</strong> — use <code class="language-plaintext highlighter-rouge">gpt-5.4-mini</code> for a codebase search task, then switch to <code class="language-plaintext highlighter-rouge">gpt-5.4</code> for a refactoring task, and compare cost and quality.</li>
</ol>

<p><strong>Milestone:</strong> You have at least one MCP server connected, one custom skill, and a model selection heuristic you can articulate.</p>

<hr />

<h2 id="phase-4--advanced-orchestration-week-912">Phase 4 — Advanced Orchestration (Week 9–12)</h2>

<h3 id="41-sub-agents-and-worktrees">4.1 Sub-Agents and Worktrees</h3>

<p>Sub-agents let you parallelise larger tasks<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. Since version 0.117.0, sub-agents use readable path-based addresses like <code class="language-plaintext highlighter-rouge">/root/agent_a</code> with structured messaging<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>.</p>

<p>Worktrees isolate each agent in its own Git branch, so multiple agents can modify the same repository without conflicts<sup id="fnref:10:1" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>. The desktop app handles worktree lifecycle automatically; from the CLI you manage it via <code class="language-plaintext highlighter-rouge">/agent</code> commands.</p>

<p>A practical pattern: use <code class="language-plaintext highlighter-rouge">gpt-5.4</code> as a planning coordinator that delegates narrower subtasks (file review, test writing, documentation) to <code class="language-plaintext highlighter-rouge">gpt-5.4-mini</code> sub-agents running in parallel worktrees.</p>

<h3 id="42-cicd-integration">4.2 CI/CD Integration</h3>

<p><code class="language-plaintext highlighter-rouge">codex exec</code> is the non-interactive mode designed for pipelines<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># In a GitHub Actions workflow</span>
codex <span class="nb">exec</span> <span class="nt">--full-auto</span> <span class="nt">--model</span> gpt-5.4-mini <span class="se">\</span>
  <span class="s2">"Review this PR diff and post a summary comment"</span> <span class="se">\</span>
  &lt; &lt;<span class="o">(</span>gh <span class="nb">pr </span>diff <span class="nv">$PR_NUMBER</span><span class="o">)</span>
</code></pre></div></div>

<p>As of 0.118.0, <code class="language-plaintext highlighter-rouge">codex exec</code> supports prompt-plus-stdin workflows, so you can pipe input and still pass a separate prompt<sup id="fnref:11:1" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>.</p>

<p>For scheduled maintenance:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># .github/workflows/codex-sweep.yml</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">Weekly dependency sweep</span>
<span class="na">on</span><span class="pi">:</span>
  <span class="na">schedule</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">cron</span><span class="pi">:</span> <span class="s1">'</span><span class="s">0</span><span class="nv"> </span><span class="s">9</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">*</span><span class="nv"> </span><span class="s">1'</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">sweep</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v4</span>
      <span class="pi">-</span> <span class="na">run</span><span class="pi">:</span> <span class="s">npm i -g @openai/codex</span>
      <span class="pi">-</span> <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
          <span class="s">codex exec --full-auto \</span>
            <span class="s">"Update outdated dependencies, run tests, \</span>
             <span class="s">and open a PR if everything passes"</span>
</code></pre></div></div>

<h3 id="43-enterprise-governance">4.3 Enterprise Governance</h3>

<p>For teams, governance comes through version-controlled configuration<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup>:</p>

<ul>
  <li><strong>AGENTS.md in source control</strong> — policy changes go through PR review, providing an audit trail</li>
  <li><strong>Profiles</strong> — <code class="language-plaintext highlighter-rouge">codex --profile production</code> loads a locked-down config with <code class="language-plaintext highlighter-rouge">suggest</code> mode and <code class="language-plaintext highlighter-rouge">read-only</code> sandbox</li>
  <li><strong>Plugins</strong> — since 0.117.0, plugins are first-class with product-scoped syncing at startup<sup id="fnref:11:2" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>, enabling centralised distribution of approved skills and MCP servers</li>
</ul>
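<p>The two-profile pattern can be sketched in <code class="language-plaintext highlighter-rouge">config.toml</code>. Treat the exact key names and accepted values as assumptions to verify against the configuration reference for your Codex version:</p>

```toml
# Sketch of the two-profile pattern described above.
# Verify key names and accepted values against the config reference.
[profiles.dev]
model = "gpt-5.4"
approval_policy = "on-request"

[profiles.production]
model = "gpt-5.4-mini"
approval_policy = "suggest"
sandbox_mode = "read-only"
```

<p>Select one at launch with <code class="language-plaintext highlighter-rouge">codex --profile production</code>, keeping the locked-down settings out of individual developers' hands.</p>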

<pre><code class="language-mermaid">flowchart TD
    subgraph Governance
        A[AGENTS.md in repo] --&gt;|PR review| B[Approved policies]
        C[config.toml profiles] --&gt; D[Environment-specific settings]
        E[Plugin registry] --&gt;|Startup sync| F[Approved skills + MCP]
    end
    B --&gt; G[Developer workstation]
    D --&gt; G
    F --&gt; G
    G --&gt; H[Codex CLI session]
</code></pre>

<h3 id="44-exercises">4.4 Exercises</h3>

<ol>
  <li><strong>Sub-agents</strong> — set up a planning agent that delegates test writing to three sub-agents working in parallel worktrees.</li>
  <li><strong>CI/CD</strong> — add a GitHub Actions workflow that uses <code class="language-plaintext highlighter-rouge">codex exec</code> to auto-review PRs.</li>
  <li><strong>Enterprise config</strong> — create two profiles (<code class="language-plaintext highlighter-rouge">dev</code> and <code class="language-plaintext highlighter-rouge">production</code>) with different approval modes and model selections.</li>
</ol>

<p><strong>Milestone:</strong> You can orchestrate multi-agent workflows, integrate Codex into CI/CD pipelines, and explain your governance model.</p>

<hr />

<h2 id="mastery-checklist">Mastery Checklist</h2>

<p>Use this as a self-assessment. Tick each item when you can demonstrate it confidently:</p>

<table>
  <thead>
    <tr>
      <th>Level</th>
      <th>Skill</th>
      <th>✓</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Foundation</td>
      <td>Install, authenticate, explain approval × sandbox matrix</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Foundation</td>
      <td>Navigate the TUI, use <code class="language-plaintext highlighter-rouge">/permissions</code>, attach images</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Configuration</td>
      <td>Maintain a layered AGENTS.md stack</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Configuration</td>
      <td>Customise <code class="language-plaintext highlighter-rouge">config.toml</code> with profiles</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Intermediate</td>
      <td>Connect and manage MCP servers</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Intermediate</td>
      <td>Create and distribute custom skills</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Intermediate</td>
      <td>Select models by task complexity</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Advanced</td>
      <td>Orchestrate sub-agents in parallel worktrees</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Advanced</td>
      <td>Integrate <code class="language-plaintext highlighter-rouge">codex exec</code> into CI/CD pipelines</td>
      <td>☐</td>
    </tr>
    <tr>
      <td>Advanced</td>
      <td>Implement enterprise governance with profiles and plugins</td>
      <td>☐</td>
    </tr>
  </tbody>
</table>

<h2 id="recommended-reading-order">Recommended Reading Order</h2>

<p>If you are working through Daniel’s Codex CLI knowledge base, this learning plan maps to the following article sequence:</p>

<ol>
  <li>Installation and first steps → <em>Getting Started</em> articles</li>
  <li>AGENTS.md deep dive → <em>Codified Context: Three-Tier Knowledge Architecture</em></li>
  <li>MCP integration → <em>MCP configuration</em> articles</li>
  <li>Skills → <em>Agent Skills</em> articles</li>
  <li>Competitive context → <em>Codex CLI Competitive Position April 2026</em></li>
  <li>Advanced internals → <em>How the Codex CLI Agentic Loop Works</em></li>
</ol>

<hr />

<h2 id="citations">Citations</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/cli">Codex CLI official documentation — OpenAI Developers</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/changelog">Codex CLI changelog — Windows launch March 2026</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/cli/features">Codex CLI features — model selection and gpt-5.4</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/cli/reference">Codex CLI command reference — approval modes</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://inventivehq.com/knowledge-base/openai/how-to-configure-sandbox-modes">How to Configure Approval and Sandbox Modes — Inventive HQ</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/config-reference">Codex configuration reference</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/guides/agents-md">Custom instructions with AGENTS.md — OpenAI Developers</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:7:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/mcp">Model Context Protocol — Codex | OpenAI Developers</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:8:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/skills">Agent Skills — Codex | OpenAI Developers</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:9:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://kingy.ai/ai/the-codex-app-super-guide-2026-from-hello-world-to-worktrees-skills-mcp-ci-and-enterprise-governance/">The Codex App Super Guide 2026 — Kingy AI</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:10:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/changelog">Codex CLI changelog — v0.117.0 and v0.118.0</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:11:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:11:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p><a href="https://developers.openai.com/codex/learn/best-practices">Best practices — Codex | OpenAI Developers</a> <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p><a href="https://www.bighatgroup.com/blog/agentic-coding-harnesses-claude-code-codex-gemini-enterprise-guide/">Agentic Coding Harnesses: Enterprise Guide — Big Hat Group</a> <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Daniel Vaughan</name></author><summary type="html"><![CDATA[Learning Plan for Becoming a Codex CLI Expert]]></summary></entry></feed>