Terminal Agent Showdown: Codex CLI vs Claude Code vs Gemini CLI in May 2026
The terminal agent race has intensified since the three-way contest crystallised in late 2025. OpenAI’s Codex CLI (v0.128.0, Rust-native), Anthropic’s Claude Code (v2.1.126, TypeScript), and Google’s Gemini CLI (v0.40.0, TypeScript) now occupy distinct niches — yet each keeps encroaching on the others’ territory. This article benchmarks all three as they stand in the first week of May 2026, covering architecture, performance, safety, pricing, and the workflows where each tool genuinely excels.
Architecture at a Glance
The three agents share a surface-level similarity — you type a natural-language prompt, and the agent reads code, proposes edits, and runs commands — but the underlying stacks diverge sharply.
graph TB
subgraph Codex["Codex CLI (Rust)"]
C1[Rust binary] --> C2[Kernel sandbox<br/>Seatbelt / Landlock / DACL]
C2 --> C3[Responses API<br/>WebSocket mode]
C3 --> C4[GPT-5.5 / GPT-5.4 /<br/>GPT-5.3-Codex-Spark]
end
subgraph Claude["Claude Code (TypeScript)"]
CL1[Node.js process] --> CL2[Process-level isolation]
CL2 --> CL3[Messages API]
CL3 --> CL4[Claude Opus 4.7 /<br/>Claude Sonnet 4.6]
end
subgraph Gemini["Gemini CLI (TypeScript)"]
G1[Node.js + SEA] --> G2[No sandbox<br/>Direct filesystem]
G2 --> G3[Gemini API]
G3 --> G4[Gemini 2.5 Pro /<br/>Gemini 3.1 Pro]
end
| Dimension | Codex CLI | Claude Code | Gemini CLI |
|---|---|---|---|
| Runtime | Rust binary | Node.js (TypeScript) | Node.js SEA bundle |
| Sandbox | Kernel-level (Seatbelt, Landlock, DACL)[1] | Process-level isolation | None (direct filesystem access)[2] |
| Default model | GPT-5.5[3] | Claude Opus 4.7[4] | Gemini 2.5 Pro[5] |
| Context window | 192K tokens[2] | 200K tokens (1M with Opus 4.7)[4] | 1M tokens standard[2] |
| Transport | Responses API + WebSocket[6] | Messages API | Gemini API |
| Licence | Apache 2.0[1] | Source-available (community licence)[7] | Apache 2.0[5] |
Benchmark Scores: May 2026
SWE-bench Verified remains the most-cited benchmark, though OpenAI now recommends the harder SWE-bench Pro after discovering potential data contamination in Verified.[8]
| Benchmark | Codex CLI (GPT-5.5) | Claude Code (Opus 4.7) | Gemini CLI (3.1 Pro) |
|---|---|---|---|
| SWE-bench Verified | 88.7%[8] | 87.6%[8] | 80.6%[8] |
| Terminal-Bench 2.0 | 77.3%[9] | 72.1%[9] | 68.4% ⚠️ |
| First-pass accuracy | ~85%[9] | ~95%[10] | ~78% ⚠️ |
Claude Code’s higher first-pass accuracy reflects Opus 4.7’s stronger multi-file reasoning: it tends to get edits right on the first attempt, whereas Codex CLI’s sandbox-and-retry loop compensates for occasional misses through faster iteration.[10] Gemini CLI’s raw scores trail both, but its standard 1M-token context window lets it take in very large files that tools running with smaller windows must chunk or compact.[2]
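The trade-off between first-pass accuracy and fast retries can be made concrete with a small calculation. The sketch below assumes every attempt succeeds independently with the same probability, which real agent runs do not guarantee (failures on a hard task tend to be correlated). The function name and attempt counts are illustrative, and the accuracy figures are the approximate values from the table above.

```python
def success_within(attempts: int, first_pass_accuracy: float) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not 0.0 <= first_pass_accuracy <= 1.0:
        raise ValueError("first_pass_accuracy must be between 0 and 1")
    return 1.0 - (1.0 - first_pass_accuracy) ** attempts


# Approximate first-pass figures from the table (independence is an assumption).
for label, accuracy in [("Codex CLI", 0.85), ("Claude Code", 0.95), ("Gemini CLI", 0.78)]:
    one = success_within(1, accuracy)
    two = success_within(2, accuracy)
    print(f"{label}: 1 attempt {one:.3f}, 2 attempts {two:.3f}")
```

Under that independence assumption, a second attempt lifts a roughly 85% first-pass rate to about 98%, which is the sense in which quick iteration can make up for a lower first-pass rate.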
Sandbox and Safety
This is where the three tools diverge most dramatically.
Codex CLI drops commands into a kernel-enforced sandbox. On macOS, Seatbelt profiles restrict filesystem writes to the working directory. On Linux, Landlock LSM plus seccomp filters achieve the same. On Windows, DACL-based permission boundaries have stabilised as of v0.128.0.[1][11] Three permission profiles gate what the agent may do: suggest (read-only), auto-edit (workspace writes), and full-auto (network access). In addition, requirements.toml lets enterprises enforce ceiling policies across teams.[12]
Claude Code offers process-level isolation with permission tiers (ask, auto-accept, bypass) but lacks the kernel-level enforcement Codex provides.[7] For security-critical work, Claude Code compensates with /security-review, a built-in slash command that audits the current codebase.[13]
Gemini CLI has no sandbox at all.[2] The agent writes directly to the filesystem with the same permissions as the user running it. For a free tool with 1,000 daily requests, this is a deliberate trade-off, but it means running gemini in a production repository without reviewing every proposed change carries genuine risk.
flowchart LR
A[User prompt] --> B{Agent proposes command}
B -->|Codex CLI| C[Kernel sandbox<br/>executes in isolated namespace]
B -->|Claude Code| D[Process isolation<br/>permission check]
B -->|Gemini CLI| E[Direct execution<br/>user permissions]
C --> F[Result returned safely]
D --> F
E --> G[Result returned<br/>⚠️ no rollback guarantee]
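To make the idea of a workspace-write policy concrete, here is a minimal userspace model of the three Codex CLI profiles described above. It is only an illustration of the policy, not of the mechanism: a real sandbox enforces the rule in the kernel (Seatbelt, Landlock, DACL) so the agent cannot bypass it, whereas this check runs inside the same process. The `Profile` class, the `write_allowed` helper, and the assumption that full-auto also permits workspace writes are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Profile:
    name: str
    workspace_writes: bool
    network: bool


# Profile names follow the article; treating full-auto as a superset of
# auto-edit is an assumption made for this sketch.
PROFILES = {
    "suggest": Profile("suggest", workspace_writes=False, network=False),
    "auto-edit": Profile("auto-edit", workspace_writes=True, network=False),
    "full-auto": Profile("full-auto", workspace_writes=True, network=True),
}


def write_allowed(profile: Profile, target: str, workspace: str) -> bool:
    """Allow a write only if the profile permits it and the path stays in the workspace."""
    if not profile.workspace_writes:
        return False
    root = Path(workspace).resolve()
    resolved = (root / target).resolve()
    # resolve() collapses ".." segments, so escapes such as "../outside.txt" are rejected.
    return resolved == root or root in resolved.parents
```

For example, `write_allowed(PROFILES["auto-edit"], "src/main.py", "/tmp/ws")` is accepted, while the same call with `"../outside.txt"` or with the `suggest` profile is refused.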
Pricing and Token Economics
Cost structures have diverged further in 2026, with each vendor optimising for a different segment.[14]
| Plan | Codex CLI | Claude Code | Gemini CLI |
|---|---|---|---|
| Free tier | API trial credits only | None | 1,000 reqs/day (personal Google account)[5] |
| Entry subscription | ChatGPT Plus ($20/mo) | Claude Pro ($20/mo) | Google One AI Premium ($22/mo) |
| Power tier | ChatGPT Pro ($200/mo) | Claude Max 20x ($200/mo) | Gemini Advanced ($60/mo) |
| API pricing (input) | $2.50/MTok (GPT-5.5)[15] | $15/MTok (Opus 4.7)[16] | $1.25/MTok (2.5 Pro)[17] |
| API pricing (output) | $10/MTok (GPT-5.5)[15] | $75/MTok (Opus 4.7)[16] | $10/MTok (2.5 Pro)[17] |
Codex CLI’s 4x token-efficiency claim[9] changes the effective cost calculation considerably. If Codex completes a task in 25% of the tokens Claude Code requires, the per-task API cost gap widens rather than narrows, because GPT-5.5’s lower per-token rate compounds with its lower token consumption: at the listed input rates ($2.50 vs $15 per MTok), a task that uses a quarter of the tokens costs 24x less on input. In practice, most developers on subscription plans find the distinction academic, since the monthly cap matters more than per-token rates.
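A short calculation shows how per-token rates and token consumption combine into a per-task cost. The rates below come from the pricing table; the token counts are a hypothetical workload chosen only to apply the claimed 4x efficiency ratio, and `task_cost` is an illustrative helper rather than part of any vendor SDK.

```python
# Per-million-token API rates in USD, taken from the pricing table above.
RATES = {
    "GPT-5.5": {"input": 2.50, "output": 10.00},
    "Opus 4.7": {"input": 15.00, "output": 75.00},
    "Gemini 2.5 Pro": {"input": 1.25, "output": 10.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the listed per-million-token rates."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# Hypothetical workload: Claude Code uses 400K input and 40K output tokens,
# and Codex CLI uses a quarter of each, per the 4x efficiency claim.
claude = task_cost("Opus 4.7", 400_000, 40_000)
codex = task_cost("GPT-5.5", 100_000, 10_000)
print(f"Claude Code: ${claude:.2f}  Codex CLI: ${codex:.2f}  ratio: {claude / codex:.1f}x")
```

With these illustrative numbers the task costs $9.00 on Opus 4.7 and $0.35 on GPT-5.5, so the rate difference and the token difference multiply rather than cancel.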
Gemini CLI’s free tier remains unbeatable for exploration and learning. For teams, the lack of a sandbox and weaker benchmark scores push it towards supplementary rather than primary use.[2]
Feature Comparison: May 2026
MCP Support
All three now support the Model Context Protocol, ending a period where Codex CLI held a lead.[1][7][5]
- Codex CLI: Stdio and streamable HTTP transports, supports_parallel_tool_calls per-server opt-in, sandbox-state metadata forwarding, plugin-bundled MCP servers.[18]
- Claude Code: Stdio and SSE transports, native skill discovery via MCP, /mcp diagnostic commands.[7]
- Gemini CLI: Stdio transport, MCP resource listing and reading added in v0.40.0.[5]
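For readers unfamiliar with what travels over the stdio transport, MCP messages are JSON-RPC 2.0 objects written one per line. The sketch below builds and parses such frames for the `resources/list` and `resources/read` methods. It is not any of the three CLIs' implementation: the helper names and the resource URI are made up, and the `initialize` handshake that a real session starts with is omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing request ids


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialise one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_line(line: str) -> dict:
    """Decode one newline-delimited JSON-RPC message."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    return message


# The resource URI is a made-up example.
list_frame = jsonrpc_request("resources/list")
read_frame = jsonrpc_request("resources/read", {"uri": "file:///workspace/README.md"})
```

A client would write these frames to the server's stdin and read responses with matching `id` values from its stdout.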
Multi-Agent Orchestration
- Codex CLI: MultiAgentV2 with configurable thread caps and wait-time controls, subagent spawning, codex mcp-server for embedding in Agents SDK pipelines.[3][19]
- Claude Code: Agent Teams (launched February 2026) with shared task lists and mailbox system, /ultrareview cloud-based bug-hunting fleet.[13]
- Gemini CLI: No native multi-agent support. External orchestration required.[5]
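Where external orchestration is needed, the two controls named above (a cap on concurrent workers and a wait time) map onto a familiar pattern. The following is a generic sketch using Python's standard library, not the orchestration logic of any of these tools; `run_subagents`, its parameters, and the `worker` callable are hypothetical stand-ins for whatever launches an agent.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_subagents(tasks, worker, thread_cap=3, wait_seconds=5.0):
    """Run `worker` over `tasks` with at most `thread_cap` concurrent threads.

    Returns (finished, unfinished): results for tasks that completed within
    `wait_seconds`, and the tasks that had not completed by then.
    """
    with ThreadPoolExecutor(max_workers=thread_cap) as pool:
        futures = {pool.submit(worker, task): task for task in tasks}
        done, pending = wait(futures, timeout=wait_seconds)
        for future in pending:
            future.cancel()  # only cancels work that has not started yet
        # result() re-raises any exception the worker raised.
        finished = {futures[f]: f.result() for f in done}
        unfinished = [futures[f] for f in pending]
    # Leaving the with-block still waits for tasks that are already running,
    # because threads cannot be interrupted from outside.
    return finished, unfinished
```

Calling `run_subagents(["a", "b", "c"], str.upper)` returns all three results and an empty unfinished list; a slow worker would instead show up in the second value.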
Plan Mode
- Codex CLI: /plan command, plan-mode nudges in TUI (v0.128.0), persistent PLANS.md for long-horizon sessions.[3]
- Claude Code: Plan mode with /plan command, session recap for returning to paused plans.[13]
- Gemini CLI: Plan Mode added March 2026, a read-only phase that prevents the agent from writing files until the plan is approved.[20]
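The read-only planning phase described for Gemini CLI amounts to a gate in front of every write. A minimal sketch of that idea follows; the `PlanGate` class is hypothetical, uses an in-memory dictionary in place of the filesystem, and makes no claim about how any of the three tools implement plan mode internally.

```python
class PlanGate:
    """Blocks file writes until a plan has been proposed and approved."""

    def __init__(self) -> None:
        self.plan: list[str] = []
        self.approved = False
        self.files: dict[str, str] = {}  # in-memory stand-in for the filesystem

    def propose(self, steps: list[str]) -> None:
        self.plan = list(steps)
        self.approved = False  # a new plan always needs fresh approval

    def approve(self) -> None:
        if not self.plan:
            raise RuntimeError("nothing to approve: no plan proposed")
        self.approved = True

    def write(self, path: str, content: str) -> None:
        if not self.approved:
            raise PermissionError(f"plan not approved; refusing to write {path}")
        self.files[path] = content
```

Resetting approval whenever a new plan is proposed is a design choice in this sketch: it prevents an earlier approval from silently covering a revised plan.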
Session Management
- Codex CLI: codex resume, conversation forking, context compaction at configurable thresholds, --ephemeral for disposable sessions.[1]
- Claude Code: /resume picker defaults to current directory, /recap for session context recovery, claude project purge for state cleanup.[13]
- Gemini CLI: /memory inbox for reviewing extracted skills, JSONL chat recording for audit trails.[5]
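Context compaction at a threshold can be pictured as follows: once estimated usage crosses a fraction of the window, older messages are collapsed and only the most recent ones are kept verbatim. This is a rough sketch under stated assumptions, not Codex CLI's algorithm. The four-characters-per-token estimate, the 0.8 default threshold, and the placeholder summary are all assumptions; a real agent would count tokens with the model's tokenizer and ask the model to write the summary.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def compact(messages, window_tokens, threshold=0.8, keep_recent=2):
    """Collapse older messages into one placeholder once usage crosses the threshold."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    used = sum(estimate_tokens(m) for m in messages)
    if used <= window_tokens * threshold or len(messages) <= keep_recent:
        return list(messages)  # under budget, or nothing older to collapse
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stub summary: it records only how many messages were collapsed.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary, *recent]
```

Raising the threshold delays compaction and preserves more detail; lowering it leaves more headroom for long tool outputs.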
Where Each Tool Excels
Codex CLI: Autonomous Batch Operations
Codex CLI’s combination of kernel sandboxing and codex exec non-interactive mode makes it the strongest choice for unattended workloads: CI/CD pipelines, scheduled code reviews, and automated PR generation.[19] The --output-schema flag ensures machine-readable structured output, and the --attempts flag enables best-of-N runs for reliability-critical tasks.[6]
Best for: CI integration, security-sensitive environments, token-conscious teams, enterprises requiring requirements.toml policy enforcement.
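In a pipeline, structured output is only useful if the consuming step checks it before acting on it. The sketch below validates a JSON result against a small expected shape. It does not invoke Codex CLI, and the field names in `REQUIRED_FIELDS` are purely illustrative; a real pipeline would mirror whatever schema it passed to the agent.

```python
import json

# Hypothetical shape for a review result; the field names are illustrative only.
REQUIRED_FIELDS = {"summary": str, "files_changed": list, "passed": bool}


def parse_agent_output(raw: str) -> dict:
    """Parse agent output and fail fast if it does not match the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data
```

Failing the CI job on a `ValueError` here keeps a malformed or truncated agent response from being treated as a passing review.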
Claude Code: Complex Reasoning and Multi-File Refactors
Claude Opus 4.7’s 95% first-pass accuracy and 1M context window make Claude Code the tool of choice when you need to get a complex refactor right on the first attempt.[10] The Agent Teams feature enables multi-agent collaboration without external orchestration, and /ultrareview provides cloud-scale code auditing.[13]
Best for: Large refactors, multi-file reasoning, teams willing to pay for quality, frontend-heavy projects.
Gemini CLI: Exploration, Prototyping, and Large Codebases
The free tier (1,000 requests/day) and 1M-token context window make Gemini CLI unbeatable for exploration.[5] Multimodal input, such as pasting screenshots into the terminal for the agent to analyse, is a genuine differentiator for UI debugging and design-to-code workflows.[2] The v0.40.0 bundled ripgrep enables offline codebase search, a feature neither competitor matches.[5]
Best for: Budget-conscious developers, massive codebases, multimodal workflows, prototyping and exploration.
The Three-Tool Stack
A pattern emerging among power users is maintaining all three agents:[10]
- Gemini CLI for quick questions, large-context exploration, and free-tier tasks
- Codex CLI for CI pipelines, sandboxed autonomous work, and structured output
- Claude Code for complex multi-file reasoning and deep refactors
This mirrors how developers historically kept multiple text editors — each tool has genuine strengths that the others cannot replicate within their current architecture.
What to Watch
- Codex CLI v0.129 is in alpha with improved /mcp diagnostics and faster reasoning controls via keyboard shortcuts (Alt+, / Alt+.).[3]
- Claude Code’s /ultrareview remains in research preview; if it stabilises, cloud-based multi-agent review could shift the competitive landscape.[13]
- Gemini CLI needs a sandbox story. The v0.40.0 release added MCP resource management but still offers no execution isolation.[5] Until this changes, enterprise adoption will remain limited.
- SWE-bench Pro is displacing SWE-bench Verified as the reference benchmark after contamination concerns.[8] May 2026 scores on Pro are significantly lower across the board, suggesting the gap between agents is narrower than Verified suggests.
Citations
- Codex CLI Documentation. OpenAI Developers.
- Gemini CLI vs Codex CLI: 1M Context Window vs Sandbox Execution. Morphllm.
- Gemini CLI Updates. Releasebot, May 2026.
- Claude Code vs Codex CLI vs Gemini CLI: Benchmarked and Ranked. CodeAnt.
- Claude Code vs Codex CLI vs Gemini CLI 2026 Comparison. DeployHQ.
- Claude Code v2.1.126 Release Notes. Anthropic, May 2026.