Building a Codex Agent Swarm: From 6 Threads to 30 with External Orchestration

Codex CLI’s built-in subagent system is impressive — up to six concurrent threads with TOML-defined roles, path addressing, and CSV batch processing ¹. But six threads is a ceiling, not a floor. When your migration spans 200 microservices or your test suite needs parallel refactoring across 30 modules, you need to graduate from the built-in orchestrator to something external.

This article covers practical patterns for scaling Codex CLI beyond max_threads=6, from shell-based orchestration through to purpose-built swarm managers, with real cost and isolation strategies for running 20–30 agents in parallel.

Why Six Threads Is Not Enough

Codex CLI’s agents.max_threads defaults to 6 and agents.max_depth defaults to 1 ¹. These defaults exist for good reason: each subagent maintains its own context window, so token costs scale roughly linearly with agent count, and a 6-agent run costs about 6× a single-agent run ². Raising max_depth lets subagents spawn subagents of their own, which risks exponential token growth and added latency ¹.

But consider these real-world scenarios:

Large-scale migration: Converting 150 REST endpoints from Express to Fastify across 40 service repositories
Bulk test generation: Adding integration tests to 80 untested modules
Multi-repo dependency updates: Bumping a shared library across 25 downstream consumers

In each case, the work is embarrassingly parallel — each unit is independent, with no cross-agent coordination required. The built-in subagent system, designed for coordinated multi-step workflows within a single session, is the wrong tool.

The Graduation Decision Framework

Not every workload needs external orchestration. Use this decision tree:

flowchart TD
    A[Parallel task count?] -->|≤6| B[Use built-in subagents]
    A -->|7-15| C[Shell orchestration]
    A -->|16-30+| D[Dedicated orchestrator]
    B --> E{Cross-agent coordination needed?}
    E -->|Yes| B
    E -->|No, independent tasks| C
    C --> F{Need CI integration + auto-retry?}
    F -->|Yes| D
    F -->|No| C

The key insight: built-in subagents excel at coordinated parallel work (explorer scouts the codebase, worker implements, default reviews). External orchestration excels at independent parallel work where agents never need to communicate.
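
The flowchart can also be written down as a small pure function, which is handy if a wrapper script has to pick an approach automatically. This is only a sketch: the names chooseApproach and Workload are hypothetical, and the thresholds are copied from the diagram rather than from any Codex setting.

```typescript
// Hypothetical helper that encodes the decision tree above as a pure function.
type Approach = "built-in subagents" | "shell orchestration" | "dedicated orchestrator";

interface Workload {
  parallelTasks: number;        // how many units of work could run at once
  needsCoordination: boolean;   // do agents have to exchange results mid-run?
  needsCiAndAutoRetry: boolean; // is CI integration plus automatic retry required?
}

function chooseApproach(w: Workload): Approach {
  // 16 or more parallel tasks go straight to a dedicated orchestrator.
  if (w.parallelTasks >= 16) return "dedicated orchestrator";
  // Six or fewer tasks that need coordination stay on the built-in subagents.
  if (w.parallelTasks <= 6 && w.needsCoordination) return "built-in subagents";
  // Everything else starts at shell orchestration and escalates
  // only when CI integration and auto-retry are required.
  return w.needsCiAndAutoRetry ? "dedicated orchestrator" : "shell orchestration";
}

console.log(
  chooseApproach({ parallelTasks: 12, needsCoordination: false, needsCiAndAutoRetry: false }),
);
```

Keeping the rule in one function makes the thresholds easy to adjust once you have measured where shell scripts stop being manageable for your team.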

Pattern 1: Shell-Based Orchestration with codex exec

The simplest approach uses codex exec — Codex CLI’s non-interactive mode — combined with standard Unix parallelism tools ³.

Basic GNU Parallel Pattern

#!/bin/bash
# migrate-services.sh — parallel service migration

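TASK_FILE="tasks.txt"  # one task description per line
MAX_JOBS=12
WORKTREE_BASE="/tmp/codex-worktrees"

# The results directory must exist before any agent redirects output into it
mkdir -p /tmp/results "$WORKTREE_BASE"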

migrate_service() {
    local task="$1"
    local task_id="$2"
    local worktree="$WORKTREE_BASE/agent-$task_id"

    # Create isolated worktree
    git worktree add "$worktree" -b "agent/$task_id" HEAD

    # Run Codex in non-interactive mode; keep stderr out of the JSONL stream
    cd "$worktree" && codex exec \
        --full-auto \
        --json \
        "$task" > "/tmp/results/agent-$task_id.json" 2> "/tmp/results/agent-$task_id.err"

    local exit_code=$?
    echo "Agent $task_id completed with exit code $exit_code"
    return $exit_code
}

export -f migrate_service
export WORKTREE_BASE

# Run tasks in parallel with GNU parallel
cat "$TASK_FILE" | parallel \
    --jobs "$MAX_JOBS" \
    --linebuffer \
    --tagstring "agent-{#}" \
    migrate_service {} {#}

The --json flag on codex exec returns newline-delimited JSON events, enabling downstream processing and monitoring ³. Each agent operates in its own git worktree, providing full filesystem isolation.
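
As one example of downstream processing, the sketch below summarises a result file by event type. It assumes only that each line is a JSON object with a string type field, as the SDK events later in this article have; the function name countEventTypes is hypothetical, and lines that fail to parse are counted rather than treated as fatal.

```typescript
// Summarise a newline-delimited JSON event log by event type.
function countEventTypes(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  const bump = (key: string): void => {
    counts.set(key, (counts.get(key) ?? 0) + 1);
  };

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines

    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      // Stray non-JSON output is tallied separately instead of aborting.
      bump("<unparseable>");
      continue;
    }

    const type =
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { type?: unknown }).type === "string"
        ? (parsed as { type: string }).type
        : "<no type>";
    bump(type);
  }
  return counts;
}

// Illustrative input, not real Codex output.
const sample = [
  '{"type":"turn.started"}',
  '{"type":"item.completed"}',
  '{"type":"item.completed"}',
  "warning: something went to the wrong stream",
  '{"type":"turn.completed"}',
].join("\n");

console.log(Object.fromEntries(countEventTypes(sample)));
```

A file with no turn.completed entry is a cheap signal that the agent was interrupted before finishing.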

Critical: Session Isolation

Prior to v0.99.0, running multiple codex exec instances in parallel caused session state interference — context from instance A would leak into instance B via shared files in ~/.codex/ ⁴. This was resolved when the exec subcommand was reimplemented on top of the app server architecture ⁴. If you are running an older version, ensure each process uses a unique CODEX_HOME directory:

CODEX_HOME="/tmp/codex-session-$task_id" codex exec --full-auto "$task"
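
The same isolation can be applied when agents are spawned from Node rather than a shell. A minimal sketch, assuming a hypothetical helper named isolatedEnv: it creates a fresh temporary directory per task and returns an environment with CODEX_HOME pointing at it. A fresh directory starts without any stored configuration or login, so this assumes credentials reach the process some other way, such as an environment variable.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Build an environment for one agent process with a private CODEX_HOME.
function isolatedEnv(
  taskId: string,
  base: Record<string, string | undefined> = process.env,
): Record<string, string | undefined> {
  // mkdtempSync appends random characters, so reruns of the same task never collide.
  const home = mkdtempSync(join(tmpdir(), `codex-session-${taskId}-`));
  return { ...base, CODEX_HOME: home };
}

// Usage (not executed here): pass the result as the `env` option, for example
//   spawn("codex", ["exec", "--full-auto", task], { env: isolatedEnv(taskId) })
// with `spawn` imported from "node:child_process".
```

Remember to delete the temporary directories once the swarm has finished, since nothing in this sketch cleans them up.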

Pattern 2: TypeScript SDK Orchestrator

For more sophisticated control — progress monitoring, failure recovery, dynamic task queuing — the Codex TypeScript SDK provides programmatic access via run() and runStreamed() ⁵.

import { Codex } from "@openai/codex-sdk";
import { execSync } from "child_process";

interface Task {
  id: string;
  prompt: string;
  worktree?: string;
}

async function runSwarm(tasks: Task[], concurrency: number = 12) {
  const codex = new Codex();
  const results: Map<string, { success: boolean; output: string }> = new Map();

  // Process tasks in batches
  for (let i = 0; i < tasks.length; i += concurrency) {
    const batch = tasks.slice(i, i + concurrency);

    const promises = batch.map(async (task) => {
      // Create worktree for isolation
      const worktree = `/tmp/swarm/${task.id}`;
      execSync(`git worktree add ${worktree} -b swarm/${task.id} HEAD`);

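      try {
        // Point the thread at the worktree; a line in the prompt does not change its cwd
        const thread = codex.startThread({ workingDirectory: worktree });
        const turn = await thread.run(task.prompt);
        // run() resolves to a turn object; the agent's final text is in finalResponse
        results.set(task.id, { success: true, output: turn.finalResponse });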
      } catch (err) {
        results.set(task.id, {
          success: false,
          output: (err as Error).message,
        });
      } finally {
        // --force discards uncommitted changes: commit or push swarm/<id> before this runs
        execSync(`git worktree remove ${worktree} --force`);
      }
    });

    await Promise.all(promises);
    console.log(`Batch ${Math.floor(i / concurrency) + 1} complete`);
  }

  return results;
}

The SDK spawns Codex CLI as a child process and communicates over stdin/stdout using JSONL ⁵. Each startThread() call creates an independent session, avoiding the shared-state problems that plagued raw codex exec in earlier versions.
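
Fixed-size batches leave slots idle while each batch waits for its slowest agent. A common alternative, sketched here under the assumption that tasks are independent, is a small worker pool that starts the next task as soon as any slot frees up. The name runPool is hypothetical and the code uses no SDK calls; the worker callback is where a thread.run() call would go.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two lanes share an index
      try {
        results[i] = { status: "fulfilled", value: await worker(items[i], i) };
      } catch (reason) {
        // One failed task must not stop the lane from taking the next one.
        results[i] = { status: "rejected", reason };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage with a stand-in worker that just waits briefly.
runPool(["a", "b", "c", "d"], 2, async (id) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `done ${id}`;
}).then((settled) => console.log(settled.map((s) => s.status)));
```

Results come back in input order regardless of completion order, which keeps them easy to join back to the task list.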

Streaming Progress with runStreamed()

For real-time monitoring across your swarm, runStreamed() yields structured events ⁵:

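// runStreamed() resolves to an object whose events property is the async iterator
const { events } = await thread.runStreamed(task.prompt);
for await (const event of events) {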
  switch (event.type) {
    case "turn.started":
      metrics.trackAgentActive(task.id);
      break;
    case "item.completed":
      metrics.trackToolCall(task.id, event);
      break;
    case "turn.completed":
      metrics.trackAgentIdle(task.id);
      break;
  }
}

Event types include thread.started, turn.started, item.started, item.updated, item.completed, and turn.completed ⁵. Items represent individual actions (tool calls, text generation), whilst turns represent a complete agent cycle.
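
The metrics object used in the loop above is left undefined, so here is one possible shape for it: a small in-memory tracker that folds events into per-agent state. SwarmTracker and its fields are hypothetical, and the sketch relies only on the three event types handled in the switch statement.

```typescript
type AgentState = "pending" | "active" | "idle";

interface AgentStats {
  state: AgentState;
  itemsCompleted: number;
  turnsCompleted: number;
}

// Hypothetical in-memory tracker; a real swarm would export these to a metrics backend.
class SwarmTracker {
  private readonly stats = new Map<string, AgentStats>();

  // Fold one event from one agent into the running totals.
  record(agentId: string, event: { type: string }): void {
    const current: AgentStats =
      this.stats.get(agentId) ?? { state: "pending", itemsCompleted: 0, turnsCompleted: 0 };
    switch (event.type) {
      case "turn.started":
        current.state = "active";
        break;
      case "item.completed":
        current.itemsCompleted += 1;
        break;
      case "turn.completed":
        current.state = "idle";
        current.turnsCompleted += 1;
        break;
      // Other event types are ignored in this sketch.
    }
    this.stats.set(agentId, current);
  }

  get(agentId: string): AgentStats | undefined {
    return this.stats.get(agentId);
  }

  // Ids of agents that have started a turn and not yet completed it.
  activeAgents(): string[] {
    return Array.from(this.stats.entries())
      .filter(([, s]) => s.state === "active")
      .map(([id]) => id);
  }
}

const tracker = new SwarmTracker();
tracker.record("agent-1", { type: "turn.started" });
tracker.record("agent-1", { type: "item.completed" });
console.log(tracker.activeAgents(), tracker.get("agent-1"));
```

An agent that stays active while its itemsCompleted count stops moving is a reasonable candidate for a timeout.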

Pattern 3: Agent Orchestrator for Managed Swarms

ComposioHQ’s Agent Orchestrator ⁶ provides a production-grade solution for running 30+ parallel agents. Built as a 40,000-line TypeScript platform with 3,288 tests, it handles the operational concerns that shell scripts and custom SDK wrappers struggle with:

Automatic worktree isolation: Each agent gets its own git worktree, branch, and PR ⁶
CI failure recovery: When CI fails, the orchestrator injects failure logs back into the agent’s session for automatic remediation ⁶
Review routing: Reviewer comments are routed to the originating agent with full context ⁶
Multi-agent support: Pluggable architecture supporting Codex CLI, Claude Code, and Aider ⁶

Configuration is straightforward:

# agent-orchestrator.yaml
workspace:
  type: worktree    # or 'clone' for full isolation
agent:
  type: codex       # codex | claude | aider
  retry:
    max_attempts: 2
    timeout_minutes: 30

The orchestrator’s plugin architecture — Runtime, Agent, Workspace, Tracker, SCM, Notifier, and Terminal slots ⁶ — means you can swap in Codex CLI without modifying core orchestration logic.

Git Worktree Isolation at Scale

Every pattern above depends on git worktrees for filesystem isolation. This is not optional — without it, parallel agents will corrupt each other’s working trees.

graph LR
    subgraph "Shared Git Repository"
        OBJ[(Object Store)]
        REFS[(Refs/Branches)]
    end

    subgraph "Parallel Worktrees"
        W1[Agent 1<br/>worktree/agent-1<br/>branch: swarm/001]
        W2[Agent 2<br/>worktree/agent-2<br/>branch: swarm/002]
        W3[Agent N<br/>worktree/agent-n<br/>branch: swarm/N]
    end

    W1 --> OBJ
    W2 --> OBJ
    W3 --> OBJ
    W1 --> REFS
    W2 --> REFS
    W3 --> REFS

Worktree creation is fast because Git only checks out working files; the object store is already local and shared between worktrees ⁷. Disk cost therefore scales with the size of the checked-out tree, not with repository history. If one checkout occupies 500MB, 30 worktrees need roughly 30 × 500MB = 15GB of disk, which fits comfortably on a typical development machine.

The Runtime Isolation Gap

Git worktrees isolate files but not runtimes. If your agents run npm install or pip install, they share the same global package cache and can collide on lock files ⁸. Solutions:

Container-per-agent: Wrap each codex exec in a lightweight container (podman run --rm)
Node.js local installs: Use --prefix to isolate node_modules per worktree
Python venvs: Create a fresh virtual environment in each worktree before running the agent
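
Where containers are too heavy, a lighter option is to give each agent process its own cache and virtual-environment locations through environment variables. The sketch below swaps in that technique rather than the --prefix approach listed above. It assumes a POSIX layout (venv/bin), that the virtual environment is created separately with python -m venv, and that the directories live outside the worktree so they never appear as untracked files. runtimeEnv and runtimeRoot are hypothetical names; npm_config_cache and PIP_CACHE_DIR are the standard npm and pip overrides.

```typescript
import { delimiter, join } from "node:path";

// Per-agent cache and virtualenv locations, so agents never share mutable package state.
function runtimeEnv(
  agentId: string,
  runtimeRoot: string,
  basePath: string = process.env.PATH ?? "",
): Record<string, string> {
  const root = join(runtimeRoot, agentId);
  const venv = join(root, "venv"); // create beforehand with: python -m venv <this path>
  return {
    npm_config_cache: join(root, "npm-cache"), // npm reads npm_config_* variables as config
    PIP_CACHE_DIR: join(root, "pip-cache"),    // pip's cache directory override
    VIRTUAL_ENV: venv,
    // Put the venv's executables first so `python` and `pip` resolve to it.
    PATH: [join(venv, "bin"), basePath].filter((p) => p !== "").join(delimiter),
  };
}

console.log(runtimeEnv("agent-1", "/tmp/swarm-runtime", "/usr/bin"));
```

The trade-off is slower installs, since every agent downloads packages into its own cold cache.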

Monitoring with OpenTelemetry

Codex CLI ships with native OpenTelemetry support, emitting traces, metrics, and logs via OTLP ⁹. Configure it in ~/.codex/config.toml:

[otel]
environment = "dev"
exporter = { otlp-grpc = { endpoint = "http://localhost:4317" } }

For a swarm, this gives you per-agent visibility into token usage, API latency, tool calls, and session duration ⁹. Tools like AI Observer ¹⁰ or SigNoz ¹¹ provide purpose-built dashboards for monitoring multiple concurrent Codex sessions.

Key metrics to track across your swarm:

| Metric | Why It Matters |
| --- | --- |
| `codex.tokens.total` per agent | Cost attribution and budget enforcement |
| `codex.api.latency_p99` | Detect rate limiting under high parallelism |
| `codex.tools.calls` | Identify agents stuck in retry loops |
| `codex.session.duration` | Spot hung agents for timeout enforcement |

Cost Management for 20+ Parallel Agents

Running 30 agents simultaneously can burn through API budget rapidly. Each agent maintains its own context window, and with GPT-5.3-Codex at current pricing, a 30-agent swarm processing moderately complex tasks (10K tokens input + 5K output per agent) can cost $15–30 per batch ².

Mitigation strategies:

Use GPT-5.3-Codex-Spark for independent tasks: Spark delivers 1,000+ tokens/second on simpler operations ¹² at significantly lower cost, and is ideal for embarrassingly parallel work that does not require deep reasoning
Set job_max_runtime_seconds: The agents.job_max_runtime_seconds config (default 1800) prevents runaway agents ¹ — reduce this for simple tasks
Progressive batching: Start with 5 agents, verify output quality, then scale to 30. A prompt problem caught in the 5-agent pilot wastes one sixth of what the same problem would waste across a full 30-agent batch
Dry-run validation: Run a single task first in a restricted mode (for example --sandbox read-only; older CLI releases used --approval-mode suggest) to verify the agent’s approach before committing to a full swarm run
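
Progressive batching is easy to express in code. The sketch below splits a task count into a small pilot batch followed by full-size batches and computes the spend at risk in the first batch. The function names are hypothetical, and the per-agent cost is simply the $15–30 batch figure above divided by 30 agents, not a published price.

```typescript
// Split `total` tasks into a small pilot batch followed by full-size batches.
function progressiveBatches(total: number, pilot: number, full: number): number[] {
  if (pilot <= 0 || full <= 0) throw new RangeError("batch sizes must be positive");
  const sizes: number[] = [];
  let remaining = Math.max(0, total);
  let size = pilot; // the first batch is the pilot
  while (remaining > 0) {
    const n = Math.min(size, remaining);
    sizes.push(n);
    remaining -= n;
    size = full; // every later batch runs at full size
  }
  return sizes;
}

// Spend lost if the prompt turns out to be wrong for an entire batch.
function spendAtRisk(batchSize: number, costPerAgentUsd: number): number {
  return batchSize * costPerAgentUsd;
}

console.log(progressiveBatches(65, 5, 30)); // [ 5, 30, 30 ]
console.log(spendAtRisk(5, 0.5), spendAtRisk(30, 0.5)); // 2.5 15
```

At the low end of that range, a bad prompt costs about $2.50 in a 5-agent pilot against $15 in a full batch.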

Failure Recovery Patterns

At 30 agents, failures are not exceptional — they are expected. Design for them:

flowchart LR
    A[Task Queue] --> B[Spawn Agent]
    B --> C{Success?}
    C -->|Yes| D[Create PR]
    C -->|No| E{Retries left?}
    E -->|Yes| F[Log failure + retry]
    F --> B
    E -->|No| G[Flag for manual review]
    D --> H[CI Check]
    H -->|Pass| I[Ready for merge]
    H -->|Fail| J[Re-inject logs to agent]
    J --> B

The critical pattern: never discard a failed agent’s context. Log the full JSON output from codex exec --json, including tool calls and intermediate reasoning. When retrying, include the previous failure context in the new prompt so the agent does not repeat the same mistake.
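
A minimal sketch of that retry loop follows, with the agent call abstracted behind a callback so the pattern is independent of how agents are launched. Attempt, AgentRunner, promptWithHistory, and runWithRetry are all hypothetical names, and the wording of the retry prompt is only an illustration.

```typescript
interface Attempt {
  ok: boolean;
  log: string; // full output of the attempt, kept for later retries
}

type AgentRunner = (prompt: string) => Promise<Attempt>;

// Append every earlier failure log to the task so the agent can avoid repeating it.
function promptWithHistory(task: string, failures: readonly string[]): string {
  if (failures.length === 0) return task;
  const history = failures
    .map((log, i) => `Attempt ${i + 1} failed:\n${log}`)
    .join("\n\n");
  return `${task}\n\nPrevious attempts failed. Do not repeat these approaches.\n\n${history}`;
}

async function runWithRetry(task: string, run: AgentRunner, maxAttempts = 2) {
  const failures: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let result: Attempt;
    try {
      result = await run(promptWithHistory(task, failures));
    } catch (err) {
      // A crashed runner counts as a failed attempt, not a lost task.
      result = { ok: false, log: err instanceof Error ? err.message : String(err) };
    }
    if (result.ok) return { status: "succeeded" as const, attempts: attempt, failures };
    failures.push(result.log);
  }
  // Out of retries: hand over the accumulated logs for manual review.
  return { status: "needs-manual-review" as const, attempts: maxAttempts, failures };
}
```

The failures array is returned on both paths, so a task that succeeds on its second attempt still leaves a record of what went wrong the first time.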

When External Orchestration Is Overkill

Not every scaling problem needs a swarm. Before building orchestration infrastructure, consider:

Codex Cloud tasks: OpenAI’s hosted execution environment handles parallelism server-side, with no worktree management needed ¹³. If your tasks are self-contained and do not require local toolchain access, this is simpler.
Built-in CSV batch: The spawn_agents_on_csv tool processes many rows in parallel within the built-in 6-thread limit ¹. If six concurrent workers give you enough throughput, this is the right abstraction.
Sequential with caching: If tasks share substantial context, running them sequentially on a single thread with resumeThread() ⁵ can be cheaper than parallel execution with duplicated context.

Summary

| Approach | Concurrency | Complexity | Best For |
| --- | --- | --- | --- |
| Built-in subagents | ≤6 | Low | Coordinated multi-step workflows |
| Shell + GNU parallel | 7–20 | Medium | Batch migrations, bulk refactoring |
| TypeScript SDK | 10–30 | Medium-High | Custom pipelines with monitoring |
| Agent Orchestrator | 20–50+ | High (but managed) | Production swarms with CI integration |

The progression from built-in subagents to external orchestration is not about replacing Codex’s architecture — it is about recognising when your workload has outgrown coordinated parallelism and needs independent parallelism instead. Start with codex exec and GNU parallel, graduate to the TypeScript SDK when you need programmatic control, and reach for Agent Orchestrator when you need production-grade lifecycle management.

Citations

1. Subagents – Codex CLI Documentation. Official documentation covering max_threads, max_depth, agent roles, TOML configuration, and spawn_agents_on_csv.
2. Codex Gets Subagents: The Parallel AI Coding Pattern Is Now The De Facto Industry Standard. Rick Hightower’s analysis of token cost scaling in multi-agent Codex workflows.
3. Features – Codex CLI. Official documentation for codex exec non-interactive mode and the --json flag.
4. Multiple parallel codex exec instances interfere via shared session restore · Issue #11435. GitHub issue documenting session state interference, closed April 2026 after app server reimplementation.
5. SDK – Codex CLI. Official TypeScript SDK documentation covering run(), runStreamed(), event types, and thread management.
6. ComposioHQ/agent-orchestrator. Open-source orchestrator supporting Codex CLI, Claude Code, and Aider with automatic worktree isolation and CI recovery.
7. Codex App Worktrees Explained: How Parallel Agents Avoid Git Conflicts. Technical explanation of worktree isolation patterns for parallel AI agents.
8. Git Worktrees Need Runtime Isolation for Parallel AI Agent Development. Analysis of the gap between file-level and runtime-level isolation.
9. Advanced Configuration – Codex CLI. Official documentation covering OpenTelemetry configuration in config.toml.
10. ai-observer: Unified local observability for AI coding assistants. Self-hosted OpenTelemetry backend designed for monitoring Codex CLI and other coding agents.
11. OpenAI Codex Observability & Monitoring with OpenTelemetry. SigNoz integration guide for Codex CLI telemetry.
12. OpenCode vs Codex CLI (2026). Performance comparison noting GPT-5.3-Codex-Spark at 1,000+ tok/s.
13. Codex App First Impressions (2026). Overview of Codex Cloud tasks as an alternative to local parallel execution.