Codex CLI Diagnostic Toolkit: Tracing, Sandbox Testing, and the Built-In Debugging Commands

Codex CLI ships with a surprisingly deep set of diagnostic tools that most developers never discover. When an agent session stalls, a sandbox blocks a legitimate command, or a config key silently fails to take effect, knowing how to reach for RUST_LOG, codex sandbox, or /debug-config can save hours of guesswork. This article is a systematic reference to every built-in diagnostic surface in Codex CLI as of v0.118.0.

The Diagnostic Surface Area

Codex CLI’s diagnostic capabilities span four layers: runtime tracing via environment variables, interactive slash commands inside the TUI, standalone CLI subcommands for offline testing, and post-session analysis via JSONL rollout files.

graph TD
    A[Codex CLI Diagnostics] --> B[Runtime Tracing]
    A --> C[TUI Slash Commands]
    A --> D[Standalone Subcommands]
    A --> E[Post-Session Analysis]

    B --> B1["RUST_LOG env var"]
    B --> B2["LOG_FORMAT=json"]
    B --> B3["OpenTelemetry export"]

    C --> C1["/status"]
    C --> C2["/debug-config"]
    C --> C3["/feedback"]

    D --> D1["codex sandbox"]
    D --> D2["codex execpolicy check"]
    D --> D3["codex debug"]
    D --> D4["codex login status"]

    E --> E1["JSONL rollout files"]
    E --> E2["codex-tui.log"]

Runtime Tracing with RUST_LOG

Because Codex CLI is written in Rust and instruments its code with the tracing crate¹, the RUST_LOG environment variable controls log verbosity per module. The default level for Codex crates is info².

Basic Usage

# Global debug logging
RUST_LOG=debug codex

# Trace-level logging (extremely verbose)
RUST_LOG=trace codex

# Debug logging in non-interactive mode
RUST_LOG=debug codex exec "refactor the auth module"

Module-Targeted Tracing

The real power lies in per-module targeting. Codex’s Rust workspace exposes several key tracing targets²:

# Debug the core agent loop while keeping everything else at info
RUST_LOG=info,codex_core=debug codex

# Trace shell command execution specifically
RUST_LOG=codex_exec=trace,codex_core=debug codex

# Debug sandbox behaviour
RUST_LOG=codex_sandbox=debug,codex_process_hardening=debug codex

# Trace API request/response details
RUST_LOG=codex_core::api=trace codex

# Debug MCP server connections
RUST_LOG=codex_core::mcp=debug codex

# Trace configuration resolution
RUST_LOG=codex_core::config=trace codex

# Trace authentication flows
RUST_LOG=codex_core::auth=trace codex
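
Long filter strings are easy to mistype, so it can help to assemble them from parts. The sketch below joins a default level and any number of target=level directives with commas, which is the format RUST_LOG expects. The build_rust_log name is hypothetical and not part of Codex CLI; the targets in the example are the ones listed above.

```shell
# Build a RUST_LOG filter string from a default level plus target=level pairs.
# build_rust_log is a hypothetical helper, not something Codex CLI provides.
build_rust_log() {
    local default_level="$1"
    shift
    local filter="$default_level"
    local directive
    for directive in "$@"; do
        # Directives are comma-separated, e.g. info,codex_core=debug
        filter="${filter},${directive}"
    done
    printf '%s\n' "$filter"
}

# Example: keep everything at info and raise two targets.
# RUST_LOG="$(build_rust_log info codex_core=debug codex_exec=trace)" codex
```

Keeping the default level as the first argument makes it harder to raise every crate to trace by accident.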

Structured Log Output

For machine-parseable logs — useful when piping into log aggregation — set the format to JSON²:

RUST_LOG=debug LOG_FORMAT=json codex exec "run tests" 2>&1 | tee codex-debug.log

The compact format is also available via RUST_LOG_FORMAT=compact².

Log File Locations

Codex writes TUI logs to ~/.codex/log/codex-tui.log, with automatic rotation³. In codex exec mode, timestamped log files appear at ~/.codex/logs/codex-tui-<timestamp>.log². These can be safely deleted when no longer needed, but they are invaluable for post-mortem debugging.

# Monitor the TUI log in real time during a session
# (-F keeps following the file across log rotation)
tail -F ~/.codex/log/codex-tui.log

⚠️ Performance warning: Debug and trace levels can reduce throughput by 10–50%². Reserve them for active troubleshooting, not production workflows.
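
A debug-level log grows quickly, so a per-level count is a cheap first pass before reading it line by line. This is a minimal sketch that assumes a plain-text log in which each line carries its level as a standalone uppercase token (ERROR, WARN, INFO, DEBUG, TRACE) with no colour codes; count_log_levels is a hypothetical helper, and JSON-formatted logs would need a different approach.

```shell
# Count log lines per severity level in a plain-text tracing log.
# Assumption: the level appears as a standalone uppercase token on each line.
# count_log_levels is a hypothetical helper, not part of Codex CLI.
count_log_levels() {
    awk '
        {
            # Only the first level token on a line is counted.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(ERROR|WARN|INFO|DEBUG|TRACE)$/) {
                    counts[$i]++
                    break
                }
            }
        }
        END {
            # Print in fixed severity order, skipping levels that never occurred.
            n = split("ERROR WARN INFO DEBUG TRACE", order, " ")
            for (j = 1; j <= n; j++) {
                if (counts[order[j]] > 0) {
                    printf "%s %d\n", order[j], counts[order[j]]
                }
            }
        }
    ' "$@"
}

# Example: count_log_levels ~/.codex/log/codex-tui.log
```

With no file argument the function reads standard input, so it can also sit at the end of a pipeline.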

TUI Slash Commands for Live Diagnostics

Three slash commands provide in-session diagnostic information without leaving the TUI.

/status — Session Overview

The /status command displays the current session configuration and token usage⁴. This is your first stop when something feels off — it confirms which model is active, the current reasoning effort level, token consumption, and the effective sandbox mode.

/debug-config — Configuration Layer Diagnostics

When a config key appears to have no effect, /debug-config reveals the full configuration resolution stack⁵. It prints:

Layer order (lowest to highest precedence)
The effective value of each key and which layer set it
Policy details: allowed_approval_policies, allowed_sandbox_modes, mcp_servers, rules, enforce_residency, and experimental_network

This is particularly useful in enterprise environments where requirements.toml may silently override your config.toml settings⁵. If your sandbox_mode = "danger-full-access" is being ignored, /debug-config will show you that a managed policy is enforcing workspace-write.

/feedback — Structured Bug Reports

The /feedback command collects diagnostic information and submits it directly to OpenAI’s maintainers³. When invoked, it captures:

Request ID (essential for OpenAI support tickets)
Session ID
Connection status (connected/reconnecting/disconnected)
Last error message
Active tools count
MCP server connection status

Always run /feedback before closing a session that exhibited unexpected behaviour — the request ID is the single most useful datum when filing issues on GitHub³.

The codex sandbox Subcommand

The codex sandbox subcommand⁶ lets you test arbitrary commands under the exact same sandbox enforcement that Codex applies during agent sessions — without starting an agent session. This is indispensable when diagnosing why a build tool or test runner fails under sandboxing.

Platform-Specific Syntax

# macOS — test a command under Seatbelt enforcement
codex sandbox macos -- npm run build

# macOS — with full-auto permissions and denial logging
codex sandbox macos --full-auto --log-denials -- cargo test

# Linux — test under Landlock/bubblewrap enforcement
codex sandbox linux -- pytest tests/

# Linux — full-auto mode (workspace-write equivalent)
codex sandbox linux --full-auto -- make install

# Windows — test under restricted token enforcement
codex sandbox windows --full-auto -- dotnet test

The --log-denials flag on macOS is particularly valuable: it prints every Seatbelt denial to stderr, showing exactly which filesystem path or network operation was blocked⁶.

Legacy Aliases

The older codex debug seatbelt and codex debug landlock commands still work as aliases⁷:

# These are equivalent:
codex sandbox macos -- ls /etc
codex debug seatbelt -- ls /etc

Practical Use: Diagnosing Build Failures

A common scenario: your Rust project builds fine outside Codex but fails under the agent’s sandbox. Use codex sandbox to isolate the issue:

# Step 1: Test the build under sandbox
codex sandbox linux -- cargo build 2>&1 | grep -i denied

# Step 2: If failures appear, try with full-auto (workspace-write)
codex sandbox linux --full-auto -- cargo build

# Step 3: If it still fails, network access is a likely cause
# (e.g., crates.io downloads blocked by the sandbox)

This workflow avoids the cost of starting a full agent session just to debug sandbox restrictions.
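
The three steps can be wrapped in one function so the triage is repeatable. This is a sketch under assumptions: sandbox_triage is a hypothetical name, and it assumes codex sandbox passes through a non-zero exit status when the wrapped command fails, which is worth confirming on your platform.

```shell
# Run a command under the sandbox with default restrictions, then retry with
# --full-auto if it fails, and report which step (if any) succeeded.
# sandbox_triage is a hypothetical helper; it assumes `codex sandbox` exits
# non-zero when the wrapped command fails.
sandbox_triage() {
    local platform="$1"
    shift
    if codex sandbox "$platform" -- "$@"; then
        echo "ok: command succeeds under the default sandbox"
        return 0
    fi
    if codex sandbox "$platform" --full-auto -- "$@"; then
        echo "write-access: command needs --full-auto (workspace-write)"
        return 1
    fi
    echo "still-failing: check network access or paths outside the workspace"
    return 2
}

# Example: sandbox_triage linux cargo build
```

The distinct return codes (0, 1, 2) let a calling script branch on the outcome without parsing the message.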

Platform Implementation Details

On macOS 12+, codex sandbox invokes Apple’s Seatbelt framework via /usr/bin/sandbox-exec with a runtime-generated profile controlling filesystem and network access⁶. On Linux, the sandbox uses a dual-mode pipeline: Landlock LSM by default, or bubblewrap (vendored in codex-rs/vendor/bubblewrap/) when enabled via features.use_linux_sandbox_bwrap = true⁶. The bubblewrap path provides stronger isolation through PID namespace separation (--unshare-pid), network namespace isolation (--unshare-net), and seccomp filters⁶.

flowchart LR
    subgraph macOS
        A[codex sandbox macos] --> B[sandbox-exec]
        B --> C[Seatbelt profile]
        C --> D[Command runs isolated]
    end

    subgraph Linux
        E[codex sandbox linux] --> F{bwrap enabled?}
        F -->|Yes| G[bubblewrap]
        F -->|No| H[Landlock + seccomp]
        G --> I[Namespace isolation]
        H --> I
        I --> J[Command runs isolated]
    end

The codex execpolicy check Subcommand

Before deploying Starlark .rules files, validate them offline with codex execpolicy check⁸. This subcommand evaluates one or more rule files against a proposed command and reports the decision without executing anything.

# Test a command against your rules
codex execpolicy check \
  --pretty \
  --rules ~/.codex/rules/default.rules \
  -- gh pr view 7888 --json title,body,comments

The output shows:

Effective decision: the strictest severity across all matched rules (forbidden > prompt > allow)⁸
matchedRules: every rule whose prefix matched, with the exact matchedPrefix shown⁸

You can combine multiple rule files:

codex execpolicy check \
  --pretty \
  --rules ~/.codex/rules/default.rules \
  --rules .codex/rules/project.rules \
  -- rm -rf node_modules
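
When several rules match, the effective decision is the strictest one. The sketch below mirrors only the documented ordering (forbidden > prompt > allow) so you can reason about combined rule files by hand; strictest_decision is a hypothetical helper and says nothing about how Codex implements the evaluation internally.

```shell
# Return the strictest decision among the arguments, using the ordering
# forbidden > prompt > allow.
# strictest_decision is a hypothetical helper for illustration only.
strictest_decision() {
    if [ "$#" -eq 0 ]; then
        echo "no decisions given" >&2
        return 1
    fi
    local result="allow"
    local decision
    for decision in "$@"; do
        case "$decision" in
            forbidden) result="forbidden" ;;
            # prompt only wins if nothing stricter has been seen yet
            prompt)    [ "$result" = "forbidden" ] || result="prompt" ;;
            allow)     ;;
            *)         echo "unknown decision: $decision" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$result"
}

# Example: strictest_decision allow prompt allow
```

A single forbidden match therefore overrides any number of allow matches, which is why a broad allow rule cannot loosen a narrower forbidden one.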

Unit Tests in Rules Files

The match and not_match fields in prefix_rule() function as inline unit tests⁸. Codex validates these examples when it loads your rules — if a match example does not trigger the rule, or a not_match example does, loading fails. Always populate these fields:

prefix_rule(
    # The pattern is a list of command tokens, not a single string
    pattern = ["rm", "-rf"],
    decision = "forbidden",
    match = ["rm -rf /", "rm -rf node_modules"],
    not_match = ["rm file.txt", "rmdir empty"],
)

The codex debug Subcommand

The codex debug command is the entry point for lower-level debugging utilities⁷:

# List available debug subcommands
codex debug --help

# Test the V2 app-server protocol with a single message
codex debug app-server send-message-v2 "Hello, world"

The send-message-v2 subcommand initialises the app-server, starts a thread, sends a single user message, and streams all server notifications back to the terminal⁷. This is useful for verifying that the app-server protocol is functioning correctly without starting the full TUI.

Authentication Diagnostics

When sessions fail to start with authentication errors, two commands help isolate the issue:

# Check current auth state without triggering a login flow
codex login status

# Inspect the auth token file directly
jq '.expires_at' ~/.codex/auth.json

The codex login status command reports whether you are authenticated, the method used (browser OAuth, device code, or API key), and whether the token is valid⁷. A common failure pattern is a corrupted or expired auth.json file — the fix is to run codex logout followed by codex login³.

OpenTelemetry Integration

For production observability beyond ad-hoc tracing, Codex CLI supports OpenTelemetry export via the [otel] config section⁹:

[otel]
enabled = true
endpoint = "http://localhost:4317"
sampling_ratio = 1.0
service_name = "codex-cli"

This exports spans covering API calls, tool invocations, and sandbox operations to any OTLP-compatible backend (Jaeger, Grafana Tempo, SigNoz)⁹. Environment variables OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME also work².

⚠️ Note: codex exec does not yet export OTel metrics, and codex mcp-server mode has no telemetry support as of v0.118.0⁹.

Post-Session Analysis with JSONL Rollout Files

Every Codex session writes a JSONL rollout file to ~/.codex/sessions/¹⁰. These files contain RolloutItem events (SessionMeta, UserMessage, ResponseItem, EventMsg, ApprovalDecision) and are invaluable for understanding what happened during a session that went wrong.

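# Find the latest session rollout
# (the second glob covers dated subdirectories, in case your version nests sessions by date)
ls -t ~/.codex/sessions/*.jsonl ~/.codex/sessions/*/*/*/*.jsonl 2>/dev/null | head -1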

# Count response items by type (tool calls show up as their own types)
jq 'select(.type == "ResponseItem") | .item.type' \
  ~/.codex/sessions/<session>.jsonl | \
  sort | uniq -c | sort -rn

# Extract all approval decisions
jq 'select(.type == "ApprovalDecision")' \
  ~/.codex/sessions/<session>.jsonl

The community codex-replay tool renders these JSONL files as browsable HTML, and the ccusage project provides daily and monthly cost reports parsed from rollout token counters¹⁰.

A Diagnostic Workflow Checklist

When something goes wrong, work through this sequence:

1. Check config: Run /debug-config to verify your settings are taking effect
2. Check auth: Run codex login status to rule out credential issues
3. Check sandbox: Use codex sandbox <platform> -- <command> to test commands in isolation
4. Check rules: Use codex execpolicy check --pretty --rules <file> -- <command> to validate execution policies
5. Enable tracing: Restart with RUST_LOG=debug codex and monitor ~/.codex/log/codex-tui.log
6. Review the rollout: Inspect the JSONL session file for the failed session
7. File a report: Run /feedback to capture diagnostic context before closing

This top-down approach moves from cheap (no restart required) to expensive (restart with tracing), minimising disruption to your workflow.
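
The checks that run outside the TUI (auth, sandbox, rules) can be chained into one pass. This is a sketch, not a definitive script: run_offline_checks is a hypothetical name, and it assumes each codex subcommand exits non-zero on failure. A passing rules step only means the rules file loaded and evaluated, not that the command was allowed.

```shell
# Run the auth, sandbox, and rules checks in order and report each result.
# run_offline_checks is a hypothetical wrapper; it assumes each codex
# subcommand exits non-zero on failure, which is worth confirming locally.
run_offline_checks() {
    local platform="$1"
    local rules_file="$2"
    shift 2
    local failures=0

    if codex login status >/dev/null 2>&1; then
        echo "PASS auth"
    else
        echo "FAIL auth"; failures=$((failures + 1))
    fi

    if codex sandbox "$platform" -- "$@" >/dev/null 2>&1; then
        echo "PASS sandbox"
    else
        echo "FAIL sandbox"; failures=$((failures + 1))
    fi

    if codex execpolicy check --pretty --rules "$rules_file" -- "$@" >/dev/null 2>&1; then
        echo "PASS rules"
    else
        echo "FAIL rules"; failures=$((failures + 1))
    fi

    # Succeed only if every check passed.
    [ "$failures" -eq 0 ]
}

# Example: run_offline_checks linux ~/.codex/rules/default.rules cargo build
```

Output is suppressed to keep the summary short; rerun the failing command on its own to see the details.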

Citations
