Codex CLI Security Testing Tools: codex sandbox, codex execpolicy, and Offline Policy Validation
Codex CLI ships two subcommands that most developers never discover: codex sandbox and codex execpolicy check. Together, they let you validate your security configuration offline — running real commands under sandbox policies and testing execution rules against sample invocations — without starting an agent session or spending a single API token. For teams rolling out Codex CLI with governance requirements, these tools close the gap between writing a policy and knowing it actually works.
The Problem: Untested Policies
Codex CLI’s security model rests on two layers: sandbox mode (filesystem and network boundaries enforced by the OS) and approval policy (when Codex must pause for human authorisation) 1. On top of these sit permission profiles (named compositions of filesystem and network rules) and execution policy rules (Starlark-based command governance) 2 3.
The challenge is that these policies are configuration — and configuration that is never tested is configuration that will fail when it matters. Before v0.125, the only way to verify a sandbox profile was to start an interactive session, prompt the agent to run a command, and observe whether it was allowed or denied. That feedback loop was slow, expensive, and non-deterministic.
The codex sandbox and codex execpolicy check subcommands solve this by providing deterministic, offline validation.
codex sandbox: Running Commands Under Policy
The codex sandbox subcommand executes an arbitrary shell command under Codex’s OS-enforced sandbox, using the same platform mechanisms that govern agent tool calls 1:
- macOS: Seatbelt sandbox profiles 4
- Linux/WSL: bubblewrap + seccomp with Landlock fallback 4
- Windows: Native restricted tokens and firewall rules, or WSL2 sandbox 5
Basic Usage
# Run a command under the default workspace-write sandbox
codex sandbox -- ls -la /etc/passwd
# Run under a named permission profile
codex sandbox --permissions-profile ci-locked -- npm test
# Set working directory explicitly
codex sandbox -C /path/to/project -- cargo build
The command after -- runs inside the sandbox exactly as it would during an agent session. If the sandbox denies an operation, the command fails — giving you immediate, deterministic feedback.
Platform-Specific Flags
On macOS, two additional flags provide richer diagnostics:
# Log all sandbox denials to stderr
codex sandbox --log-denials -- curl https://example.com
# Allow access to a specific Unix socket (e.g., Docker)
codex sandbox --allow-unix-socket /var/run/docker.sock -- docker ps
The --log-denials flag is invaluable during profile development. Rather than a cryptic “Permission denied”, you see exactly which Seatbelt operation was blocked, including the path, operation type, and the rule that triggered the denial 4.
Using Permission Profiles
The --permissions-profile flag loads a named profile from your configuration stack. Profiles are defined in config.toml under [permissions.<name>] tables 2:
# Top-level key: selects the profile applied by default
default_permissions = "ci-locked"
[permissions.ci-locked]
[permissions.ci-locked.filesystem.":root"]
"." = "read"
[permissions.ci-locked.filesystem.":workspace_roots"]
"." = "write"
"**/*.env" = "deny"
[permissions.ci-locked.network]
enabled = true
[permissions.ci-locked.network.domain_rules]
"registry.npmjs.org" = "allow"
"github.com" = "allow"
"*" = "deny"
You can then validate this profile without running the agent:
# Should succeed: npm install reaches npmjs.org
codex sandbox --permissions-profile ci-locked -- npm install
# Should fail: curl to arbitrary host is denied
codex sandbox --permissions-profile ci-locked -- curl https://evil.example.com
# Should fail: reading .env files is denied
codex sandbox --permissions-profile ci-locked -- cat .env
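The "should succeed" and "should fail" expectations above can be turned into assertions with a pair of small helpers. The sketch below is an illustration, not part of Codex CLI: expect_ok and expect_fail are hypothetical names. They only inspect the exit status of the command they are given, so they assume a sandbox denial surfaces as a non-zero exit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Codex CLI): assert on a command's exit status.

# expect_ok CMD...: passes only if CMD exits with status 0.
expect_ok() {
  if "$@" >/dev/null 2>&1; then
    echo "ok: $*"
  else
    echo "UNEXPECTED FAILURE: $*" >&2
    return 1
  fi
}

# expect_fail CMD...: passes only if CMD exits with a non-zero status.
expect_fail() {
  if "$@" >/dev/null 2>&1; then
    echo "UNEXPECTED SUCCESS: $*" >&2
    return 1
  fi
  echo "failed as expected: $*"
}

# Example usage with the profile above (assumes codex is installed):
#   expect_ok   codex sandbox --permissions-profile ci-locked -- npm install
#   expect_fail codex sandbox --permissions-profile ci-locked -- cat .env
```

A non-zero exit does not prove the sandbox was the cause. For instance, cat .env also fails when no .env file exists, so that check is only meaningful if the file is present and readable outside the sandbox.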
Including Managed Configuration
Enterprise teams using requirements.toml for fleet-wide policy enforcement can test how managed configuration composes with local profiles 6:
codex sandbox --permissions-profile ci-locked --include-managed-config -- npm test
This loads the managed requirements layer on top of the named profile, mirroring the exact sandbox state an agent would see in a managed deployment.
codex execpolicy check: Validating Execution Rules
While codex sandbox tests OS-level containment, codex execpolicy check validates the Starlark-based execution policy rules that govern whether commands are allowed, prompted, or forbidden before the sandbox is even consulted 3.
How Execution Rules Work
Rules live in .rules files under rules/ directories adjacent to active config layers 3:
~/.codex/rules/default.rules # User-level rules
.codex/rules/project.rules # Project-level rules (trusted projects only)
Each rule is a prefix_rule() call in Starlark:
# Allow common git read operations
prefix_rule(
pattern=["git", ["status", "log", "diff", "show", "branch"]],
decision="allow",
justification="Read-only git operations are safe",
match=["git status", "git log --oneline"],
)
# Forbid force-push
prefix_rule(
pattern=["git", "push", "--force"],
decision="forbidden",
justification="Force-push is destructive and banned by team policy",
match=["git push --force origin main"],
)
# Require approval for package installation
prefix_rule(
pattern=["npm", "install"],
decision="prompt",
justification="Package installation should be reviewed",
match=["npm install lodash"],
)
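One way to read these patterns is as a match on the leading tokens of a command, where each pattern element is either a literal or a list of alternatives. The bash sketch below models that reading only; it is not Codex's matcher. The matches_prefix name and the encoding of alternatives as a |-separated string are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical model of prefix matching (not Codex's implementation).
# Usage: matches_prefix PATTERN_ELEMENT... -- COMMAND_TOKEN...
# A pattern element may list alternatives separated by "|".
matches_prefix() {
  local -a pattern=() alts=()
  local i alt hit
  # Collect pattern elements up to the "--" separator.
  while (( $# > 0 )) && [[ "$1" != "--" ]]; do
    pattern+=("$1")
    shift
  done
  (( $# > 0 )) && shift   # drop the "--" separator
  local -a cmd=("$@")
  # The command needs at least as many tokens as the pattern.
  (( ${#cmd[@]} >= ${#pattern[@]} )) || return 1
  for i in "${!pattern[@]}"; do
    IFS='|' read -r -a alts <<< "${pattern[$i]}"
    hit=0
    for alt in "${alts[@]}"; do
      if [[ "${cmd[$i]}" == "$alt" ]]; then hit=1; fi
    done
    (( hit )) || return 1
  done
  return 0
}

# Example: the read-only git rule against one of its sample invocations.
if matches_prefix git 'status|log|diff|show|branch' -- git log --oneline; then
  echo "matches the read-only git rule"
fi
```

Under this reading, git push origin main --force would not match the force-push rule, because --force is not the third token. Whether Codex behaves the same way is worth confirming with codex execpolicy check before relying on such a rule.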
Testing Rules Offline
The codex execpolicy check command evaluates a command against one or more rule files and emits a JSON verdict:
codex execpolicy check \
--rules ~/.codex/rules/default.rules \
--pretty \
-- git push --force origin main
Output:
{
"decision": "forbidden",
"matching_rules": [
{
"pattern": ["git", "push", "--force"],
"decision": "forbidden",
"justification": "Force-push is destructive and banned by team policy"
}
]
}
The --pretty flag formats the JSON for human readability. Without it, you get compact JSON suitable for piping into jq or other automation 3.
Stacking Multiple Rule Files
The --rules flag is repeatable, letting you test how user-level and project-level rules compose:
codex execpolicy check \
--rules ~/.codex/rules/default.rules \
--rules .codex/rules/project.rules \
--pretty \
-- rm -rf node_modules
When multiple rules match, the strictest decision wins: forbidden beats prompt, which beats allow 3.
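The precedence can be pictured as taking the maximum over a severity ranking. The following bash sketch models only the ordering stated above; the rank and strictest function names are hypothetical and this is not how Codex itself is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical model of "strictest decision wins" (not Codex's code).

# rank DECISION: map a decision to a severity rank (higher is stricter).
rank() {
  case "$1" in
    allow)     echo 0 ;;
    prompt)    echo 1 ;;
    forbidden) echo 2 ;;
    *) echo "unknown decision: $1" >&2; return 1 ;;
  esac
}

# strictest DECISION...: print the strictest of the given decisions.
strictest() {
  local best="" best_rank=-1 d r
  for d in "$@"; do
    r=$(rank "$d") || return 1
    if (( r > best_rank )); then
      best=$d
      best_rank=$r
    fi
  done
  [[ -n "$best" ]] && echo "$best"
}

# A user-level "allow" combined with a project-level "prompt":
strictest allow prompt
```

With these definitions, the final call prints prompt, since prompt outranks allow.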
Shell Script Handling
Execution policy evaluation handles shell scripts intelligently. For “safe” scripts — those using only plain words and safe operators (&&, ||, ;, |) — Codex splits the chain and evaluates each command independently 3:
# Each command in the chain is evaluated separately
codex execpolicy check --pretty --rules rules.rules \
-- "npm test && git add . && git commit -m 'fix tests'"
For scripts with advanced features (redirects, variable expansion, subshells), the entire invocation is treated as a single opaque command 3.
Combining Both Tools in Practice
The real power emerges when you use both subcommands together in a validation workflow:
graph TD
A[Write Permission Profile] --> B[Write Execution Rules]
B --> C{codex execpolicy check}
C -->|forbidden| D[Adjust Rules]
C -->|allow/prompt| E{codex sandbox}
E -->|denied by OS| F[Adjust Profile]
E -->|permitted| G[Policy Validated]
D --> C
F --> E
G --> H[Commit to .codex/]
CI Pipeline Integration
Both tools exit with meaningful codes, making them suitable for CI gates. The following GitHub Actions workflow validates the security policy on every pull request:
name: Validate Codex Security Policy
on:
  pull_request:
    paths:
      - '.codex/rules/**'
      - '.codex/config.toml'
jobs:
  validate-policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Codex CLI
        run: npm i -g @openai/codex
      - name: Validate execution rules
        run: |
          # Test that allowed commands pass
          codex execpolicy check \
            --rules .codex/rules/default.rules \
            -- git status
          codex execpolicy check \
            --rules .codex/rules/default.rules \
            -- npm test
          # Test that forbidden commands are blocked. An explicit `if` is
          # used because under `bash -e` a failing `! command` does not
          # stop the step.
          if codex execpolicy check \
              --rules .codex/rules/default.rules \
              -- rm -rf /; then
            echo "rm -rf / was not blocked" >&2
            exit 1
          fi
          if codex execpolicy check \
              --rules .codex/rules/default.rules \
              -- git push --force; then
            echo "git push --force was not blocked" >&2
            exit 1
          fi
      - name: Validate sandbox profile
        run: |
          codex sandbox --permissions-profile ci-locked \
            -- echo "sandbox permits echo"
Table-Driven Validation
For comprehensive policy testing, a shell script can validate a matrix of commands against expected decisions:
#!/usr/bin/env bash
set -euo pipefail
# Keep the flags in an array so they expand to separate words safely
RULES=(--rules .codex/rules/default.rules)
FAIL=0
validate() {
  local expected="$1"; shift
  local actual
  # `|| true` keeps `set -e` from aborting the run if the check exits non-zero
  actual=$(codex execpolicy check "${RULES[@]}" -- "$@" 2>/dev/null \
    | jq -r '.decision') || true
  if [[ "$actual" != "$expected" ]]; then
    echo "FAIL: expected=$expected actual=$actual cmd=$*"
    FAIL=1
  else
    echo "PASS: $expected cmd=$*"
  fi
}
validate "allow" git status
validate "allow" git log --oneline
validate "allow" npm test
validate "prompt" npm install lodash
validate "forbidden" git push --force origin main
validate "forbidden" rm -rf /
exit "$FAIL"
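Because the expectations are plain data, they can also be kept in a table that is read at run time. The sketch below is one hypothetical way to do that. run_table is an assumed name, and it takes the name of a "checker" function that prints a decision for a command, which keeps the loop independent of how the decision is obtained. Rows are split on whitespace, so commands with quoted arguments are not supported.

```shell
#!/usr/bin/env bash
# Hypothetical table runner (not part of Codex CLI).
# Usage: run_table CHECKER < table
# Each row: EXPECTED_DECISION COMMAND...   (lines starting with # are skipped)
run_table() {
  local checker=$1 expected cmdline actual failures=0
  local -a argv
  while read -r expected cmdline; do
    if [[ -z "$expected" || "$expected" == \#* ]]; then continue; fi
    read -r -a argv <<< "$cmdline"   # naive whitespace split, no quoting
    actual=$("$checker" "${argv[@]}") || true
    if [[ "$actual" == "$expected" ]]; then
      echo "PASS: $expected cmd=$cmdline"
    else
      echo "FAIL: expected=$expected actual=$actual cmd=$cmdline"
      failures=$((failures + 1))
    fi
  done
  (( failures == 0 ))
}

# Example checker, assuming codex and jq are installed and the JSON
# verdict has a .decision field as shown earlier:
#   codex_decision() {
#     codex execpolicy check --rules .codex/rules/default.rules -- "$@" \
#       | jq -r '.decision'
#   }
#   run_table codex_decision < policy-cases.txt
```

The file name policy-cases.txt is a placeholder. Keeping the cases in a separate file lets reviewers see policy expectations change in a diff without reading shell code.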
The host_executable() Function
For security-critical commands, execution rules can pin to specific executable paths using host_executable() 3:
prefix_rule(
pattern=[host_executable("/usr/bin/git"), "push"],
decision="prompt",
justification="Only the system git binary is trusted for push",
)
Because the rule is tied to an absolute path, a malicious git placed earlier in $PATH should not be treated as the trusted binary. You can verify the resolution with codex execpolicy check:
# This uses the system git — should match the rule
codex execpolicy check --rules rules.rules --pretty -- /usr/bin/git push
# This uses a different path — may not match
codex execpolicy check --rules rules.rules --pretty -- ./malicious-git push
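The risk that path pinning addresses comes from ordinary PATH lookup: a bare command name resolves to the first matching executable on the search path. The short bash sketch below demonstrates that lookup behaviour on its own, independent of Codex, using a harmless stub named git in a temporary directory.

```shell
#!/usr/bin/env bash
# Illustration of PATH shadowing; the fake "git" here is a harmless stub.
shadow_dir=$(mktemp -d)
printf '#!/bin/sh\necho "fake git invoked"\n' > "$shadow_dir/git"
chmod +x "$shadow_dir/git"

# Resolve the bare name "git" with the stub's directory first in PATH.
# The command substitution runs in a subshell, so the caller's PATH is unchanged.
resolved=$(PATH="$shadow_dir:$PATH"; command -v git)
echo "bare 'git' resolves to: $resolved"

rm -rf "$shadow_dir"
```

The resolved path points into the temporary directory rather than at the system binary, which is the situation a rule pinned to /usr/bin/git is meant to exclude.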
Limitations and Caveats
Several boundaries are worth noting:
- codex sandbox requires platform support: On Linux, bubblewrap must be installed. On Ubuntu 24.04, AppArmor profile loading may be needed for user namespace creation 4.
- codex execpolicy is preview: The Starlark rules system is labelled experimental and may change 3.
- Permission profiles do not govern MCP servers or browser tools: The sandbox applies only to local command execution, not to app connectors, MCP tool calls, or computer-use surfaces 2.
- Deny-read glob patterns on Linux require glob_scan_max_depth: Unbounded ** patterns must specify a scan depth to pre-expand matches before sandbox startup 2.
- codex sandbox does not test approval policy: It tests OS-level containment only. Approval flow behaviour (whether the TUI would prompt) is not exercised.
When to Use Each Tool
| Scenario | Tool |
|---|---|
| Verify filesystem deny rules block .env access | codex sandbox |
| Verify network domain allowlist permits npm registry | codex sandbox |
| Confirm git push --force is forbidden by policy | codex execpolicy check |
| Test rule precedence across user and project layers | codex execpolicy check |
| CI gate on security policy changes | Both |
| Debug macOS Seatbelt denials | codex sandbox --log-denials |
| Validate managed enterprise requirements compose correctly | codex sandbox --include-managed-config |
Conclusion
codex sandbox and codex execpolicy check transform security policy from hope-based configuration into testable infrastructure. They cost nothing to run — no API calls, no tokens, no agent sessions — and they fit naturally into CI pipelines. If you are deploying Codex CLI with any governance requirements, these two commands should be the first thing you reach for after writing your policies.