The Codex Python SDK: Embedding Agents in Scripts, Pipelines, and Custom Tooling

The interactive TUI is how most developers first encounter Codex CLI. Type a prompt, watch the agent work, approve or reject tool calls. But the real leverage — the kind that compounds across teams and pipelines — comes when you drop the terminal and drive the agent programmatically. Since v0.131.0, the official Python SDK (openai-codex on PyPI, imported as openai_codex) provides exactly that surface 1.

This article covers the SDK’s architecture, core API, authentication flows, approval modes, and practical patterns for embedding Codex agents in Python scripts, CI pipelines, and custom tooling.

Architecture: JSON-RPC over stdio

The Python SDK is a thin, type-safe client that spawns the Codex app-server binary as a subprocess and communicates via JSON-RPC v2 over stdio 2. This is the same protocol the TUI uses internally — the SDK simply exposes it as a Python API.

sequenceDiagram
    participant Script as Python Script
    participant SDK as openai_codex SDK
    participant AppServer as Codex App Server
    participant Model as OpenAI API

    Script->>SDK: Codex()
    SDK->>AppServer: spawn subprocess
    SDK->>AppServer: initialize (JSON-RPC)
    AppServer-->>SDK: capabilities
    Script->>SDK: thread_start(model="gpt-5.4")
    SDK->>AppServer: thread/start
    AppServer-->>SDK: thread_id
    Script->>SDK: thread.run("Fix the failing tests")
    SDK->>AppServer: turn/start
    AppServer->>Model: API request
    Model-->>AppServer: completion
    AppServer-->>SDK: TurnResult
    SDK-->>Script: result.final_response
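
To make the exchange in the diagram more concrete, the sketch below builds a standard JSON-RPC 2.0 request envelope and round-trips it through a byte encoding. The method name thread/start comes from the diagram; the newline-delimited framing, the threadId result field, and all helper names are assumptions for illustration, and the real app-server wire format may differ.

```python
import itertools
import json

# Monotonic request ids, so each response can be paired with its request.
_ids = itertools.count(1)


def make_request(method, params):
    """Build a JSON-RPC 2.0 request envelope as a plain dict."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def encode_line(message):
    """Serialise one message as a newline-terminated line (assumed framing)."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_line(raw):
    """Parse one line of bytes back into a message dict."""
    return json.loads(raw.decode("utf-8"))


def match_response(request, response):
    """Return the result of the response answering `request`, or raise."""
    if response.get("id") != request["id"]:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    return response["result"]


request = make_request("thread/start", {"model": "gpt-5.4"})
wire_bytes = encode_line(request)
# Hypothetical reply; the field name threadId is an assumption.
reply = {"jsonrpc": "2.0", "id": request["id"], "result": {"threadId": "t-1"}}
print(match_response(request, reply))
```

Pairing responses to requests by id is what lets a client keep several calls in flight over a single stdio pipe.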

The openai-codex package depends on openai-codex-cli-bin, which bundles platform-specific binaries for macOS (ARM64/x86_64), Linux (x86_64/ARM64), and Windows (x86_64/ARM64) 2. Version alignment between the SDK and binary is strict — mismatches produce a startup error.
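
A pre-flight check can surface a version mismatch before the app-server is spawned. The sketch below reads the installed versions of both distributions with the standard library; treating "aligned" as "identical version strings" is an assumption based on the strictness described above, and the helper names are hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_alignment(sdk_version, bin_version):
    """Assumed rule: both packages are installed and share one version string."""
    return sdk_version is not None and sdk_version == bin_version


if __name__ == "__main__":
    sdk = installed_version("openai-codex")
    binary = installed_version("openai-codex-cli-bin")
    print(f"sdk={sdk} binary={binary} aligned={check_alignment(sdk, binary)}")
```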

Installation

The SDK requires Python 3.10 or later. Install from the repository:

cd sdk/python
uv sync
source .venv/bin/activate

Alternatively, if the published PyPI package is available for your platform:

pip install openai-codex

For development against a local Codex binary, override the path:

from openai_codex import Codex, AppServerConfig

config = AppServerConfig(codex_bin="/usr/local/bin/codex")
with Codex(config=config) as codex:
    ...

Core API Surface

The SDK exposes two primary classes: Codex (synchronous) and AsyncCodex (asynchronous) 2. Both follow the same pattern: create a client, start a thread, run turns.

Synchronous Usage

from openai_codex import Codex

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5.4")
    result = thread.run("Refactor the auth module to use dependency injection.")
    print(result.final_response)
    print(f"Items generated: {len(result.items)}")

The run() method submits a turn and collects all events into a TurnResult. The final_response field contains the agent’s concluding text — it is None when the turn completes without a final-answer message 3.
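
Because final_response can be None, downstream code is safer if it never assumes a string. The helper below is a hypothetical convenience, not part of the SDK; it is exercised with stand-in objects shaped like a TurnResult rather than with a live agent.

```python
from types import SimpleNamespace


def final_text(result, default=""):
    """Return result.final_response, or `default` when the turn produced none."""
    text = getattr(result, "final_response", None)
    return default if text is None else text


# Stand-ins for TurnResult; only the two fields used in the article are modelled.
answered = SimpleNamespace(final_response="Refactor complete.", items=[1, 2])
silent = SimpleNamespace(final_response=None, items=[1])

print(final_text(answered))
print(final_text(silent, default="(no final response)"))
```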

Asynchronous Usage

from openai_codex import AsyncCodex
import asyncio

async def main():
    async with AsyncCodex() as codex:
        thread = await codex.thread_start(model="gpt-5.4")
        result = await thread.run("Generate unit tests for the payment service.")
        print(result.final_response)

asyncio.run(main())
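
The async client pays off mainly when several prompts run side by side. The sketch below shows a bounded fan-out built only on asyncio; it accepts any coroutine function, so a caller could pass a wrapper around thread.run. Whether a single AsyncCodex client supports concurrent turns is not stated here, so treat that pairing as an assumption. fake_run is a stand-in for a real agent call.

```python
import asyncio


async def run_bounded(run_one, prompts, limit=2):
    """Await run_one(prompt) for each prompt, at most `limit` at a time, in order."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with semaphore:
            return await run_one(prompt)

    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(guarded(p) for p in prompts))


async def fake_run(prompt):
    """Stand-in for an awaitable agent call."""
    await asyncio.sleep(0)
    return f"done: {prompt}"


if __name__ == "__main__":
    print(asyncio.run(run_bounded(fake_run, ["tests", "docs", "lint"])))
```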

Thread Lifecycle Methods

Beyond thread_start(), the SDK supports resuming and forking threads 2:

| Method | Purpose |
| --- | --- |
| thread_start(model, base_instructions) | Create a fresh session |
| thread_resume(thread_id) | Reload an existing session with state overrides |
| thread_fork(thread_id) | Branch from an existing thread’s state |
| thread.run(prompt) | Submit a turn and collect all events |
| thread.turn(...) | Low-level turn control with streaming and interruption |

For text-only workflows, run() accepts a plain string as input. For streaming, steering, or interrupt control, use turn() instead 3.

Authentication

The SDK supports three authentication methods, added in v0.132.0 4:

API Key (Headless)

from openai_codex import Codex

with Codex() as codex:
    codex.login_api_key("sk-...")
    account = codex.account()
    print(f"Authenticated as: {account}")

ChatGPT Browser Flow

from openai_codex import Codex

with Codex() as codex:
    login = codex.login_chatgpt()
    print(f"Open this URL: {login.auth_url}")
    completed = login.wait()  # blocks until the browser flow finishes
    print(f"Login successful: {completed.success}")

Device Code Flow

from openai_codex import Codex

with Codex() as codex:
    login = codex.login_chatgpt_device_code()
    print(f"Go to {login.verification_url} and enter: {login.user_code}")
    completed = login.wait()

For CI pipelines, the API key method is the obvious choice — set OPENAI_API_KEY in your environment and the SDK picks it up automatically without an explicit login_api_key() call.
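
In a pipeline it helps to fail fast with a clear message when the secret is missing, rather than letting the agent start and fail later. The sketch below uses only the standard library; the helper names are hypothetical and nothing here calls the SDK.

```python
import os


def require_api_key(env=None):
    """Return OPENAI_API_KEY from `env` (default: os.environ) or raise."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; configure it as a CI secret")
    return key


def redact(key):
    """Mask a key for logging, keeping the last four characters of longer keys."""
    visible = key[-4:] if len(key) > 8 else ""
    return "*" * (len(key) - len(visible)) + visible


if __name__ == "__main__":
    try:
        print("Using key", redact(require_api_key()))
    except RuntimeError as exc:
        print(f"Configuration error: {exc}")
```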

Approval Modes and Security

Codex’s dual-layer security model — sandbox enforcement plus approval policies — is fully configurable through the SDK 5. The key parameters map to the same CLI flags:

| SDK Parameter | CLI Equivalent | Effect |
| --- | --- | --- |
| approval_policy="on-request" | --ask-for-approval on-request | Agent asks before mutations |
| approval_policy="never" | --ask-for-approval never | Auto-approve everything |
| approval_policy="untrusted" | Default for non-VCS dirs | Safe ops auto-approved, mutations need approval |
| sandbox_mode="workspace-write" | --sandbox workspace-write | Write access to workspace only |
| sandbox_mode="read-only" | --sandbox read-only | No file writes permitted |
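
The mapping in the table above can be written as a small validator that rejects typos before they reach the agent. This is a sketch limited to the values listed in the table; the real SDK and CLI may accept more. The table gives a default rather than a flag for "untrusted", so emitting a flag for it is an assumption.

```python
# Values taken from the table; the full set accepted by Codex may be larger.
APPROVAL_POLICIES = {"on-request", "never", "untrusted"}
SANDBOX_MODES = {"workspace-write", "read-only"}


def cli_flags(approval_policy, sandbox_mode):
    """Validate both settings and return the equivalent CLI arguments."""
    if approval_policy not in APPROVAL_POLICIES:
        raise ValueError(f"unknown approval_policy: {approval_policy!r}")
    if sandbox_mode not in SANDBOX_MODES:
        raise ValueError(f"unknown sandbox_mode: {sandbox_mode!r}")
    # Assumption: "untrusted" is passed the same way as the other policies.
    return ["--ask-for-approval", approval_policy, "--sandbox", sandbox_mode]


print(cli_flags("on-request", "workspace-write"))
```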

Auto-Review (Guardian Subagent)

For automated pipelines where you want safety without human-in-the-loop, configure approvals_reviewer="auto_review" to route approval requests through a guardian subagent 5. This secondary agent evaluates each request against a risk framework — checking for data exfiltration, credential probing, and destructive actions — before approving or denying. Low-risk actions proceed automatically; critical-risk actions are denied.

from openai_codex import Codex

with Codex() as codex:
    thread = codex.thread_start(
        model="gpt-5.4",
        approval_policy="on-request",
        approvals_reviewer="auto_review",
        sandbox_mode="workspace-write",
    )
    result = thread.run("Upgrade all dependencies and fix breaking changes.")

⚠️ The guardian subagent incurs additional model calls and associated costs. For high-throughput batch operations, approval_policy="never" with a strict read-only sandbox may be more cost-effective.

Practical Patterns

Pattern 1: CI Fix-on-Failure

When a CI job fails, trigger a Codex agent to diagnose and propose a fix:

#!/usr/bin/env python3
"""ci_fix.py — Triggered by GitHub Actions on test failure."""

import os
import subprocess
from openai_codex import Codex

failing_test = os.environ["FAILING_TEST"]
commit_sha = os.environ["GITHUB_SHA"]

with Codex() as codex:
    thread = codex.thread_start(
        model="gpt-5.4-mini",
        sandbox_mode="workspace-write",
        approval_policy="never",
    )
    result = thread.run(
        f"The test `{failing_test}` is failing at commit {commit_sha}. "
        "Diagnose the root cause, apply the minimal fix, and verify the test passes."
    )

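    if result.final_response:
        # gh can only open a PR from a pushed branch, so commit the agent's edits first.
        # Assumes git identity and push credentials are configured in the job;
        # the branch name is illustrative.
        branch = f"codex/fix-{commit_sha[:8]}"
        subprocess.run(["git", "checkout", "-b", branch], check=True)
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", f"fix: auto-repair {failing_test}"], check=True)
        subprocess.run(["git", "push", "-u", "origin", branch], check=True)
        subprocess.run(
            ["gh", "pr", "create",
             "--title", f"fix: auto-repair {failing_test}",
             "--body", result.final_response],
            check=True,
        )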

Pattern 2: Batch Code Review with thread_fork

Review multiple PRs by forking from a base thread that already has the project context:

from openai_codex import Codex

with Codex() as codex:
    # Base thread with project conventions
    base = codex.thread_start(
        model="gpt-5.4",
        base_instructions="You are a code reviewer. Follow CONTRIBUTING.md rules.",
    )
    base.run("Read CONTRIBUTING.md and the project's linting configuration.")

    # Fork per PR for isolated reviews
    for pr_number in [142, 143, 147]:
        review_thread = codex.thread_fork(base.thread_id)
        result = review_thread.run(
            f"Review the changes in PR #{pr_number}. "
            "Flag security issues, performance regressions, and style violations."
        )
        print(f"PR #{pr_number}: {result.final_response}")

Pattern 3: Structured Output for Toolchain Integration

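For machine-readable results that feed into downstream tools, ask the agent for JSON and parse its final response. The example relies on the prompt alone to shape the output, so treat parsing as fallible; the CLI's --output-schema flag is the stricter alternative where it is available: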

import json
from openai_codex import Codex

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5.4-mini")
    result = thread.run(
        "Analyse the codebase for security vulnerabilities. "
        "Return JSON with fields: file, line, severity, description, fix."
    )

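    # final_response is None when the turn ends without a final answer.
    if result.final_response is None:
        raise SystemExit("Agent returned no final response")
    findings = json.loads(result.final_response)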
    critical = [f for f in findings if f["severity"] == "critical"]

    if critical:
        raise SystemExit(f"{len(critical)} critical vulnerabilities found")
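
Since the JSON shape is requested only through the prompt, validating the reply before acting on it keeps a malformed answer from passing the gate silently. The sketch below checks for the five fields named in the prompt; the helper names are hypothetical and the sample reply is made up for illustration.

```python
import json

REQUIRED_FIELDS = ("file", "line", "severity", "description", "fix")


def parse_findings(text):
    """Parse the agent's reply into a list of finding dicts, or raise ValueError."""
    if text is None:
        raise ValueError("no final response to parse")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {index} is not an object")
        missing = [name for name in REQUIRED_FIELDS if name not in item]
        if missing:
            raise ValueError(f"finding {index} is missing fields: {missing}")
    return data


def critical_only(findings):
    """Keep findings whose severity is 'critical', ignoring case."""
    return [f for f in findings if str(f["severity"]).lower() == "critical"]


# Illustrative reply, not real agent output.
sample = json.dumps([
    {"file": "auth.py", "line": 10, "severity": "Critical",
     "description": "hard-coded secret", "fix": "load from environment"},
    {"file": "api.py", "line": 42, "severity": "low",
     "description": "verbose error message", "fix": "return a generic message"},
])
print(len(critical_only(parse_findings(sample))))
```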

Pattern 4: Agents SDK Integration

For multi-agent orchestration, run Codex as an MCP server and connect it to the OpenAI Agents SDK 6:

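import asyncio

from agents import Agent
from agents.mcp import MCPServerStdio


async def main():
    # "async with" is only valid inside a coroutine, so the setup lives in main().
    async with MCPServerStdio(
        name="Codex CLI",
        params={
            "command": "npx",
            "args": ["-y", "codex", "mcp-server"],
        },
        client_session_timeout_seconds=360000,
    ) as codex_mcp:
        developer = Agent(
            name="Backend Developer",
            instructions="Implement API endpoints. Use the codex tool for file operations.",
            mcp_servers=[codex_mcp],
        )
        # Run `developer` here, while the MCP server is still alive.


asyncio.run(main())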

When running as an MCP server, Codex exposes two tools: codex (start a new session) and codex-reply (continue an existing session via threadId) 6. This enables multi-agent workflows where a project manager agent delegates tasks to specialised developer agents, each backed by a Codex session.
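
At the protocol level, calling either tool is an MCP tools/call request. The sketch below builds the two payloads as plain dicts. The tool names and the threadId argument come from the text above; the prompt argument name and the helper names are assumptions, so check the server's advertised tool schema before relying on them.

```python
def tool_call(request_id, tool, arguments):
    """Build an MCP tools/call request in a JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def start_session(request_id, prompt):
    """Payload for the `codex` tool; the `prompt` argument name is assumed."""
    return tool_call(request_id, "codex", {"prompt": prompt})


def continue_session(request_id, thread_id, prompt):
    """Payload for `codex-reply`, which continues the session named by threadId."""
    return tool_call(request_id, "codex-reply", {"threadId": thread_id, "prompt": prompt})


first = start_session(1, "Scaffold the /orders endpoint.")
follow_up = continue_session(2, "thread-123", "Now add input validation.")
print(first["params"]["name"], follow_up["params"]["name"])
```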

Type Safety and Wire Protocol

The SDK uses Pydantic models generated from the Rust app-server’s protocol definitions 2. Fields use snake_case in Python but serialise to camelCase on the wire:

from openai_codex.types import TurnResult

# TurnResult fields:
# - final_response: Optional[str]
# - items: List[Item]
# - timing: TimingInfo
# - usage: UsageData

All types are exported from openai_codex.types, giving full IDE autocompletion and static analysis support. The strict version pinning between SDK and binary ensures the generated types always match the running app-server’s protocol.
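
The snake_case to camelCase mapping described in this section can be illustrated without Pydantic. The sketch below is a standard-library stand-in for what generated models would do through field aliases; it is not the SDK's implementation, and item_type is a hypothetical field name.

```python
def snake_to_camel(name):
    """Convert a snake_case identifier to camelCase."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_wire(value):
    """Recursively rename dict keys to camelCase, leaving values untouched."""
    if isinstance(value, dict):
        return {snake_to_camel(key): to_wire(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_wire(item) for item in value]
    return value


python_side = {"final_response": None, "items": [{"item_type": "message"}]}
print(to_wire(python_side))
```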

Model Selection

The same model selection rules apply as in interactive mode 7. For SDK workloads:

| Use Case | Recommended Model | Rationale |
| --- | --- | --- |
| Complex refactoring | gpt-5.4 | Stronger reasoning for multi-file changes |
| Test generation, linting | gpt-5.4-mini | Faster, cheaper for formulaic tasks |
| Architecture analysis | gpt-5.5 | Extended context for large codebases |
| Batch operations | gpt-5.4-mini | Cost control at scale |
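
In scripts, the table above tends to end up as a lookup so that model choice lives in one place. The sketch below encodes the table as a dict; the use-case keys and the fallback to the cheaper model are illustrative choices, not SDK features.

```python
# Keys are hypothetical labels; models are the ones recommended in the table.
MODEL_BY_USE_CASE = {
    "complex-refactoring": "gpt-5.4",
    "test-generation": "gpt-5.4-mini",
    "linting": "gpt-5.4-mini",
    "architecture-analysis": "gpt-5.5",
    "batch": "gpt-5.4-mini",
}


def pick_model(use_case, default="gpt-5.4-mini"):
    """Return the recommended model for a use case, or `default` if unknown."""
    return MODEL_BY_USE_CASE.get(use_case, default)


print(pick_model("complex-refactoring"))
```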

Limitations

  • Subprocess overhead: Each Codex() instance spawns a new app-server process. For high-frequency, low-latency calls, batch multiple turns within a single thread rather than creating new instances.
  • Platform binaries: The openai-codex-cli-bin dependency ships platform-specific wheels. Alpine Linux and musl-based containers are not yet supported ⚠️.
  • Weaker Windows sandbox: On Windows, the workspace-write sandbox relies on native Windows sandboxing, which has known gaps compared to macOS Seatbelt and Linux seccomp 5.
  • Version lock: SDK and binary versions must match exactly. Upgrading one without the other produces a startup error.
  • Experimental status: The SDK is still marked experimental. API surface changes between minor versions are possible 2.
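
The subprocess-overhead advice above amounts to reusing one thread for many turns. The helper below sketches that shape; it relies only on thread.run(prompt) returning an object with final_response, as described earlier, and is exercised with a stand-in thread rather than a live app-server.

```python
from types import SimpleNamespace


def run_batch(thread, prompts):
    """Run each prompt as its own turn on one thread; return the final responses."""
    return [thread.run(prompt).final_response for prompt in prompts]


class FakeThread:
    """Stand-in for an SDK thread that records the prompts it receives."""

    def __init__(self):
        self.prompts = []

    def run(self, prompt):
        self.prompts.append(prompt)
        return SimpleNamespace(final_response=f"ok: {prompt}")


thread = FakeThread()
print(run_batch(thread, ["Fix lint errors.", "Update the changelog."]))
```

Note that turns on a shared thread also share conversation state, so this suits related tasks better than unrelated ones.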

Conclusion

The Python SDK transforms Codex from an interactive assistant into an embeddable agent runtime. Whether you are wiring it into CI pipelines, building custom review tooling, or orchestrating multi-agent workflows through the Agents SDK, the API surface is deliberately minimal: create a client, start a thread, run turns. The hard part — sandboxing, approval routing, model selection, protocol framing — is handled by the same battle-tested app-server that powers the TUI.

The combination of thread_fork for isolated parallel work, auto_review for unsupervised safety, and MCP server mode for multi-agent composition means the SDK is not just a scripting convenience — it is the foundation for production agent infrastructure.


Citations

  1. OpenAI Codex Changelog — v0.131.0 release notes, Python SDK migration to openai-codex / openai_codex. https://developers.openai.com/codex/changelog 

  2. OpenAI Codex SDK documentation — Python SDK architecture, installation, API surface, and type system. https://developers.openai.com/codex/sdk

  3. OpenAI Codex Python SDK README — TurnResult, thread.run(), and thread.turn() API details. https://github.com/openai/codex/tree/main/sdk/python

  4. OpenAI Codex Changelog — v0.132.0 release notes, Python SDK authentication flows and simplified turn APIs. https://developers.openai.com/codex/changelog 

  5. OpenAI Codex Agent Approvals & Security — sandbox modes, approval policies, guardian subagent, and network controls. https://developers.openai.com/codex/agent-approvals-security

  6. OpenAI Codex Guides — Using Codex with the Agents SDK, MCP server mode, multi-agent orchestration patterns. https://developers.openai.com/codex/guides/agents-sdk

  7. OpenAI Codex CLI Features — model selection, non-interactive mode, and configuration reference. https://developers.openai.com/codex/cli/features