# Context Health Monitoring in Codex CLI: Compaction Telemetry, Degradation Detection, and Long-Session Quality Patterns
Long-running Codex CLI sessions are now routine. Multi-hour debugging marathons, /goal workflows spanning entire feature branches, and agentic refactoring runs that touch hundreds of files all push context windows to their limits. The problem is not that compaction exists — it is that you cannot see it happening, cannot measure its impact, and cannot tell when your session has silently degraded past the point of usefulness.
This article covers the current state of context health monitoring in Codex CLI as of v0.130, the configuration levers available today, practical patterns for detecting and mitigating degradation, and the community-driven push for proper compaction telemetry.
## The Invisible Degradation Problem
Every Codex CLI session accumulates tokens across three categories: the system prompt and AGENTS.md instructions, tool call inputs and outputs, and the conversational history between you and the model [1]. When the total approaches the model’s context window — 200K tokens for GPT-5.4-codex, up to 1M for GPT-5.5 — compaction fires automatically [2].
Compaction summarises older conversation turns into a compressed representation, discarding detail to free space. The first compaction is usually benign. The second loses fidelity. By the third, the model may have lost track of architectural decisions made early in the session [3].
The core issue, as documented in GitHub Issue #22220, is that users currently cannot determine [4]:
- How many compactions have occurred in the current session
- The current context quality or integrity level
- When the last compaction triggered
- Whether a compaction loop is forming
- How aggressively compression is running relative to available context
```mermaid
graph TD
    A[Session Start<br/>0% context used] --> B[Work accumulates<br/>tokens grow]
    B --> C{Context usage<br/>>threshold?}
    C -->|No| B
    C -->|Yes| D[Auto-compaction fires]
    D --> E[Summary replaces<br/>older turns]
    E --> F{Quality check}
    F -->|Acceptable| B
    F -->|Degraded| G[Silent quality loss<br/>No telemetry visible]
    G --> B
    G --> H[Compaction loop risk]
    style G fill:#f44,color:#fff
    style H fill:#f44,color:#fff
```
## What You Can See Today
Codex CLI does expose some context state, but it requires manual checking.
### The `/status` Command
Running `/status` in the TUI shows current context usage as a percentage [5]. This is the single most useful signal available today — but it is a point-in-time snapshot, not a trend. You cannot see whether you were at 60% five minutes ago and are now at 92%.
### The Context Percentage Indicator
The TUI status line shows a persistent context usage percentage [3]. When this climbs past 80%, you are approaching compaction territory. A session sitting at 95% with three compactions behind it will produce measurably worse output than a fresh session with focused context.
### `codex exec --json` Token Reporting
Since v0.125.0, `codex exec --json` reports reasoning-token usage in the `turn.completed` event [6]:
```shell
codex exec --json "Refactor the payments module" \
  | jq 'select(.type == "turn.completed") | .usage'
```
This yields `input_tokens`, `cached_input_tokens`, `output_tokens`, and `reasoning_output_tokens` — useful for CI pipelines but not for interactive session monitoring.
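If the `turn.completed` events from a scripted run are captured, the per-turn `usage` objects can be folded into session totals — useful for spotting runaway token growth across a CI run. A minimal sketch, assuming the event shape shown above (the helper name and the tolerance for non-JSON lines are this article's additions, not part of the CLI):

```python
#!/usr/bin/env python3
# track_usage.py — accumulate token usage across turn.completed
# events emitted by `codex exec --json`. Field names (`type`,
# `usage`, `input_tokens`, ...) are assumed from the event shape
# described above.
import json

def accumulate(lines):
    """Fold per-turn usage dicts into cumulative session totals."""
    totals = {"input_tokens": 0, "cached_input_tokens": 0,
              "output_tokens": 0, "reasoning_output_tokens": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if event.get("type") != "turn.completed":
            continue
        usage = event.get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals
```

Piping the event stream through `accumulate(sys.stdin)` in a wrapper script gives a single end-of-run usage summary per session.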
## Configuration Levers for Context Management
Three `config.toml` settings directly affect compaction behaviour.
### `model_auto_compact_token_limit`
Sets the token threshold that triggers automatic compaction [7]. When unset, Codex uses model-specific defaults (typically ~95% of the context window). For projects with large AGENTS.md files, monorepo file maps, or heavy MCP tool schemas, lowering this to 80-85% of context capacity is advisable:
```toml
# config.toml — project-level override
model_auto_compact_token_limit = 160000  # ~80% of 200K window
```
### `compact_prompt`
Overrides the default compaction summarisation prompt [8]. This only affects the local compaction path — when using OpenAI-hosted models, server-side compaction ignores this setting. A structured handoff prompt preserves more actionable state:
```toml
compact_prompt = """
Summarise this coding session as a structured handoff:
1. Current task and progress percentage
2. Files modified with change descriptions
3. Decisions made and their rationale
4. Blockers encountered
5. Clear next steps as a numbered list
Preserve all variable names, function signatures, and file paths exactly.
"""
```
### `tool_output_token_limit`
Restricts the token budget for individual tool outputs in conversation history [7]. Large tool outputs — Playwright snapshots, grep results across monorepos, full file reads — are the primary cause of rapid context consumption. Setting a reasonable ceiling prevents a single tool call from consuming 20% of your context:
```toml
tool_output_token_limit = 8000
```
## Manual Compaction: The `/compact` Command
Since v0.117.0, `/compact` supports queued follow-up instructions [9]. This is the single most effective technique for maintaining session quality:
```
/compact Focus on the authentication refactor and the three failing tests.
Preserve the database schema decisions from earlier.
```
The critical insight: compact at 60% context usage, not 95% [3]. Waiting for the automatic trigger means compaction runs under pressure, with less room to produce a high-quality summary. Manual compaction at a natural task boundary — after completing a sub-task, before starting a new file — gives the summariser more breathing room and produces better results.
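One way to operationalise this rule is to treat the usage percentage as a trend rather than a snapshot. Since the CLI exposes no programmatic usage feed today, the readings below would have to be noted manually from `/status`; the helper itself is a hypothetical sketch of the "compact at 60%, not 95%" advice:

```python
# compact_advisor.py — a sketch of the "compact at 60%, not 95%"
# rule. Readings are usage percentages noted manually from /status
# (there is no programmatic feed in Codex CLI today).

def should_compact(readings, threshold=60, window=3):
    """Recommend a manual /compact once the recent trend sits at or
    above `threshold` percent. `readings` is a list of usage
    percentages, oldest first; only the last `window` are averaged."""
    recent = readings[-window:]
    if not recent:
        return False
    return sum(recent) / len(recent) >= threshold
```

Averaging a short window instead of checking the latest reading avoids compacting on a single spike caused by one large tool output.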
## Building Your Own Context Health Telemetry
While Codex CLI lacks built-in compaction telemetry, you can approximate it using hooks and external tooling.
### PostToolUse Hooks for Token Monitoring
A PostToolUse hook can log token consumption after each tool call, building a local telemetry trail [10]:
```toml
# config.toml
[hooks.post_tool_use.context_monitor]
command = "python3 /path/to/context_monitor.py"
```
```python
#!/usr/bin/env python3
# context_monitor.py — log output size after each tool call
import json, sys, os
from datetime import datetime, timezone

data = json.load(sys.stdin)
log_path = os.path.expanduser("~/.codex/context_health.jsonl")
entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "tool": data.get("tool_name", "unknown"),
    # character count of the output — a rough proxy, not exact tokens
    "output_tokens": len(data.get("output", "")),
    "session_id": os.environ.get("CODEX_SESSION_ID", "unknown"),
}
with open(log_path, "a") as f:
    f.write(json.dumps(entry) + "\n")
```
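Once the hook has been logging for a while, the trail can be summarised to show which tools dominate context consumption. A sketch that reads the JSONL format the hook above writes (remembering that the hook records character counts, a rough proxy for tokens):

```python
#!/usr/bin/env python3
# summarise_context_log.py — aggregate the JSONL trail written by
# the PostToolUse hook, ranking tools by total output size.
import json
from collections import defaultdict

def summarise(lines):
    """Return {tool: total output size} from hook log lines,
    largest consumers first. Sizes are character counts, the rough
    token proxy the hook records."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt log lines
        totals[entry.get("tool", "unknown")] += entry.get("output_tokens", 0)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Running `summarise(open(os.path.expanduser("~/.codex/context_health.jsonl")))` periodically makes it obvious when one tool — typically a file reader or grep — deserves a `tool_output_token_limit` clamp.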
### The `context-mode` MCP Server
The `context-mode` MCP server (npm package) sandboxes raw tool output, achieving up to 98% context reduction on outputs like Playwright snapshots, GitHub issue bodies, and log files [11]. It works across 15 platforms including Codex CLI:
```shell
npm install -g context-mode
```
```json
{
  "mcpServers": {
    "context-mode": {
      "command": "context-mode",
      "args": ["--platform", "codex"]
    }
  }
}
```
Note: Codex CLI’s PreToolUse routing currently supports deny rules only. Full input rewriting awaits upstream `updatedInput` support [11].
### Session Checkpoint Pattern
For critical long-running sessions, create explicit checkpoints by combining `/compact` with a structured prompt and a git commit:
```
/compact CHECKPOINT: Authentication middleware complete.
Files: src/auth/middleware.ts, src/auth/jwt.ts, tests/auth/*.test.ts
All 14 tests passing. Next: rate limiting layer.
```
Then commit your work. If session quality degrades beyond recovery, you can start a fresh session with `--resume` from the checkpoint.
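The git half of the checkpoint is easy to script. A minimal sketch, assuming a conventional `CHECKPOINT:` commit-message prefix (this article's convention, not a Codex feature):

```python
#!/usr/bin/env python3
# checkpoint.py — record a session checkpoint as a git commit so a
# degraded session can be abandoned and work resumed from a
# known-good state. The CHECKPOINT: prefix is just a convention.
import subprocess

def checkpoint(message, repo="."):
    """Stage everything and commit with a CHECKPOINT-prefixed
    message. Returns the resulting commit hash."""
    subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
    subprocess.run(
        ["git", "-C", repo, "commit", "-m", f"CHECKPOINT: {message}"],
        check=True,
    )
    return subprocess.run(
        ["git", "-C", repo, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
```

Calling `checkpoint("auth middleware complete, 14 tests passing")` right after a `/compact` gives each summary a matching commit to fall back to.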
## Detecting Compaction Loops
Compaction loops — where the model compacts, immediately fills context again, and compacts again in rapid succession — are the most dangerous failure mode. They were particularly severe in v0.118, where a regression caused compaction to trigger approximately twice as frequently as in v0.116, doubling or tripling token consumption [12].
Signs of a compaction loop:
- Rapid context percentage oscillation — watching `/status` shows 95% → 40% → 90% within a few turns
- Repeated identical tool calls — the model re-reads files it already processed, having lost that context to compaction
- Token drain — `codex exec --json` shows `reasoning_output_tokens` spiking without corresponding useful work
- Response quality cliff — the model starts asking questions it answered earlier in the session
The workaround for v0.118-era loops was to drop reasoning effort from `xhigh` to `high` [12]. In current versions, the primary defence is proactive manual compaction before the automatic trigger fires.
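The oscillation symptom is mechanical enough to check for programmatically. A sketch that flags a likely loop when usage drops sharply (a compaction) and then climbs straight back toward the ceiling — the specific thresholds here are illustrative, not tuned values:

```python
# loop_detector.py — flag the oscillation pattern described above:
# usage falls sharply (a compaction) and climbs straight back up.
# The drop/reclimb/horizon values are illustrative, not tuned.

def detect_compaction_loop(readings, drop=40, reclimb=85, horizon=4):
    """readings: context-usage percentages, oldest first. Returns
    True if a sharp fall (>= `drop` points between consecutive
    readings) is followed by usage reaching `reclimb`% again within
    the next `horizon` readings."""
    for i in range(1, len(readings)):
        if readings[i - 1] - readings[i] >= drop:  # compaction fired
            for later in readings[i + 1 : i + 1 + horizon]:
                if later >= reclimb:  # context refilled immediately
                    return True
    return False

# detect_compaction_loop([95, 40, 70, 90]) → True  (loop forming)
# detect_compaction_loop([60, 30, 45, 55]) → False (healthy compact)
```

Fed with the percentages noted from `/status`, this turns the "95% → 40% → 90%" eyeball test into a repeatable check.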
## The Compaction API: Server-Side vs Local
It is worth understanding that Codex CLI uses two distinct compaction paths [13]:
| Aspect | Server-Side (Fast Path) | Local Path |
|---|---|---|
| Trigger | `compact_threshold` in Responses API | `model_auto_compact_token_limit` |
| Summarisation | OpenAI-hosted, opaque | Local model call, configurable |
| `compact_prompt` respected? | No | Yes |
| Output format | Encrypted compaction block | Readable summary |
| User control | Minimal | Full |
When using OpenAI-hosted models (the default), server-side compaction handles summarisation, and your `compact_prompt` configuration is ignored [8]. The `/compact` slash command always uses the local path, which is why manual compaction with a custom prompt is more controllable than the automatic trigger.
## What Is Coming: The Telemetry Roadmap
GitHub Issue #22220 proposes exposing lightweight compaction telemetry via the UI and `/status` [4]:
- Compaction count — “Compactions this session: 3”
- Time since last compaction — useful for detecting acceleration
- Context usage percentage — already available, but proposed for richer display
- Estimated context integrity level — a quality score based on compaction depth
Eric Traut from OpenAI responded that the team is “working on an update to our compaction mechanism” and hopes improved compaction reliability may eventually eliminate the need for dedicated telemetry [4]. The PreCompact and PostCompact hooks requested in Issue #16098 remain unimplemented [14].
The minimal viable improvement, as proposed by the community, is simply a compaction counter visible in `/status`:
> “Compactions this session: 3” displayed in `/status`, session metadata, or near the context percentage indicator would substantially help.
## Practical Recommendations
- Set `model_auto_compact_token_limit` to 80% of your model’s context window — this gives compaction more room to produce quality summaries
- Use `/compact` manually at natural task boundaries — after completing a sub-task, before context hits 60%
- Write a structured `compact_prompt` — preserve file paths, function signatures, and decision rationale explicitly
- Install `context-mode` if your workflow involves large tool outputs — Playwright snapshots, `grep` across monorepos, or full file reads
- Watch for compaction loop symptoms — rapid context oscillation, repeated file reads, and reasoning token spikes
- For critical sessions, use the checkpoint pattern — `/compact` with explicit state, then a git commit
- Keep reasoning effort at `high` rather than `xhigh` for sessions expected to exceed 30 minutes
Until Codex CLI ships native compaction telemetry, context health monitoring remains a manual discipline. The tools exist to build approximate observability — hooks, token reporting, and manual checkpoints — but they require deliberate setup. The gap between “context compaction exists” and “I can see what compaction is doing to my session” remains the largest observability blind spot in agentic coding workflows today.