Agentic Loop Gotchas: Six Stop-Condition and Error-Propagation Mistakes That Cause Silent Failures in Codex CLI
An agentic loop is simple in theory — call the model, execute tools, feed results back, repeat until done. In practice, mishandling stop conditions and error propagation creates agents that silently stop too early, loop forever, or proceed on corrupted state. These failures are insidious because the agent appears to complete successfully. You only discover the problem when reviewing output hours later, or when a downstream consumer hits corrupted data.
The Claude Certified Architect (CCA-F) exam tests these patterns extensively because they represent the most common source of production agentic failures. The same gotchas apply to Codex CLI, whether you are running interactive sessions, codex exec pipelines, or goal-mode orchestrations.
Gotcha 1: Parsing Natural Language to Detect Completion
The mistake: Checking whether the agent’s text response contains phrases like “I’m done”, “task complete”, or “all finished” to decide whether to stop the loop.
Why it breaks: The model’s text output is not a structured signal. It can say “I’m done with the first file” while intending to continue with the second. It can say “that completes the task” as part of explaining what it did, then issue another tool call in the same response. Text is narrative, not protocol.
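A toy comparison makes this concrete. This sketch is illustrative only. FakeResponse is a hypothetical stand-in that carries just a text field and a stop_reason field, named after the API field discussed next. The phrase list is an assumed example of the anti-pattern.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    # Hypothetical stand-in for an API response: only the two fields compared here
    text: str
    stop_reason: str


DONE_PHRASES = ("i'm done", "task complete", "all finished")


def should_stop_by_text(response: FakeResponse) -> bool:
    # Anti-pattern: infer completion from narrative text
    lowered = response.text.lower()
    return any(phrase in lowered for phrase in DONE_PHRASES)


def should_stop_by_stop_reason(response: FakeResponse) -> bool:
    # Structured signal: only "tool_use" means keep looping
    return response.stop_reason != "tool_use"


# Says "I'm done" about one file while requesting another tool call
mid_task = FakeResponse("I'm done with the first file. Opening the second one now.", "tool_use")
# Finishes the task without using any of the magic phrases
finished = FakeResponse("Both files now use the new logger.", "end_turn")

for name, r in (("mid_task", mid_task), ("finished", finished)):
    print(name, "text:", should_stop_by_text(r), "stop_reason:", should_stop_by_stop_reason(r))
# mid_task text: True stop_reason: False   (text check stops too early)
# finished text: False stop_reason: True   (text check would never stop)
```

The text check fails in both directions. It stops mid-task on the first response and loops forever on the second. The stop_reason check gets both right without interpreting any prose.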
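What to check instead: the structured stop_reason field in the API response is the only reliable signal. The values below are those of the Anthropic Messages API, whose vocabulary this article uses throughout: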
| stop_reason | Meaning | Correct action |
|---|---|---|
| tool_use | Model wants to call tools | Continue the loop: execute tools, return results |
| end_turn | Model believes the task is complete | Stop the loop |
| max_tokens | Output was truncated | Extend output or summarise and continue |
| refusal | Model refused the request | Log, report to user, stop |
| pause_turn | A long-running turn (e.g. one using server-side tools) was paused | Send the response back as-is so the model can continue |
The fix for custom harnesses:
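```python
# client is an anthropic.Anthropic() instance; MODEL, TOOLS and
# execute_tool_calls() come from your harness
while True:
    response = client.messages.create(
        model=MODEL, max_tokens=4096, tools=TOOLS, messages=messages
    )
    # THE ONLY CORRECT TERMINATION CHECK: the structured stop_reason, never the text
    if response.stop_reason != "tool_use":
        break
    # Execute the requested tool calls; returns a list of tool_result blocks
    tool_results = execute_tool_calls(response)
    # Feed the assistant turn and the tool results back
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
```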
For Codex CLI users: This is handled automatically in interactive mode and goal mode. But if you are building automation around codex exec with --json output and parsing the response programmatically, check the structured output — never grep the text for completion phrases.
Gotcha 2: Using Iteration Caps as the Primary Stop Mechanism
The mistake: Setting max_iterations = 10 (or similar) as the main way to prevent runaway loops.
Why it breaks in both directions:
- Too low: A legitimate multi-file refactoring might need 30+ tool calls (read files, plan, write, test, fix, repeat). Cutting off at 10 means the agent stops mid-task, leaving a half-refactored codebase.
- Too high: An agent stuck in a retry loop (failing test → same fix → failing test) wastes hundreds of API calls before hitting the cap.
The underlying problem: an iteration cap cannot distinguish between productive progress and unproductive looping. It is a blunt instrument applied without understanding.
The correct pattern: Use stop_reason as the primary control. Use iteration caps only as a safety net for catastrophic scenarios (infinite loops due to API bugs, malformed tool responses):
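```toml
# config.toml: safety net, not primary control
# (illustrative keys; check your harness's or CLI version's config reference for the real names)
[agent]
max_turns = 200        # Emergency brake: should never be hit in normal operation
token_budget = 500000  # Cost ceiling: more meaningful than turn count
```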
For goal mode: Codex CLI’s goal mode already handles this correctly — it uses the model’s own completion signal and budget constraints rather than arbitrary turn limits. If you find yourself lowering max_turns to “keep costs down”, you have a prompt/scope problem, not a configuration problem.
Gotcha 3: Suppressing Errors with Status OK
The mistake: Tool implementations that return {"status": "ok", "data": null} or {"status": "success", "result": ""} when they actually failed — because the developer did not want the agent to “get confused” by error messages.
Why it breaks: The model cannot distinguish between “the operation succeeded but returned nothing” and “the operation failed.” It proceeds with its plan based on corrupted state. Three turns later, it writes code that depends on a file that was never created, references data that was never fetched, or commits changes to a branch that does not exist.
The is_error pattern:
When returning tool results in the API, the is_error field explicitly tells the model “this was a failure, adjust your plan”:
```json
{
  "type": "tool_result",
  "tool_use_id": "call_abc123",
  "is_error": true,
  "content": "Failed to read /src/main/java/App.java: file not found. The file may have been renamed or moved during the previous refactoring step."
}
```
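To show the pattern end to end, here is a minimal Python sketch of a harness-side wrapper that turns exceptions into tool_result blocks. run_tool and its hint parameter are hypothetical names. The block shape (type, tool_use_id, is_error, content) follows the JSON above. Note the asymmetry. A failure always carries is_error: True. A success with no output says so explicitly instead of returning an empty string.

```python
from typing import Any, Callable


def run_tool(tool_use_id: str, fn: Callable[..., str], args: dict[str, Any],
             hint: str = "") -> dict[str, Any]:
    """Run a tool and wrap the outcome as a tool_result content block (hypothetical helper)."""
    try:
        output = fn(**args)
    except Exception as err:  # report every failure to the model; never swallow it
        message = f"{fn.__name__} failed: {type(err).__name__}: {err}"
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "is_error": True,
            "content": f"{message}. {hint}".strip(),
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        # Make "succeeded but produced nothing" explicit instead of an empty string
        "content": output or f"{fn.__name__} succeeded with no output.",
    }


def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


result = run_tool(
    "call_abc123",
    read_file,
    {"path": "/src/main/java/App.java"},
    hint="The file may have been renamed during the previous refactoring step.",
)
print(result["is_error"])  # True, assuming that path does not exist on this machine
```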
Why this matters for Codex CLI: When writing MCP servers or custom tools for Codex CLI, your tool implementation should:
- Return an error result (isError: true in an MCP tool result, the counterpart of is_error above) with a descriptive message on failure; never swallow errors
- Include recovery suggestions in the error message (“file not found: check whether the previous rename completed”)
- Never return empty success when the operation had no effect
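```typescript
// MCP server tool handler: correct error propagation
import { readFile } from "node:fs/promises";

async function readFileHandler(params: { path: string }) {
  try {
    const content = await readFile(params.path, "utf-8");
    return { content: [{ type: "text" as const, text: content }] };
  } catch (err) {
    // err is `unknown` in strict TypeScript, so narrow it before reading .message
    const reason = err instanceof Error ? err.message : String(err);
    return {
      isError: true,
      content: [{
        type: "text" as const,
        text: `Cannot read ${params.path}: ${reason}. Check the path exists and the sandbox allows read access to this location.`,
      }],
    };
  }
}
```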
Gotcha 4: Not Handling max_tokens Distinctly from end_turn
The mistake: Treating any non-tool_use stop reason as “the agent is done.”
Why it breaks: max_tokens means the model’s output was truncated mid-sentence — it did not finish its thought or its tool call. If you stop the loop here, you get a half-written file, an incomplete plan, or a malformed JSON tool call that never executes.
model_context_window_exceeded is even more dangerous — it means the conversation has grown beyond what the model can process. Simply retrying will hit the same wall.
The correct multi-branch handling:
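```python
# Full loop with one branch per stop reason. client, MODEL, MAX_TOKENS, TOOLS and the
# helpers (execute_tool_calls, log_refusal, compact_messages, log_unknown_stop_reason)
# come from your harness.
while True:
    response = client.messages.create(
        model=MODEL, max_tokens=MAX_TOKENS, tools=TOOLS, messages=messages
    )
    match response.stop_reason:
        case "tool_use":
            # Continue: execute tools, feed results back, loop
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": execute_tool_calls(response)})
        case "end_turn":
            # Agent believes the task is complete: stop
            break
        case "max_tokens":
            # Output truncated: keep the partial turn and ask the model to continue.
            # If the partial turn ends in a cut-off tool_use block, retry with a
            # higher max_tokens instead; a half-written tool call cannot run.
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": "Continue from where you left off."})
        case "refusal":
            # Model refused: retrying the same request will not help
            log_refusal(response)
            break
        case "pause_turn":
            # Long-running turn paused: send it back as-is so the model can resume
            messages.append({"role": "assistant", "content": response.content})
        case "model_context_window_exceeded":
            # Critical: shrink the context first, or the retry hits the same wall
            messages = compact_messages(messages)
        case _:
            # Unknown stop reason: log and stop safely
            log_unknown_stop_reason(response)
            break
```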
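compact_messages is left undefined above. Below is a minimal sketch of the simplest strategy, dropping the oldest turns. It assumes Messages-API-style message dicts. The keep_last parameter and the notice text are illustrative choices. Many harnesses instead summarise the dropped turns with a model call, which this sketch does not attempt. Truncation alone does not guarantee the result fits, so a real harness would re-check token counts afterwards.

```python
from typing import Any

Message = dict[str, Any]


def compact_messages(messages: list[Message], keep_last: int = 6) -> list[Message]:
    """Keep the original task plus the most recent turns; drop the ones in between.

    The kept tail always starts at an assistant message, so every tool_result
    still follows the assistant turn that holds its tool_use, and roles keep
    alternating user/assistant.
    """
    if len(messages) <= keep_last + 1:
        return messages
    start = len(messages) - keep_last
    while start > 1 and messages[start]["role"] != "assistant":
        start -= 1
    if start <= 1:
        return messages  # nothing can be dropped safely
    note = f"[{start - 1} earlier messages were removed to fit the context window.]"
    task = messages[0]
    if isinstance(task["content"], str):
        content: Any = f"{task['content']}\n\n{note}"
    else:  # list of content blocks
        content = [*task["content"], {"type": "text", "text": note}]
    return [{"role": "user", "content": content}, *messages[start:]]
```

Starting the tail at an assistant turn is the design choice that matters. Cutting between a tool_use and its tool_result would produce a request the API rejects, which is the same wall hit from a different direction.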
For Codex CLI: In interactive sessions, the CLI handles these branches internally. In codex exec pipelines and goal mode, max_tokens truncation is the most common silent failure — the agent’s final output is cut off, and downstream scripts receive incomplete JSON or half-written code.
The defence: Set generous max_output_tokens for generation-heavy tasks, and always validate structured output before using it downstream.
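Here is a minimal validation sketch. It assumes the pipeline asks the agent for a JSON object with a known shape. validate_agent_output, OutputValidationError and the example schema are hypothetical names. The point is that a truncated or malformed result fails loudly at the boundary instead of flowing downstream.

```python
import json
from typing import Any


class OutputValidationError(ValueError):
    """Raised when an agent's final output is not safe to hand downstream."""


def validate_agent_output(raw: str, required: dict[str, type]) -> dict[str, Any]:
    # A JSON object cut off by the output-token limit has no closing brace,
    # so truncation usually surfaces here as a decode error.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise OutputValidationError(f"output is not valid JSON (possibly truncated): {err}") from err
    if not isinstance(data, dict):
        raise OutputValidationError(f"expected a JSON object, got {type(data).__name__}")
    for key, expected in required.items():
        if key not in data:
            raise OutputValidationError(f"missing required field {key!r}")
        if not isinstance(data[key], expected):
            raise OutputValidationError(
                f"field {key!r} should be {expected.__name__}, got {type(data[key]).__name__}"
            )
    return data


# Hypothetical contract the prompt asked the agent to follow
SCHEMA = {"status": str, "files_changed": list}
print(validate_agent_output('{"status": "complete", "files_changed": ["a.py"]}', SCHEMA))
```

A downstream script that catches OutputValidationError can retry the task, or flag it for review. Without the check, it would silently consume half a result.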
Gotcha 5: Immediate Termination on First Tool Error
The mistake: Building retry logic that gives up immediately when a tool call fails, or having no retry logic at all.
Why it breaks: Many tool failures are transient:
- File locks held by another process (retry in 1 second)
- Network timeouts to an MCP server (retry with backoff)
- Rate limits on external APIs (retry after delay)
- Compilation errors from partial writes (the agent hasn’t finished writing all files)
Terminating immediately means the agent never recovers from situations that would resolve in 2-3 seconds.
The correct pattern — graduated retry:
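```python
# Harness helpers assumed: execute_tool, tool_error_result, first_tool_call,
# final_failure_result. Assumes one tool call per assistant turn for brevity;
# on success, the main loop appends the assistant turn and tool_result as usual.
MAX_RETRIES = 3

for attempt in range(1, MAX_RETRIES + 1):
    result = execute_tool(tool_call)
    if not result.is_error:
        break
    # Record the failed turn so the model sees its own call next to the error
    messages.append({"role": "assistant", "content": response.content})
    if attempt < MAX_RETRIES:
        # Feed the error back (is_error: true): the model may adjust its approach
        messages.append({"role": "user", "content": [tool_error_result(tool_call, result)]})
        response = client.messages.create(model=MODEL, max_tokens=4096, tools=TOOLS, messages=messages)
        # Re-read the call: the model may retry as-is or try a different approach
        tool_call = first_tool_call(response)
        if tool_call is None:
            break  # the model chose another path; hand control back to the main loop
    else:
        # Final failure: propagate to the model with full context
        messages.append({"role": "user", "content": [final_failure_result(tool_call, result, attempts=MAX_RETRIES)]})
```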
For Codex CLI: The CLI’s built-in retry logic handles transient failures for its own tools (file reads, bash commands). But MCP server tools and custom integrations need their own retry handling. If your MCP server returns errors for transient conditions, the agent will often attempt a different approach rather than retrying — which may be worse. Consider implementing retry logic inside your MCP server rather than relying on the agent to retry.
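This is a minimal sketch of retry inside the tool implementation. It assumes your code can classify failures. TransientToolError and retry_transient are hypothetical names, not part of any MCP SDK. Only errors marked transient are retried, with exponential backoff and a little jitter. Everything else propagates immediately, as does the last transient failure, so the server can report it with isError set.

```python
import functools
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientToolError(Exception):
    """Raised by tool code for conditions worth retrying (locks, timeouts, rate limits)."""


def retry_transient(max_attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep):
    """Retry only TransientToolError, backing off 1x, 2x, 4x... base_delay plus up to 10% jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            attempt = 1
            while True:
                try:
                    return fn(*args, **kwargs)
                except TransientToolError:
                    if attempt >= max_attempts:
                        raise  # out of attempts: let the caller report it with isError
                    delay = base_delay * 2 ** (attempt - 1)
                    sleep(delay * (1 + 0.1 * random.random()))
                    attempt += 1
        return wrapper
    return decorator


calls = {"n": 0}


@retry_transient(max_attempts=3, sleep=lambda s: None)  # no real sleeping in the demo
def flaky_lookup(key: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientToolError("upstream timeout")
    return f"value for {key}"


print(flaky_lookup("config"), "after", calls["n"], "attempts")  # value for config after 3 attempts
```

The injectable sleep parameter is there so tests do not actually wait. In a server you would leave the default.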
Gotcha 6: Subagent Errors That Vanish at the Coordinator Level
The mistake: Spawning subagents (via --decompose parallel or goal mode’s internal parallelism) and aggregating their results without checking individual completion status.
Why it breaks: A coordinator spawns three subagents: one to refactor module A, one for module B, one for module C. Module B’s subagent hits a compilation error and produces partial output. The coordinator receives all three results, does not notice that B’s result is incomplete, and synthesises a “completed” summary that omits the B module changes.
The result: two-thirds of the refactoring is done, the coordinator reports success, and module B is left in an inconsistent state.
The correct pattern:
```markdown
<!-- In AGENTS.md: instruction for coordinator behaviour -->
## Subagent result handling
When aggregating results from parallel subagents:
1. Check each subagent's exit status explicitly
2. If ANY subagent reports failure, do NOT mark the overall task as complete
3. Report partial completion with specific details of which subtasks failed
4. Attempt recovery only if the failure is isolated and does not affect other subtasks
```
For custom harnesses:
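```python
import asyncio

# run_subagent() and CoordinatorResult come from your harness
async def coordinate(subtasks):
    results = await asyncio.gather(*(run_subagent(task) for task in subtasks))
    # Anything not explicitly "success" (failed, partial, timed out...) is a failure
    failures = [r for r in results if r.status != "success"]
    if failures:
        # Do NOT suppress: report structured failure
        return CoordinatorResult(
            status="partial_failure",
            completed=[r.task_id for r in results if r.status == "success"],
            failed=[{"task_id": r.task_id, "error": r.error} for r in failures],
            summary=f"{len(results) - len(failures)}/{len(results)} subtasks completed",
        )
    return CoordinatorResult(status="success", completed=[r.task_id for r in results],
                             failed=[], summary=f"{len(results)}/{len(results)} subtasks completed")
```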
The Meta-Pattern: Structured Signals Over Natural Language
Every gotcha in this article reduces to the same principle: use structured, machine-readable signals for control flow; reserve natural language for communication with humans.
| Decision | Wrong signal | Right signal |
|---|---|---|
| Should the loop continue? | Model says “I’m done” | stop_reason == "tool_use" |
| Did the tool succeed? | Empty response body | is_error: true/false |
| Is the output complete? | Text ends with a period | stop_reason != "max_tokens" |
| Did the subagent finish? | Summary sounds complete | Exit status + structured result |
| Should we retry? | Error message “looks transient” | Error category + retry policy |
The agentic loop is a state machine. State machines run on structured transitions, not vibes. Every time you make a control-flow decision based on natural language interpretation, you introduce a probabilistic failure mode into what should be a deterministic system.
Codex CLI’s internal loop already follows these principles. The gotchas emerge when you build automation around it (scripts parsing --json output), when you write MCP servers (returning results without is_error), or when you design multi-agent orchestrations (coordinators that trust subagent text summaries over structured status).
References
- OpenAI, “Agentic Loop — Codex CLI,” developers.openai.com, 2026. Stop reason semantics, tool result format, retry behaviour.
- OpenAI, “Goal Mode — Codex CLI,” developers.openai.com, 2026. Budget-based termination, maker-verifier separation.
- CCA-F exam anti-patterns (Domain 1). “Parsing natural-language signals to terminate the loop” and “Arbitrary iteration caps as the primary stop mechanism” are confirmed exam distractors.
- Anthropic, “Building Effective Agents,” anthropic.com/research, 2025. The agentic loop pattern, error handling with is_error, structured tool results.