Cross-Model Adversarial Review: Using Multiple AI Models to Catch Agent Blind Spots
Published: 2026-03-28
The moment your coding agent reviews its own output, you have a problem. Not because the agent is dishonest — but because it is architecturally incapable of neutrality. The same model that rationalised a design shortcut during implementation will rationalise it again during review. This is not a limitation that better prompting can fix. It requires a structural solution: a different model doing the reviewing.
Why Same-Model Self-Review Fails
LLMs struggle with self-validation. They tend to “hallucinate correctness” — reproducing the same reasoning errors they made during generation. A model reviewing its own code is not checking the code against the spec; it is checking whether the code looks like it was generated correctly.
The result is a class of bugs that consistently survive same-model review:
- Silent violations — code that passes tests but violates architectural constraints (e.g., filtering in memory instead of at the database layer)
- Spec drift — requirements that were satisfied in the first iteration but quietly regressed
- Security blind spots — patterns the generating model treats as standard practice (global state, shared mutable structures, weak input sanitisation)
- Edge-case omissions — paths the model never considered during generation, and therefore never checks for during review
One practitioner described the realisation clearly: “you need to have your agents adversarially reviewed… not having Claude review Claude’s output — have Codex and Gemini review Claude’s output. So every time my Claude Code sessions do anything, they commit and the commit is immediately reviewed by either Codex or Gemini or both.”
The Builder-Critic Architecture
The solution is structural. Two roles, different models, clean context separation.
```
┌─────────────────────────────────────────────────────┐
│               ADVERSARIAL REVIEW LOOP               │
│                                                     │
│  ┌──────────┐   implements    ┌────────────────┐    │
│  │  SPEC /  │ ──────────────► │    BUILDER     │    │
│  │  TESTS   │                 │  (Codex CLI)   │    │
│  └──────────┘                 └───────┬────────┘    │
│                                       │             │
│                                  code diff          │
│                                       │             │
│                               ┌───────▼────────┐    │
│                               │     CRITIC     │    │
│                               │ (Claude Code)  │    │
│                               └───────┬────────┘    │
│                                       │             │
│                              PASS / violations      │
│                                       │             │
│                       ┌───────────────▼────────┐    │
│                       │  MODERATOR (optional)  │    │
│                       │  deduplicates findings │    │
│                       └────────────────────────┘    │
└─────────────────────────────────────────────────────┘
```
**Builder role** — optimised for implementation speed. Receives the spec, generates code, writes tests. Typically Codex CLI (`--model gpt-5.4` for routine tasks, `codex-spark` for fast iteration).

**Critic role** — optimised for reasoning depth. Receives the spec and the code diff in a fresh context (no memory of the build). Produces PASS or an ordered list of violations with line references. Typically Claude Code or a high-reasoning Codex session.

**Critical step — context swap:** start a new session for the Critic. Do not share conversation history from the build phase. This forces the Critic to evaluate only the artifacts, not the reasoning that produced them.
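The context swap can be made mechanical by treating the Critic as an injected callable that always opens a fresh session and sees only the artifacts. A minimal Python sketch — the `critic` wrapper, prompt wording, and verdict convention are illustrative, not any tool's actual API:

```python
from typing import Callable

def adversarial_review(spec: str, diff: str,
                       critic: Callable[[str], str]) -> dict:
    """Run one critic pass over artifacts only.

    `critic` is any callable that sends a prompt to a model in a FRESH
    session and returns its reply (e.g. a thin wrapper around `claude -p`
    or a Gemini CLI call); it is injected so the builder model and its
    conversation history are never reused.
    """
    prompt = (
        "You are a Critic. Evaluate only the artifacts below; you have no "
        "memory of how they were produced.\n"
        f"SPEC:\n{spec}\n\nDIFF:\n{diff}\n\n"
        "List spec violations with file:line evidence, then end with a "
        "final line: VERDICT: PASS or VERDICT: FAIL."
    )
    reply = critic(prompt)
    lines = reply.strip().splitlines()
    passed = bool(lines) and lines[-1].endswith("PASS")
    return {"verdict": "PASS" if passed else "FAIL", "report": reply}
```

Because the model call is injected, the same loop works for any critic backend, and it can be unit-tested with a fake critic.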
Model Routing in Practice
| Role | Good model choice | Why |
|---|---|---|
| Builder (routine) | `codex-spark` / `gpt-5.4-mini` | High throughput, low cost |
| Builder (complex) | `gpt-5.4` with `reasoning: high` | Better architectural decisions |
| Critic | Claude Code (`claude-opus-4.x`) | Strong constraint validation, different training distribution |
| Critic (fast) | Gemini CLI | Different architecture catches different patterns |
| Moderator | Any fast model | Deduplication only, low reasoning load |
The key principle: use models from different training distributions. A Claude-reviewed Codex PR is more reliable than a Codex-reviewed Codex PR — not because Claude is “better”, but because it was trained on different data with different biases.
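This rule is simple enough to enforce programmatically before a review job is dispatched. A sketch — the family mapping is illustrative and would need to cover whatever models your team actually runs:

```python
# Illustrative mapping from model name to training-distribution family.
MODEL_FAMILY = {
    "codex-spark": "openai",
    "gpt-5.4": "openai",
    "gpt-5.4-mini": "openai",
    "claude-opus-4.x": "anthropic",
    "gemini": "google",
}

def valid_pairing(builder: str, critic: str) -> bool:
    """Enforce the cross-distribution rule: builder and critic must come
    from different model families. Unknown models fail closed."""
    b, c = MODEL_FAMILY.get(builder), MODEL_FAMILY.get(critic)
    return b is not None and c is not None and b != c
```

Failing closed on unknown model names means a typo in a config cannot silently produce a same-family pairing.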
Implementing the Pattern in AGENTS.md
```markdown
## Review Policy

All PRs require cross-model adversarial review before human review:

1. Builder subagent implements (Codex CLI, model: codex-spark)
2. All tests must pass locally before the review step
3. Critic runs in a FRESH session (no shared history):
   - Feed: spec + test file + code diff only
   - Model: claude-opus or gemini (NOT the same model that built)
   - Task: list spec violations with file:line evidence. PASS or FAIL verdict.
4. On FAIL: Builder fixes, re-validates, spawns a NEW critic (not the same session)
5. After 3 failures: escalate to a human

Never use the same model as both builder and critic.
Never reuse a critic session — anchoring bias is real, even for AI.
```
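The retry policy above can be sketched as a small loop. The function names (`build_fix`, `new_critic`) are illustrative; the load-bearing details are that every retry gets a brand-new critic session and that the loop escalates rather than running forever:

```python
def review_with_retries(spec, diff, build_fix, new_critic, max_failures=3):
    """Retry loop: each FAIL triggers a Builder fix and a fresh Critic.

    `build_fix(diff, report)` returns a revised diff from the Builder.
    `new_critic()` returns a reviewer callable backed by a NEW session,
    so no critic ever sees its own earlier findings (anchoring bias).
    After `max_failures` FAIL verdicts, hand the change to a human.
    """
    for attempt in range(max_failures):
        critic = new_critic()                # never reuse a critic session
        report = critic(spec, diff)
        if report.startswith("PASS"):
            return {"status": "PASS", "attempts": attempt + 1}
        diff = build_fix(diff, report)       # builder addresses the violations
    return {"status": "ESCALATE_TO_HUMAN", "attempts": max_failures}
```

Note that the escalation bound lives in the orchestrator, not in any prompt — agents cannot be trusted to stop themselves.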
The Debate Loop Pattern (alecnielsen/adversarial-review)
For high-stakes changes, a multi-round debate produces better results than a single-pass review:
Round 1: Claude and Codex independently review the same diff
Round 2: Each agent critiques the other's findings
Round 3: Each agent responds to critiques
Round 4: One agent synthesises and implements validated fixes
The circuit breaker matters: stop if no progress after 3 iterations, or if disagreement persists after 5. Infinite debate loops are a real failure mode.
Approximate cost: ~21 API calls worst-case for three iterations. High, but justified for changes touching authentication, data migrations, or public APIs.
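One way to arrive at the ~21 figure, assuming each iteration runs the four rounds above at a cost of 2 + 2 + 2 + 1 model calls (this breakdown is an assumption for illustration, not a measured number):

```python
# Assumed per-iteration call breakdown for the four-round debate:
#   round 1: 2 independent reviews
#   round 2: 2 cross-critiques
#   round 3: 2 responses to critiques
#   round 4: 1 synthesis/implementation pass
CALLS_PER_ITERATION = 2 + 2 + 2 + 1   # 7 calls per iteration
MAX_ITERATIONS = 3                    # the circuit breaker's progress bound
WORST_CASE_CALLS = CALLS_PER_ITERATION * MAX_ITERATIONS
print(WORST_CASE_CALLS)  # 21
```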
Metaswarm’s Industrial-Scale Approach
Metaswarm implements adversarial review at framework scale. Every work unit runs through a four-phase loop:
IMPLEMENT → VALIDATE → ADVERSARIAL REVIEW → COMMIT
Key rules:
- Writer is always reviewed by a different model (Codex implements, Claude reviews — or vice versa)
- Fresh reviewer on every retry — the reviewer checks the contract, not “did they fix what I found?”
- Orchestrator independently verifies results (never trusts subagent self-reports)
- Parallel Design Review Gate: 5 specialist agents (PM, Architect, Designer, Security, CTO) reviewing in parallel before implementation begins
The parallel design review gate is worth noting: five specialist personas reviewing a spec simultaneously, each bringing a different adversarial lens. The overlap in findings increases confidence; the disagreements surface spec ambiguities before a line of code is written.
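A minimal sketch of such a gate, using a thread pool to fan the spec out to all personas at once. The `review(persona, spec)` signature and the overlap heuristic are illustrative, not Metaswarm's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

PERSONAS = ["PM", "Architect", "Designer", "Security", "CTO"]

def design_review_gate(spec: str,
                       review: Callable[[str, str], List[str]]) -> dict:
    """Run all specialist reviews of a spec in parallel.

    `review(persona, spec)` returns that persona's list of findings
    (hypothetical signature). Findings raised by more than one persona
    are treated as higher-confidence; single-persona findings and
    disagreements surface spec ambiguities to resolve before coding.
    """
    with ThreadPoolExecutor(max_workers=len(PERSONAS)) as pool:
        # pool.map preserves input order, so zip pairs each persona
        # with its own findings.
        findings = dict(zip(PERSONAS, pool.map(lambda p: review(p, spec),
                                               PERSONAS)))
    flat = [f for fs in findings.values() for f in fs]
    high_confidence = {f for f in flat if flat.count(f) > 1}
    return {"by_persona": findings, "high_confidence": high_confidence}
```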
CI/CD Integration
Adversarial review works as a CI gate:
```yaml
# .github/workflows/adversarial-review.yml
name: Adversarial Code Review
on: [pull_request]
jobs:
  critic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run critic agent
        run: |
          # Pass spec + diff to a different model than the one that generated
          # the PR; claude-proxy routes the call to Claude for cross-model review
          codex exec "You are a Critic agent. Review this diff against SPEC.md.
          List all spec violations with file:line evidence. Output PASS or FAIL." \
            --model claude-proxy \
            --approval-mode full-auto \
            --output-format json > review.json
          # Fail the PR if the verdict is not PASS
          jq -e '.verdict == "PASS"' review.json
```
Note: the `claude-proxy` routing requires an MCP bridge or a custom shell wrapper, since `codex exec` natively targets OpenAI models.
When Same-Model Review is Acceptable
Same-model review is not always wrong. It is acceptable for:
- Small, factual diffs — renaming a variable, updating a config value, fixing a typo
- Style/format checks — lint-level concerns with no architectural implications
- Documentation review — lower stakes, less likely to have hidden logic errors
It is unacceptable for:
- Any change touching authentication, authorisation, or data handling
- Database schema changes or migrations
- Public API changes
- Code that’s been failing tests and has been “fixed” by an agent
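These rules lend themselves to a pre-review gate that decides when a cross-model critic is mandatory. A hypothetical sketch — the path patterns are illustrative and would be project-specific (in particular, "data handling" code usually cannot be identified by path alone):

```python
# Illustrative high-stakes path patterns; tune per project.
HIGH_STAKES_PATTERNS = ("auth", "migration", "schema", "public_api")

def requires_cross_model_review(changed_paths, had_failing_tests=False):
    """Return True when a change must get a different-model critic."""
    if had_failing_tests:
        # Code an agent "fixed" after failing tests always gets a second model.
        return True
    return any(pat in path.lower()
               for path in changed_paths
               for pat in HIGH_STAKES_PATTERNS)
```

Anything the gate passes over can fall back to cheap same-model review or plain lint checks.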
Cost Tradeoffs
Cross-model adversarial review is not cheap. The overhead is real:
- Two model calls instead of one for every reviewed change
- Moderate cost for debate loops (3-5 rounds × 2 models)
- Latency penalty: critics add 30-90 seconds per review depending on diff size
The calculation: one production security incident costs more than months of adversarial review overhead. For changes where mistakes are expensive, the cost is justified. For routine maintenance, skip it.
Tools
| Tool | URL | Approach |
|---|---|---|
| alecnielsen/adversarial-review | https://github.com/alecnielsen/adversarial-review | Claude + GPT Codex debate loop, bash script, multi-round |
| dsifry/metaswarm | https://github.com/dsifry/metaswarm | Full framework with adversarial review built into SDLC |
| codexstar69/bug-hunter | https://github.com/codexstar69/bug-hunter | Hunter + Skeptic + Referee tri-agent security review |
| asdlc.io adversarial-code-review | https://asdlc.io/patterns/adversarial-code-review/ | Pattern documentation with Critic prompt examples |
Key Takeaways
- Same-model self-review perpetuates the same blind spots — structural fix required
- Always start the Critic in a fresh context — conversation history from the build phase contaminates the review
- Use models from different training distributions for genuine adversarial benefit
- The writer-reviewer rule: the model that wrote the code must not review it
- On retry: always spawn a new reviewer session — anchoring bias affects AI as well as humans
- For CI/CD: adversarial review as a gate, not an afterthought
Sources: asdlc.io adversarial code review, metaswarm, alecnielsen/adversarial-review, halallens.no multi-model guide