Cross-Model Adversarial Review: Using Multiple AI Models to Catch Agent Blind Spots
Published: 2026-03-28
The moment your coding agent reviews its own output, you have a problem. Not because the agent is dishonest — but because it is architecturally incapable of neutrality. The same model that rationalised a design shortcut during implementation will rationalise it again during review. This is not a limitation that better prompting can fix. It requires a structural solution: a different model doing the reviewing.
Why Same-Model Self-Review Fails
LLMs struggle with self-validation. They tend to “hallucinate correctness” — reproducing the same reasoning errors they made during generation. A model reviewing its own code is not checking the code against the spec; it is checking whether the code looks like it was generated correctly.
The result is a class of bugs that consistently survive same-model review:
- Silent violations — code that passes tests but violates architectural constraints (e.g., filtering in memory instead of at the database layer)
- Spec drift — requirements that were satisfied in the first iteration but quietly regressed
- Security blind spots — patterns the generating model treats as standard practice (global state, shared mutable structures, weak input sanitisation)
- Edge-case omissions — paths the model never considered during generation, and therefore never checks for during review
One practitioner described the realisation clearly: “you need to have your agents adversarially reviewed… not having Claude review Claude’s output — have Codex and Gemini review Claude’s output. So every time my Claude Code sessions do anything, they commit and the commit is immediately reviewed by either Codex or Gemini or both.”
The Builder-Critic Architecture
The solution is structural. Two roles, different models, clean context separation.
```
┌─────────────────────────────────────────────────────┐
│               ADVERSARIAL REVIEW LOOP               │
│                                                     │
│  ┌──────────┐   implements    ┌────────────────┐    │
│  │  SPEC /  │ ──────────────► │    BUILDER     │    │
│  │  TESTS   │                 │  (Codex CLI)   │    │
│  └──────────┘                 └───────┬────────┘    │
│                                       │             │
│                                  code diff          │
│                                       │             │
│                               ┌───────▼────────┐    │
│                               │     CRITIC     │    │
│                               │ (Claude Code)  │    │
│                               └───────┬────────┘    │
│                                       │             │
│                              PASS / violations      │
│                                       │             │
│                       ┌───────────────▼────────┐    │
│                       │  MODERATOR (optional)  │    │
│                       │  deduplicates findings │    │
│                       └────────────────────────┘    │
└─────────────────────────────────────────────────────┘
```
**Builder role** — optimised for implementation speed. Receives the spec, generates code, writes tests. Typically Codex CLI (`--model gpt-5.4` for routine tasks, `codex-spark` for fast iteration).

**Critic role** — optimised for reasoning depth. Receives the spec and the code diff in a fresh context (no memory of the build). Produces PASS or an ordered list of violations with line references. Typically Claude Code or a high-reasoning Codex session.

**Critical step — context swap:** start a new session for the Critic. Do not share conversation history from the build phase. This forces the Critic to evaluate only the artifacts, not the reasoning that produced them.
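The context swap can be made mechanical by treating the Critic as an injected callable that always opens a fresh session and sees only the artifacts. A minimal Python sketch — the `critic` wrapper, prompt wording, and verdict convention are illustrative, not any tool's actual API:

```python
from typing import Callable

def adversarial_review(spec: str, diff: str,
                       critic: Callable[[str], str]) -> dict:
    """Run one critic pass over artifacts only.

    `critic` is any callable that sends a prompt to a model in a FRESH
    session and returns its reply (e.g. a thin wrapper around `claude -p`
    or a Gemini CLI call); it is injected so the builder model and its
    conversation history are never reused.
    """
    prompt = (
        "You are a Critic. Evaluate only the artifacts below; you have no "
        "memory of how they were produced.\n"
        f"SPEC:\n{spec}\n\nDIFF:\n{diff}\n\n"
        "List spec violations with file:line evidence, then end with a "
        "final line: VERDICT: PASS or VERDICT: FAIL."
    )
    reply = critic(prompt)
    lines = reply.strip().splitlines()
    passed = bool(lines) and lines[-1].endswith("PASS")
    return {"verdict": "PASS" if passed else "FAIL", "report": reply}
```

Because the model call is injected, the same loop works for any critic backend, and it can be unit-tested with a fake critic.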
Model Routing in Practice
| Role | Good model choice | Why |
|---|---|---|
| Builder (routine) | `codex-spark` / `gpt-5.4-mini` | High throughput, low cost |
| Builder (complex) | `gpt-5.4` with `reasoning: high` | Better architectural decisions |
| Critic | Claude Code (`claude-opus-4.x`) | Strong constraint validation, different training distribution |
| Critic (fast) | Gemini CLI | Different architecture catches different patterns |
| Moderator | Any fast model | Deduplication only, low reasoning load |
The key principle: use models from different training distributions. A Claude-reviewed Codex PR is more reliable than a Codex-reviewed Codex PR — not because Claude is “better”, but because it was trained on different data with different biases.
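This rule is simple enough to enforce programmatically before a review job is dispatched. A sketch — the family mapping is illustrative and would need to cover whatever models your team actually runs:

```python
# Illustrative mapping from model name to training-distribution family.
MODEL_FAMILY = {
    "codex-spark": "openai",
    "gpt-5.4": "openai",
    "gpt-5.4-mini": "openai",
    "claude-opus-4.x": "anthropic",
    "gemini": "google",
}

def valid_pairing(builder: str, critic: str) -> bool:
    """Enforce the cross-distribution rule: builder and critic must come
    from different model families. Unknown models fail closed."""
    b, c = MODEL_FAMILY.get(builder), MODEL_FAMILY.get(critic)
    return b is not None and c is not None and b != c
```

Failing closed on unknown model names means a typo in a config cannot silently produce a same-family pairing.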
Implementing the Pattern in AGENTS.md
```markdown
## Review Policy

All PRs require cross-model adversarial review before human review:

1. Builder subagent implements (Codex CLI, model: codex-spark)
2. All tests must pass locally before the review step
3. Critic runs in a FRESH session (no shared history):
   - Feed: spec + test file + code diff only
   - Model: claude-opus or gemini (NOT the same model that built)
   - Task: list spec violations with file:line evidence. PASS or FAIL verdict.
4. On FAIL: Builder fixes, re-validates, spawns a NEW critic (not the same session)
5. After 3 failures: escalate to a human

Never use the same model as both builder and critic.
Never reuse a critic session — anchoring bias is real, even for AI.
```
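The retry policy above can be sketched as a small loop. The function names (`build_fix`, `new_critic`) are illustrative; the load-bearing details are that every retry gets a brand-new critic session and that the loop escalates rather than running forever:

```python
def review_with_retries(spec, diff, build_fix, new_critic, max_failures=3):
    """Retry loop: each FAIL triggers a Builder fix and a fresh Critic.

    `build_fix(diff, report)` returns a revised diff from the Builder.
    `new_critic()` returns a reviewer callable backed by a NEW session,
    so no critic ever sees its own earlier findings (anchoring bias).
    After `max_failures` FAIL verdicts, hand the change to a human.
    """
    for attempt in range(max_failures):
        critic = new_critic()                # never reuse a critic session
        report = critic(spec, diff)
        if report.startswith("PASS"):
            return {"status": "PASS", "attempts": attempt + 1}
        diff = build_fix(diff, report)       # builder addresses the violations
    return {"status": "ESCALATE_TO_HUMAN", "attempts": max_failures}
```

Note that the escalation bound lives in the orchestrator, not in any prompt — agents cannot be trusted to stop themselves.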
The Debate Loop Pattern (alecnielsen/adversarial-review)
For high-stakes changes, a multi-round debate produces better results than a single-pass review:
Round 1: Claude and Codex independently review the same diff
Round 2: Each agent critiques the other's findings
Round 3: Each agent responds to critiques
Round 4: One agent synthesises and implements validated fixes
The circuit breaker matters: stop if no progress after 3 iterations, or if disagreement persists after 5. Infinite debate loops are a real failure mode.
Approximate cost: ~21 API calls worst-case for three iterations. High, but justified for changes touching authentication, data migrations, or public APIs.
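One way to arrive at the ~21 figure, assuming each iteration runs the four rounds above at a cost of 2 + 2 + 2 + 1 model calls (this breakdown is an assumption for illustration, not a measured number):

```python
# Assumed per-iteration call breakdown for the four-round debate:
#   round 1: 2 independent reviews
#   round 2: 2 cross-critiques
#   round 3: 2 responses to critiques
#   round 4: 1 synthesis/implementation pass
CALLS_PER_ITERATION = 2 + 2 + 2 + 1   # 7 calls per iteration
MAX_ITERATIONS = 3                    # the circuit breaker's progress bound
WORST_CASE_CALLS = CALLS_PER_ITERATION * MAX_ITERATIONS
print(WORST_CASE_CALLS)  # 21
```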
Metaswarm’s Industrial-Scale Approach
Metaswarm implements adversarial review at framework scale. Every work unit runs through a four-phase loop:
IMPLEMENT → VALIDATE → ADVERSARIAL REVIEW → COMMIT
Key rules:
- Writer is always reviewed by a different model (Codex implements, Claude reviews — or vice versa)
- Fresh reviewer on every retry — the reviewer checks the contract, not “did they fix what I found?”
- Orchestrator independently verifies results (never trusts subagent self-reports)
- Parallel Design Review Gate: 5 specialist agents (PM, Architect, Designer, Security, CTO) reviewing in parallel before implementation begins
The parallel design review gate is worth noting: five specialist personas reviewing a spec simultaneously, each bringing a different adversarial lens. The overlap in findings increases confidence; the disagreements surface spec ambiguities before a line of code is written.
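A minimal sketch of such a gate, using a thread pool to fan the spec out to all personas at once. The `review(persona, spec)` signature and the overlap heuristic are illustrative, not Metaswarm's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

PERSONAS = ["PM", "Architect", "Designer", "Security", "CTO"]

def design_review_gate(spec: str,
                       review: Callable[[str, str], List[str]]) -> dict:
    """Run all specialist reviews of a spec in parallel.

    `review(persona, spec)` returns that persona's list of findings
    (hypothetical signature). Findings raised by more than one persona
    are treated as higher-confidence; single-persona findings and
    disagreements surface spec ambiguities to resolve before coding.
    """
    with ThreadPoolExecutor(max_workers=len(PERSONAS)) as pool:
        # pool.map preserves input order, so zip pairs each persona
        # with its own findings.
        findings = dict(zip(PERSONAS, pool.map(lambda p: review(p, spec),
                                               PERSONAS)))
    flat = [f for fs in findings.values() for f in fs]
    high_confidence = {f for f in flat if flat.count(f) > 1}
    return {"by_persona": findings, "high_confidence": high_confidence}
```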
CI/CD Integration
Adversarial review works as a CI gate:
```yaml
# .github/workflows/adversarial-review.yml
name: Adversarial Code Review
on: [pull_request]
jobs:
  critic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run critic agent
        run: |
          # Pass spec + diff to a different model than the one that generated
          # the PR; claude-proxy routes the call to Claude for cross-model review
          codex exec "You are a Critic agent. Review this diff against SPEC.md.
          List all spec violations with file:line evidence. Output PASS or FAIL." \
            --model claude-proxy \
            --approval-mode full-auto \
            --output-format json > review.json
          # Fail the PR if the verdict is not PASS
          jq -e '.verdict == "PASS"' review.json
```
Note: the `claude-proxy` routing requires an MCP bridge or a custom shell wrapper, since `codex exec` natively targets OpenAI models.
When Same-Model Review is Acceptable
Same-model review is not always wrong. It is acceptable for:
- Small, factual diffs — renaming a variable, updating a config value, fixing a typo
- Style/format checks — lint-level concerns with no architectural implications
- Documentation review — lower stakes, less likely to have hidden logic errors
It is unacceptable for:
- Any change touching authentication, authorisation, or data handling
- Database schema changes or migrations
- Public API changes
- Code that’s been failing tests and has been “fixed” by an agent
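These rules lend themselves to a pre-review gate that decides when a cross-model critic is mandatory. A hypothetical sketch — the path patterns are illustrative and would be project-specific (in particular, "data handling" code usually cannot be identified by path alone):

```python
# Illustrative high-stakes path patterns; tune per project.
HIGH_STAKES_PATTERNS = ("auth", "migration", "schema", "public_api")

def requires_cross_model_review(changed_paths, had_failing_tests=False):
    """Return True when a change must get a different-model critic."""
    if had_failing_tests:
        # Code an agent "fixed" after failing tests always gets a second model.
        return True
    return any(pat in path.lower()
               for path in changed_paths
               for pat in HIGH_STAKES_PATTERNS)
```

Anything the gate passes over can fall back to cheap same-model review or plain lint checks.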
Cost Tradeoffs
Cross-model adversarial review is not cheap. The overhead is real:
- Two model calls instead of one for every reviewed change
- Moderate cost for debate loops (3-5 rounds × 2 models)
- Latency penalty: critics add 30-90 seconds per review depending on diff size
The calculation: one production security incident costs more than months of adversarial review overhead. For changes where mistakes are expensive, the cost is justified. For routine maintenance, skip it.
Tools
| Tool | URL | Approach |
|---|---|---|
| alecnielsen/adversarial-review | https://github.com/alecnielsen/adversarial-review | Claude + GPT Codex debate loop, bash script, multi-round |
| dsifry/metaswarm | https://github.com/dsifry/metaswarm | Full framework with adversarial review built into SDLC |
| codexstar69/bug-hunter | https://github.com/codexstar69/bug-hunter | Hunter + Skeptic + Referee tri-agent security review |
| asdlc.io adversarial-code-review | https://asdlc.io/patterns/adversarial-code-review/ | Pattern documentation with Critic prompt examples |
Key Takeaways
- Same-model self-review perpetuates the same blind spots — structural fix required
- Always start the Critic in a fresh context — conversation history from the build phase contaminates the review
- Use models from different training distributions for genuine adversarial benefit
- The writer-reviewer rule: the model that wrote the code must not review it
- On retry: always spawn a new reviewer session — anchoring bias affects AI as well as humans
- For CI/CD: adversarial review as a gate, not an afterthought
Sources: asdlc.io adversarial code review, metaswarm, alecnielsen/adversarial-review, halallens.no multi-model guide