Community Workflow Frameworks for Codex CLI: Superpowers, GSD, gstack, Spec Kit, OMX, and Compound Engineering Compared
Codex CLI ships with a deliberately minimal orchestration layer: an agent loop, a sandbox, hooks, and skills. That minimalism has spawned an ecosystem of community-built workflow frameworks, each imposing a different set of constraints on how you use the tool. By April 2026, six frameworks have emerged with significant adoption, collectively representing roughly 400,000 GitHub stars [1]. Choosing the wrong one wastes days; choosing the right one compounds every session.
This article compares all six through a single lens: what does each framework constrain, and why?
The Landscape at a Glance
| Framework | Stars (Apr 2026) | Primary Constraint | Best For |
|---|---|---|---|
| Superpowers | ~150K [2] | Development process (TDD) | Solo devs needing test discipline |
| GSD | ~51K [3] | Execution environment (context) | Multi-day complex projects |
| gstack | ~71K [4] | Decision authority (roles) | Founder-engineers shipping product |
| Spec Kit | ~91K [5] | Specification fidelity | Teams with formal requirements |
| OMX | ~23K [6] | Parallelism (agent coordination) | Multi-agent team orchestration |
| Compound Engineering | ~13K [7] | Review breadth (14 parallel reviewers) | Quality-obsessed shipping workflows |
```mermaid
graph TD
    A["Raw Codex CLI"] --> B["Choose Framework"]
    B --> C["Superpowers<br/>TDD-first"]
    B --> D["GSD<br/>Context-fresh"]
    B --> E["gstack<br/>Role-governed"]
    B --> F["Spec Kit<br/>Spec-driven"]
    B --> G["OMX<br/>Team runtime"]
    B --> H["Compound Eng.<br/>Review-heavy"]
    C --> I["Iron Law: No code<br/>without failing test"]
    D --> J["Wave execution in<br/>fresh 200K windows"]
    E --> K["23-person simulated<br/>team with role gates"]
    F --> L["Constitution +<br/>six-phase pipeline"]
    G --> M["tmux workers +<br/>git worktrees"]
    H --> N["14 parallel<br/>reviewers per PR"]
```
Superpowers: The TDD Enforcer
Superpowers, created by Jesse Vincent (obra), is the most adopted framework in the ecosystem [2]. Its philosophy is simple and uncompromising: no production code without a failing test first. The framework calls this the “Iron Law”: violate it, and Superpowers deletes the offending code and forces a restart from the test phase [8].
How It Works with Codex CLI
Superpowers installs as a skills package under .agents/skills/. On Codex CLI, it provides seven phases [8]:
- Brainstorm — refine rough ideas through structured questions
- Spec — produce a formal specification document
- Plan — create a task breakdown enforcing DRY, YAGNI, and TDD
- TDD — write failing tests for every planned task
- Subagent Dev — implement code against failing tests using fresh subagents
- Review — cross-check implementation against spec
- Finalise — merge, clean up, and document
The subagent phase is key for Codex CLI users. Superpowers spawns a fresh subagent per task to prevent context drift during multi-hour sessions [8]. Each subagent inherits the test harness but starts with a clean context window.
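The test-first rule can be pictured as a small gate that decides the next action for each task. The sketch below is illustrative only: `TaskState` and `iron_law_check` are hypothetical names, and Superpowers itself expresses this rule through skill instructions rather than Python code.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical record of one planned task; not a Superpowers type."""
    name: str
    test_written: bool = False
    test_failed_first: bool = False  # test was seen failing before any code existed
    code_written: bool = False


def iron_law_check(task: TaskState) -> str:
    """Return the next action for a task under a strict test-first rule."""
    if task.code_written and not task.test_failed_first:
        # Code appeared before a failing test: discard it and start over.
        return "delete code, restart from test phase"
    if not task.test_written:
        return "write a failing test"
    if not task.test_failed_first:
        return "run the test and confirm it fails"
    if not task.code_written:
        return "implement until the test passes"
    return "proceed to review"
```

Calling `iron_law_check(TaskState("parser", code_written=True))` returns the restart action, which mirrors the delete-and-restart behaviour described above.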
When to Use It
Superpowers excels when you need verification discipline. The chardet maintainer used Superpowers to rewrite chardet v7.0.0 from scratch, achieving a 41x performance improvement, the kind of aggressive optimisation that is only safe with comprehensive test coverage [2].
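```toml
# .codex/config.toml: Superpowers-compatible profile
# Profiles live under the [profiles.<name>] table.
[profiles.superpowers]
model = "gpt-5.5"
approval_policy = "on-request"
# sandbox_mode takes the policy name as a string.
sandbox_mode = "workspace-write"
```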
GSD (Get Shit Done): The Context Rot Killer
GSD, originally by TACHES, solves a different problem: context rot, the quality degradation that occurs as the context window fills [3]. Rather than fighting the limit, GSD embraces it by externalising all state into files and executing each task in a fresh context window.
Wave Execution Model
GSD groups plans into dependency-ordered “waves” [9]. Plans within a wave run in parallel; waves run sequentially. Each plan executes in a fresh 200K context window, receiving only the externalised state files (Markdown and XML) it needs [3].
```mermaid
graph LR
    subgraph "Wave 1 (parallel)"
        P1["Plan: API schema"]
        P2["Plan: DB migration"]
    end
    subgraph "Wave 2 (parallel)"
        P3["Plan: Service layer"]
        P4["Plan: Auth middleware"]
    end
    subgraph "Wave 3"
        P5["Plan: Integration tests"]
    end
    P1 --> P3
    P2 --> P3
    P1 --> P4
    P3 --> P5
    P4 --> P5
```
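Grouping plans into waves amounts to layering a dependency graph: a plan joins the first wave in which all of its dependencies are already done. The following is a minimal sketch of that idea using the plans from the diagram; `group_into_waves` is a hypothetical helper and not GSD's own scheduler.

```python
def group_into_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Layer plans into waves so that each plan runs after its dependencies."""
    remaining = {plan: set(needs) for plan, needs in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A plan is ready once every dependency has finished in an earlier wave.
        ready = sorted(p for p, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for plan in ready:
            del remaining[plan]
    return waves


plans = {
    "API schema": set(),
    "DB migration": set(),
    "Service layer": {"API schema", "DB migration"},
    "Auth middleware": {"API schema"},
    "Integration tests": {"Service layer", "Auth middleware"},
}
waves = group_into_waves(plans)
print(waves)
# [['API schema', 'DB migration'], ['Auth middleware', 'Service layer'], ['Integration tests']]
```

Plans inside one inner list have no dependencies on each other, so they are the ones that could run in parallel, each in its own fresh context.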
Codex CLI Integration
GSD has a dedicated Codex CLI fork (get-shit-done-codex) that adapts the harness for Codex’s codex exec non-interactive mode and hooks system [10]. The meta-prompting layer generates fresh prompts for each plan execution, injecting only the relevant context files.
When to Use It
GSD targets complex projects spanning days or weeks, the kind where a single Codex CLI session would exhaust its context window multiple times [1]. If your project involves parallel workstreams and you find yourself running /clear frequently, GSD formalises that pattern.
gstack: The Role-Governed Organisation
Created by Garry Tan, gstack models a 23-person simulated team with explicit role governance [4]. A CEO agent makes strategic decisions, a product manager prioritises features, a QA lead enforces quality gates, engineers implement, designers review UX, and security reviewers audit. The framework enforces “boil the lake” principles: do fewer things excellently rather than mediocre work across everything [1].
Role-Based Constraints
Unlike Superpowers (which constrains the process) or GSD (which constrains the environment), gstack constrains decision authority [1]. Each role has explicit boundaries:
- The CEO agent approves or rejects feature scope
- The QA Lead must sign off before any merge
- Engineers cannot bypass security review
- The Designer validates UI changes against design tokens
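The role boundaries above behave like a merge gate that checks for required sign-offs. Here is a rough sketch under stated assumptions: the role identifiers and the `can_merge` function are hypothetical and are not taken from gstack's codebase.

```python
# Hypothetical role identifiers; gstack's real role names may differ.
REQUIRED_SIGNOFFS = {"qa-lead", "security-reviewer"}


def can_merge(approvals: dict[str, bool], touches_ui: bool = False) -> tuple[bool, list[str]]:
    """Return (allowed, missing_roles) for a proposed merge."""
    required = set(REQUIRED_SIGNOFFS)
    if touches_ui:
        # UI changes additionally need the designer's validation.
        required.add("designer")
    missing = sorted(role for role in required if not approvals.get(role, False))
    return (not missing, missing)
```

For example, `can_merge({"qa-lead": True})` is refused with `["security-reviewer"]` reported as missing, which matches the rule that engineers cannot bypass security review.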
Codex CLI Configuration
gstack supports Codex CLI as one of seven compatible agents [4]. Configuration maps gstack roles to Codex subagents:
```toml
# .codex/agents/qa-lead.toml
[agent]
name = "qa-lead"
model = "gpt-5.5"
instructions = """
You are the QA Lead. Review all code changes for test coverage,
edge cases, and regression risk. Block merges that lack adequate
test verification. Run the full test suite before approving.
"""
```
When to Use It
gstack suits founder-engineers shipping a product where multi-perspective review and real browser testing matter more than infrastructure automation [1]. The role governance prevents the common failure mode where an AI agent optimises for one dimension (e.g., code quality) while neglecting others (e.g., UX, security).
Spec Kit: GitHub’s Constitutional Approach
Spec Kit, backed by GitHub with over 90,000 stars, takes a specification-first approach with constitutional guardrails [5]. A constitution.md file establishes non-negotiable project principles that the AI agent references during every phase [11].
The Six-Phase Pipeline
Spec Kit defines six sequential phases, each triggered by a slash command [5][11]:
- /speckit.constitution: establish project principles
- /speckit.specify: create requirements specification
- /speckit.plan: produce technical architecture
- /speckit.tasks: break down into actionable tasks
- /speckit.implement: execute tasks against spec
- /speckit.review: validate output against constitution
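A sequential pipeline like this can be modelled as an ordered list plus a check that no phase is skipped. This is a sketch for illustration: `next_phase` is a hypothetical helper, and Spec Kit itself is driven by slash commands rather than by a Python API.

```python
from typing import Optional

# Phase names taken from the slash commands listed above.
PHASES = ["constitution", "specify", "plan", "tasks", "implement", "review"]


def next_phase(completed: list[str]) -> Optional[str]:
    """Return the next phase to run, or None when the pipeline is finished."""
    # The completed phases must be an exact prefix of the pipeline.
    if completed != PHASES[: len(completed)]:
        raise ValueError(f"phases out of order: {completed}")
    if len(completed) == len(PHASES):
        return None
    return PHASES[len(completed)]
```

With `["constitution", "specify"]` already completed, the helper returns `"plan"`; passing `["plan"]` alone raises, because the constitution and specification have not been established yet.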
Codex CLI Setup
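Spec Kit initialises for Codex CLI via the specify CLI [5]:
```shell
# Install the specify CLI from the Spec Kit repository (requires uv)
uv tool install specify-cli --from git+https://github.com/github/spec-kit.git
# Initialise Spec Kit in the current directory for Codex
specify init --here --ai codex
```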
This creates the .specify/ directory structure with memory/constitution.md, templates, and agent configuration. The kiro-for-codex VS Code extension adds a visual sidebar for managing specs and tracking phase status without leaving the editor [12].
When to Use It
Spec Kit excels when specifications need to be formal and auditable — regulated environments, contract work, or teams where requirements drift is a recurring problem. The constitutional layer provides a stronger governance mechanism than AGENTS.md alone.
OMX (Oh-My-Codex): The Team Runtime
OMX, created by Yeachan Heo, is the only framework in this comparison built specifically as a Codex CLI orchestration layer [6]. While the other five frameworks are agent-agnostic, OMX treats Codex as a first-class citizen with native hook ownership and a CLI-first team runtime.
tmux Worker Architecture
OMX spawns real tmux worker panes where each worker gets an isolated git worktree for conflict-free parallel work [6]. Workers spawn on demand and terminate when their task completes, so no resources sit idle.
```shell
# Launch OMX team runtime
omx team start --workers 3 --strategy fan-out
# Workers automatically get:
# - Isolated git worktrees
# - Independent Codex CLI sessions
# - Shared state via MCP servers
```
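To make the worktree-per-worker idea concrete, the sketch below builds the underlying `git worktree add` and `tmux split-window` commands for a given number of workers without running them. It is an assumption-laden illustration, not OMX's implementation: `worker_commands`, the `worker-N` branch names, and the `<repo>-worktrees/` directory layout are all hypothetical.

```python
import shlex


def worker_commands(repo: str, workers: int, base_branch: str = "main") -> list[list[str]]:
    """Build (but do not run) the commands that give each worker its own worktree and pane."""
    commands: list[list[str]] = []
    for i in range(1, workers + 1):
        branch = f"worker-{i}"                   # hypothetical branch naming
        path = f"{repo}-worktrees/{branch}"      # hypothetical directory layout
        # A new branch in a separate working directory keeps workers' edits apart.
        commands.append(["git", "-C", repo, "worktree", "add", "-b", branch, path, base_branch])
        # Open a detached tmux pane in that directory and start Codex there.
        commands.append(["tmux", "split-window", "-d", "-c", path, "codex"])
    return commands


for cmd in worker_commands("/srv/app", 2):
    print(shlex.join(cmd))
```

Because each worker edits files in its own directory on its own branch, conflicts only surface later, when the branches are merged.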
Persistent State
Unlike GSD (which externalises state to files) or Superpowers (which relies on fresh subagents), OMX uses persistent state and memory MCP servers [6]. Cross-session context survives worker termination, enabling long-running projects without the cold-start penalty.
When to Use It
OMX is purpose-built for multi-agent team orchestration on Codex CLI. If you need parallel Codex sessions coordinating on a shared codebase, with each worker's changes isolated in its own worktree, OMX is the only framework in this comparison that handles this natively.
Compound Engineering: The Review Maximiser
The Compound Engineering plugin, from Every Inc (Kieran Klaassen), focuses on review breadth [7]. Its standout feature is 14 specialised reviewers running simultaneously on every code change, covering architecture, security, performance, accessibility, testing, documentation, and more.
Key Workflows
Compound Engineering provides three primary commands [7]:
- /workflows:plan: spawns three parallel research agents (repo analysis, framework docs, best practices), then merges results into a structured plan
- /workflows:work: executes the plan with continuous verification
- /workflows:review: launches 14 parallel reviewers for comprehensive feedback
Codex CLI Installation
The plugin installs through Codex’s TUI [7]:
```shell
codex
# In TUI:
/plugins
# Find Compound Engineering marketplace
# Select compound-engineering plugin → Install
# Restart Codex
```
When to Use It
Compound Engineering suits teams that value thoroughness over speed. The 14-reviewer pipeline catches issues that single-pass review misses, making it ideal for production systems where defect cost is high.
Decision Framework
Choosing a framework depends on your primary constraint:
```mermaid
graph TD
    Q1{"What's your<br/>biggest pain?"}
    Q1 -->|"Tests are missing<br/>or fragile"| A["Superpowers"]
    Q1 -->|"Context degrades<br/>in long sessions"| B["GSD"]
    Q1 -->|"Decisions lack<br/>diverse review"| C["gstack"]
    Q1 -->|"Requirements drift<br/>from intent"| D["Spec Kit"]
    Q1 -->|"Need parallel<br/>agents coordinating"| E["OMX"]
    Q1 -->|"Reviews miss<br/>critical issues"| F["Compound Engineering"]
    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e9
    style D fill:#fff3e0
    style E fill:#fce4ec
    style F fill:#f1f8e9
```
Combining Frameworks
These frameworks are not mutually exclusive. Common combinations include:
- Superpowers + OMX: TDD discipline across parallel workers
- Spec Kit + GSD: Constitutional specs with wave execution
- gstack + Compound Engineering: Role governance with deep review
The key constraint is context budget. Each framework adds system prompt overhead. Superpowers alone consumes roughly 15-20K tokens of instruction context; stacking two frameworks can push overhead past 40K tokens, which matters when GPT-5.5’s 400K window must also hold your codebase [13].
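The budget arithmetic is easy to check. The sketch below uses the 400K window and the upper Superpowers estimate from the paragraph above; the 20K figure for the second framework is an assumption chosen so the total reaches the 40K mentioned, and `remaining_context` is a hypothetical helper.

```python
def remaining_context(window: int, overheads: dict[str, int]) -> tuple[int, float]:
    """Return (tokens left for code and conversation, fraction of window used by frameworks)."""
    used = sum(overheads.values())
    return window - used, used / window


left, share = remaining_context(
    400_000,
    {
        "Superpowers": 20_000,       # upper end of the 15-20K estimate
        "second framework": 20_000,  # assumed, for illustration only
    },
)
print(left, share)  # 360000 0.1
```

Under these assumptions, a tenth of the window is spent on instructions before any source file or conversation turn is loaded.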
The Meta-Pattern
Beneath the surface differences, all six frameworks converge on the same architectural pattern [1]:
Research → Plan → Execute → Review → Ship
What varies is where each framework places its guardrails. Superpowers guards execution (no code without tests). GSD guards context (fresh windows per plan). gstack guards authority (role-based approvals). Spec Kit guards intent (constitutional principles). OMX guards coordination (worker isolation). Compound Engineering guards quality (parallel review).
The framework you choose reveals what you fear most about your AI-assisted workflow — and that self-knowledge may be more valuable than the framework itself.
Citations
1. S. Raisshan, “codex-cli-best-practice: From vibe coding to agentic engineering,” GitHub, April 2026. https://github.com/shanraisshan/codex-cli-best-practice
2. J. Vincent (obra), “Superpowers: An agentic skills framework & software development methodology that works,” GitHub, 2026. https://github.com/obra/superpowers
3. TACHES, “Get Shit Done: A light-weight and powerful meta-prompting, context engineering and spec-driven development system,” GitHub, 2026. https://github.com/gsd-build/get-shit-done
4. G. Tan, “gstack: Role-governed agentic development framework,” GitHub, 2026. https://github.com/garrytan/gstack
5. GitHub, “Spec Kit: Toolkit to help you get started with Spec-Driven Development,” GitHub, 2026. https://github.com/github/spec-kit
6. Y. Heo, “Oh-My-Codex (OMX): Orchestration layer for OpenAI Codex CLI,” GitHub, 2026. https://github.com/Yeachan-Heo/oh-my-codex
7. K. Klaassen / Every Inc, “Compound Engineering Plugin: Official plugin for Claude Code, Codex, Cursor, and more,” GitHub, 2026. https://github.com/EveryInc/compound-engineering-plugin
8. Pulumi Blog, “Superpowers, GSD, and GSTACK: Picking the Right Framework for Your Coding Agent,” April 2026. https://www.pulumi.com/blog/claude-code-orchestration-frameworks/
9. Agent Native, “GET SH*T DONE: Meta-prompting and Spec-driven Development for Claude Code and Codex,” Medium, February 2026. https://agentnativedev.medium.com/get-sh-t-done-meta-prompting-and-spec-driven-development-for-claude-code-and-codex-d1cde082e103
10. undeemed, “get-shit-done-codex: A light-weight and powerful meta-prompting system for Codex, originally by TACHES,” GitHub, 2026. https://github.com/undeemed/get-shit-done-codex
11. Microsoft Developer Blog, “Diving Into Spec-Driven Development With GitHub Spec Kit,” April 2026. https://developer.microsoft.com/blog/spec-driven-development-spec-kit
12. atman-33, “kiro-for-codex: VS Code extension for spec-driven development with Codex CLI,” GitHub, 2026. https://github.com/atman-33/kiro-for-codex
13. OpenAI, “Introducing GPT-5.5,” April 2026. https://openai.com/index/introducing-gpt-5-5/