Codex CLI for Generating Architecture Diagrams from Source Code: Mermaid, C4, and PlantUML Visualisation Workflows
Architecture diagrams lie. Not because anyone deliberately drew them wrong, but because code moves faster than documentation. A team refactors a service boundary, adds a new queue, or renames an internal API, and the Mermaid diagram in the wiki quietly becomes fiction. By May 2026, the tooling exists to close that gap: Codex CLI can read your source code, infer architectural relationships, and emit diagram-as-code artefacts in Mermaid, PlantUML, or Structurizr DSL format, then keep them current through CI pipelines and codex exec automation [1][2].
This article covers the end-to-end workflow: AGENTS.md conventions for diagram generation, interactive TUI sessions for exploratory architecture mapping, codex exec pipelines for automated diagram refresh, C4 model generation from repository analysis, and PostToolUse hooks for diagram validation.
Why Diagram-as-Code Matters for Agent Workflows
Three properties make text-based diagrams ideal for agent-driven generation:
- Git-diffable: Mermaid, PlantUML, and Structurizr DSL files are plain text, so architecture changes appear in pull request diffs alongside the code changes that caused them [3].
- LLM-native syntax: models trained on millions of Markdown files have seen Mermaid syntax extensively; GPT-5.5 and GPT-5.4 generate syntactically valid Mermaid diagrams with high reliability [4].
- Renderable in CI: the Mermaid CLI (@mermaid-js/mermaid-cli), PlantUML JAR, and Kroki API all accept text input and produce SVG or PNG output non-interactively (the Mermaid CLI drives a headless browser internally, so no manual browser step is needed) [5].
flowchart LR
A[Source Code] --> B[Codex CLI Analysis]
B --> C{Diagram Format}
C --> D[Mermaid .mmd]
C --> E[PlantUML .puml]
C --> F[Structurizr .dsl]
D --> G[Mermaid CLI]
E --> H[PlantUML JAR]
F --> I[Structurizr Lite]
G --> J[SVG / PNG]
H --> J
I --> J
J --> K[Documentation Site]
AGENTS.md Conventions for Diagram Generation
The single highest-leverage step is encoding diagram conventions in your AGENTS.md file. Without explicit constraints, the model will produce diagrams that are technically correct but stylistically inconsistent: mixing flowchart directions, using arbitrary node IDs, or omitting important subsystems [6].
## Architecture Diagrams
When generating architecture diagrams:
- Use Mermaid for inline documentation (README, ADRs, PR descriptions)
- Use PlantUML or Structurizr DSL for formal architecture documentation in `/docs/architecture/`
- Always use `flowchart TD` (top-down) for system overviews, `sequenceDiagram` for API flows
- Node IDs must match service names from `docker-compose.yml` or Kubernetes manifests
- Include external dependencies (databases, queues, third-party APIs) as cylinder or cloud shapes
- Add `%%` comments linking each node to the source file that defines the service entry point
- Re-generate diagrams when files in `/src/services/`, `/infrastructure/`, or `/api/` change
This convention block gives the agent enough structure to produce consistent output while leaving room for the model to discover the actual architecture from code [6].
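Conventions like these are easier to keep when something checks them. The following is a minimal Python sketch, not part of Codex CLI or the Mermaid tooling, that tests two of the rules above: the overview starts with `flowchart TD`, and every node ID is a known service name. The function name `check_overview`, the simplified link and ID patterns, and the idea of passing service names in as a set are all assumptions made for illustration.

```python
import re

# Common flowchart link styles only; other arrow forms are not handled.
LINK = re.compile(r"-\.->|-->|---|==>")
# Optional |edge label| followed by the node ID at the start of a segment.
LEADING_ID = re.compile(r"\s*(?:\|[^|]*\|)?\s*([A-Za-z_][\w-]*)")
# Statements that do not define nodes.
KEYWORDS = {"subgraph", "end", "classDef", "class", "style",
            "linkStyle", "direction", "click"}


def check_overview(mermaid_text: str, service_names: set) -> list:
    """Return convention violations for a system-overview flowchart."""
    # Ignore blank lines and %% comments.
    body = [ln.strip() for ln in mermaid_text.splitlines()]
    body = [ln for ln in body if ln and not ln.startswith("%%")]

    problems = []
    if not body or body[0] != "flowchart TD":
        problems.append("overview must start with 'flowchart TD'")

    unknown = set()
    for ln in body[1:]:
        if ln.split()[0] in KEYWORDS:
            continue
        # Each segment between links starts with a node ID.
        for segment in LINK.split(ln):
            match = LEADING_ID.match(segment)
            if match and match.group(1) not in service_names:
                unknown.add(match.group(1))
    problems.extend(f"unknown node id: {node}" for node in sorted(unknown))
    return problems
```

In practice the service names would come from `docker-compose.yml` or the Kubernetes manifests; here they are supplied by the caller so the sketch stays dependency-free.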
Interactive Architecture Mapping
The most natural starting point is an interactive TUI session where you ask Codex CLI to analyse your codebase and produce a system-level diagram:
codex "Analyse the repository structure, identify all services and their
dependencies, then generate a Mermaid flowchart showing the system
architecture. Save it to docs/architecture/system-overview.mmd"
For larger codebases, decompose the task across diagram levels, mirroring the C4 model’s four-layer approach [7]:
| C4 Level | Codex CLI Prompt Pattern | Output Format |
|---|---|---|
| Context | “Map all external actors and systems this application interacts with” | Mermaid flowchart |
| Container | “Identify all deployable units (services, databases, queues) and their communication protocols” | Mermaid flowchart or PlantUML |
| Component | “For the order-service, map all internal modules and their dependencies” | Mermaid classDiagram or flowchart |
| Code | “Generate a class diagram for the payment package showing public interfaces” | Mermaid classDiagram |
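The prompt patterns in the table can be kept in one place so that every diagram level is requested the same way. The sketch below is one possible Python arrangement: the `C4_PROMPTS` mapping, the `build_prompt` helper, and the `{target}` placeholder are hypothetical names introduced here, and the appended "Save the diagram to" sentence is an assumption about how you might phrase the output location.

```python
# Prompt patterns from the table, with the service or package as a placeholder.
C4_PROMPTS = {
    "context": "Map all external actors and systems this application interacts with",
    "container": ("Identify all deployable units (services, databases, queues) "
                  "and their communication protocols"),
    "component": "For the {target}, map all internal modules and their dependencies",
    "code": "Generate a class diagram for the {target} package showing public interfaces",
}


def build_prompt(level: str, output_path: str, target: str = "") -> str:
    """Build the prompt text for one C4 level (hypothetical helper)."""
    template = C4_PROMPTS[level]
    if "{target}" in template and not target:
        raise ValueError(f"level '{level}' needs a target")
    return f"{template.format(target=target)}. Save the diagram to {output_path}"
```

The returned string can be passed as the prompt argument to `codex` or `codex exec`.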
Model Selection for Diagram Tasks
Diagram generation benefits from strong architectural reasoning. GPT-5.5 produces the most accurate system-level diagrams due to its superior planning and multi-step reasoning capabilities [4]. For component-level diagrams from a single service, GPT-5.4 or even Codex-Spark delivers adequate results at lower cost [8].
# config.toml: default model for diagram tasks
# (Codex CLI reads the default model from the top-level `model` key)
model = "gpt-5.4"
# Use GPT-5.5 for architecture-level analysis
# Switch with: /model gpt-5.5
Automated Diagram Generation with codex exec
Interactive sessions produce initial diagrams. Keeping them current requires automation. The codex exec non-interactive mode integrates diagram generation into CI/CD pipelines [1]:
Single-Diagram Generation
codex exec \
--sandbox workspace-write \
"Analyse the repository structure and regenerate
docs/architecture/system-overview.mmd as a Mermaid flowchart.
Include all services from src/services/ and their database
and message queue dependencies."
Structured Output with Schema Validation
For pipelines that need machine-readable metadata alongside diagrams, use --output-schema [1]:
{
  "type": "object",
  "properties": {
    "diagram_path": { "type": "string" },
    "format": { "enum": ["mermaid", "plantuml", "structurizr"] },
    "services_detected": { "type": "integer" },
    "external_dependencies": {
      "type": "array",
      "items": { "type": "string" }
    },
    "staleness_risk": {
      "type": "string",
      "enum": ["low", "medium", "high"]
    }
  },
  "required": ["diagram_path", "format", "services_detected"]
}
codex exec \
--output-schema ./diagram-schema.json \
-o ./diagram-report.json \
"Regenerate the system architecture diagram and report metadata"
Multi-Diagram Batch Generation
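For repositories with multiple services, fan the work out in parallel. The loop below does this with shell background jobs, one codex exec process per service directory; Codex subagents offer a similar fan-out inside a single session [9]: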
# Generate a diagram for each service directory
for svc in src/services/*/; do
  svc_name=$(basename "$svc")
  codex exec \
    --sandbox workspace-write \
    "Generate a Mermaid component diagram for the ${svc_name} service
    at ${svc}. Save to docs/architecture/components/${svc_name}.mmd" &
done
wait
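The shell loop starts one process per service with no upper bound, and a bare `wait` does not surface individual exit codes. Where that matters, a small driver can cap concurrency and collect results. The following Python sketch is one way to do it under stated assumptions: `build_command`, `run_one`, `run_all`, and the default of four workers are illustrative names and values, and the command line mirrors the `codex exec` invocation shown above.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import subprocess


def build_command(service_dir: Path) -> list:
    """Build the codex exec argument list for one service directory."""
    name = service_dir.name
    prompt = (
        f"Generate a Mermaid component diagram for the {name} service "
        f"at {service_dir.as_posix()}/. "
        f"Save to docs/architecture/components/{name}.mmd"
    )
    return ["codex", "exec", "--sandbox", "workspace-write", prompt]


def run_one(service_dir: Path) -> int:
    """Run codex exec for one service and return its exit code."""
    return subprocess.run(build_command(service_dir)).returncode


def run_all(services_root: Path, max_workers: int = 4) -> dict:
    """Run at most max_workers jobs at a time; map service name to exit code."""
    dirs = sorted(p for p in services_root.iterdir() if p.is_dir())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run_one, dirs))
    return {d.name: code for d, code in zip(dirs, codes)}
```

Calling `run_all(Path("src/services"))` returns a dictionary whose non-zero values identify the services whose diagram generation failed.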
C4 Model Generation with Skills and MCP
The LikeC4 Agent Skill
The community likec4 skill by @schup provides a structured workflow for generating interactive C4 architecture diagrams from source code analysis [10]. It outputs Structurizr DSL that can be rendered locally:
# Install the skill
codex install skill schup/likec4
# Generate C4 diagrams
codex "Use the likec4 skill to generate a complete C4 model
for this repository, including context, container,
and component diagrams"
C4-PlantUML with Rendering
For teams that prefer PlantUML’s richer UML vocabulary, the C4-PlantUML library provides C4-specific macros [11]:
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
Person(user, "Developer", "Writes and reviews code")
System_Boundary(codex_system, "Codex CLI") {
Container(cli, "CLI Binary", "Rust", "Terminal interface")
Container(agent_loop, "Agent Loop", "Rust", "Orchestrates tool calls")
Container(sandbox, "Sandbox", "Landlock/Seatbelt", "Isolates file system access")
}
System_Ext(openai_api, "OpenAI API", "Responses API endpoint")
Rel(user, cli, "Runs prompts")
Rel(cli, agent_loop, "Dispatches tasks")
Rel(agent_loop, sandbox, "Executes tools within")
Rel(agent_loop, openai_api, "Sends requests")
@enduml
The agent can generate this syntax by analysing import graphs, service registrations, and infrastructure-as-code files. The key AGENTS.md constraint is specifying which C4 level to target; without it, the model defaults to context-level diagrams that lack actionable detail [7].
Structurizr DSL for Model-Based Consistency
Structurizr DSL enforces the C4 model’s rules at the syntax level: you cannot create a component diagram without first defining the parent container [12]. This makes it ideal for agent-generated diagrams because syntax errors immediately signal structural mistakes:
workspace {
    model {
        user = person "Developer"
        codex = softwareSystem "Codex CLI" {
            # container <name> [description] [technology]
            cli = container "CLI Binary" "Terminal interface" "Rust"
            agentLoop = container "Agent Loop" "Orchestrates tool calls" "Rust"
            sandbox = container "Sandbox" "Isolates FS access" "Landlock/Seatbelt"
        }
        openai = softwareSystem "OpenAI API" "Responses API"
        user -> cli "Runs prompts"
        cli -> agentLoop "Dispatches tasks"
        agentLoop -> sandbox "Executes tools within"
        agentLoop -> openai "Sends requests"
    }
    views {
        container codex {
            include *
            autolayout lr
        }
    }
}
The Structurizr MCP server (available on SkillsLLM) exposes workspace management tools that let the agent query and update architecture models programmatically [13].
PostToolUse Hooks for Diagram Validation
Generated diagrams can contain syntax that renders in some tools but fails in others. A PostToolUse hook catches these failures before they reach version control [14]:
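# config.toml
[[hooks]]
event = "PostToolUse"
type = "command"
# Literal string (''') so TOML does not treat the backslashes below as escapes
command = '''
if echo "$CODEX_TOOL_ARGS" | grep -q '\.mmd"'; then
  # grep -oP needs GNU grep
  FILE=$(echo "$CODEX_TOOL_ARGS" | grep -oP '[^"]*\.mmd')
  if [ -f "$FILE" ]; then
    # mermaid-cli requires an output name ending in .svg, .png, .pdf or .md
    OUT="$(mktemp -d)/check.svg"
    npx @mermaid-js/mermaid-cli -i "$FILE" -o "$OUT" 2>&1 || \
      echo '{"status":"deny","reason":"Mermaid syntax error in '"$FILE"'"}'
  fi
fi
'''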
This hook intercepts any file write that produces a .mmd file and validates it through the Mermaid CLI. If the syntax is invalid, the tool call is denied and the agent receives feedback to fix the diagram [14].
For PlantUML validation:
java -jar plantuml.jar -checkonly "$FILE" 2>&1
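When a repository mixes formats, the choice of validator can be made from the file extension. The sketch below only builds the command line; it uses the two validators named above and nothing else. The function name `validation_command`, the `scratch_dir` parameter, and the `/tmp` default are assumptions for illustration. The Mermaid branch renders to an `.svg` path because mermaid-cli infers the output type from the file extension.

```python
from pathlib import PurePosixPath
from typing import Optional


def validation_command(diagram_path: str, scratch_dir: str = "/tmp") -> Optional[list]:
    """Return the validator command for a diagram file, or None if none applies."""
    path = PurePosixPath(diagram_path)
    if path.suffix == ".mmd":
        # Render to a throwaway SVG; a non-zero exit code signals a syntax error.
        rendered = str(PurePosixPath(scratch_dir) / f"{path.stem}.svg")
        return ["npx", "@mermaid-js/mermaid-cli", "-i", diagram_path, "-o", rendered]
    if path.suffix == ".puml":
        # -checkonly validates syntax without generating images.
        return ["java", "-jar", "plantuml.jar", "-checkonly", diagram_path]
    return None
```

The resulting list can be handed to `subprocess.run`, and a non-zero return code treated as a validation failure.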
CI/CD Integration: Diagrams That Update Themselves
The most powerful pattern combines codex exec with CI triggers to regenerate diagrams whenever architectural code changes [2][5]:
# .github/workflows/update-diagrams.yml
name: Update Architecture Diagrams
on:
  push:
    paths:
      - 'src/services/**'
      - 'infrastructure/**'
      - 'docker-compose.yml'
      - 'k8s/**'
permissions:
  contents: write  # lets the job push the regenerated diagrams
jobs:
  regenerate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: openai/setup-codex@v1
      - name: Regenerate system diagram
        run: |
          codex exec \
            --sandbox workspace-write \
            --ignore-user-config \
            "Regenerate docs/architecture/system-overview.mmd based on
            the current service structure. Preserve existing node
            styling and comments."
        env:
          # Assumes a repository secret named OPENAI_API_KEY
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - name: Render to SVG
        run: npx @mermaid-js/mermaid-cli -i docs/architecture/system-overview.mmd -o docs/architecture/system-overview.svg
      - name: Commit if changed
        run: |
          # A commit on a fresh runner needs an author identity
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          # Stage first so newly created files are detected as changes
          git add docs/architecture/
          git diff --cached --quiet || \
            (git commit -m "docs: regenerate architecture diagrams" && git push)
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant CI as CI Runner
participant Codex as codex exec
participant Render as Mermaid CLI
Dev->>GH: Push code change to src/services/
GH->>CI: Trigger workflow
CI->>Codex: Regenerate .mmd from source
Codex->>CI: Updated diagram file
CI->>Render: Convert .mmd to .svg
Render->>CI: SVG output
CI->>GH: Commit updated diagrams
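Regeneration costs a model call, so it can help to run a cheap drift check first, for example on pull requests. The following Python sketch lists service directories that are never mentioned in the diagram. It is an assumption-laden heuristic, not a Mermaid parser: the function name `undocumented_services` is hypothetical, and it assumes each service directory name appears verbatim as a node ID, as the AGENTS.md conventions in this article require.

```python
import re
from pathlib import Path


def undocumented_services(services_root: Path, diagram_file: Path) -> list:
    """Return service directory names that never appear as a token in the diagram."""
    text = diagram_file.read_text(encoding="utf-8")
    # Identifier-like tokens; trailing hyphens come from links written as `a-->b`.
    tokens = {tok.rstrip("-") for tok in re.findall(r"[A-Za-z_][\w-]*", text)}
    services = sorted(p.name for p in services_root.iterdir() if p.is_dir())
    return [name for name in services if name not in tokens]
```

A non-empty result is a signal that the diagram has drifted and that the regeneration job is worth running.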
Sequence Diagrams from API Contracts
One particularly effective pattern is generating sequence diagrams from OpenAPI specifications or gRPC protobuf definitions [15]:
codex exec \
--sandbox workspace-write \
"Read the OpenAPI spec at api/openapi.yaml and generate Mermaid
sequence diagrams for every endpoint that involves more than
two services. Save each diagram to docs/architecture/sequences/"
The model traces request flows through API gateway definitions, service-to-service calls defined in the spec, and database interactions inferred from response schemas. The output captures the actual documented contract rather than guessing from implementation code.
Practical Recommendations
- Start with Mermaid: it renders natively in GitHub, GitLab, and most documentation tools without extra infrastructure. Graduate to Structurizr DSL only when you need model-level consistency enforcement across multiple C4 levels [3][12].
- Pin diagram scope in AGENTS.md: without constraints, the agent will produce sprawling diagrams that try to capture everything. Specify which directories map to which C4 levels and which external systems to include.
- Use codex exec for refresh, TUI for discovery: interactive sessions are ideal for initial architecture mapping of an unfamiliar codebase. Once the diagram structure stabilises, switch to automated codex exec pipelines for maintenance.
- Validate before committing: PostToolUse hooks or CI-stage validation prevent syntactically broken diagrams from reaching the documentation site.
- Version diagrams alongside code: store .mmd, .puml, or .dsl files in the same repository as the code they describe. This makes architectural drift visible in code review.
- Model routing matters: use GPT-5.5 for system-context and container-level diagrams where architectural reasoning is critical. Use GPT-5.4 or Codex-Spark for component-level and class diagrams where the scope is smaller [4][8].
Current Limitations
- No runtime analysis: Codex CLI analyses static source code and configuration files. It cannot observe actual runtime communication patterns, message flows, or database query patterns. Diagrams reflect the designed architecture, not necessarily the deployed one.
- Large codebase context limits: for repositories exceeding the context window, the agent must analyse services individually and compose the overall diagram from partial views. The codex exec resume pattern helps but adds complexity [1].
- Mermaid rendering inconsistencies: Mermaid syntax that renders correctly on GitHub may fail in the CLI renderer or vice versa. The PostToolUse validation hook mitigates this but does not eliminate it.
- C4 level ambiguity: without explicit AGENTS.md guidance, the model often conflates container and component levels, producing diagrams that mix abstraction layers [7].
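For the large-codebase case, composing an overview from per-service partial views can be done mechanically once each partial diagram exists. The sketch below is a deliberately simple Python take on that step: it concatenates flat Mermaid flowcharts under one header and drops duplicate statements. The function name `merge_flowcharts` is hypothetical, and the approach assumes partials without `subgraph` blocks and with node IDs that are consistent across services.

```python
def merge_flowcharts(partials: list, header: str = "flowchart TD") -> str:
    """Merge flat Mermaid flowcharts into one, keeping first-seen statement order."""
    seen = set()
    merged = [header]
    for text in partials:
        for line in text.splitlines():
            statement = line.strip()
            # Skip blank lines and each partial's own header line.
            if not statement or statement.split()[0] in ("flowchart", "graph"):
                continue
            if statement not in seen:
                seen.add(statement)
                merged.append("    " + statement)
    return "\n".join(merged) + "\n"
```

Because shared dependencies such as a database tend to appear in several partial views, the deduplication keeps the merged overview from repeating the same edge.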
Citations
1. OpenAI, “Non-interactive mode — Codex,” May 2026. https://developers.openai.com/codex/noninteractive
2. Cosmo Edge, “Automate Technical Diagrams with LLMs using Mermaid, PlantUML and CI/CD,” 2026. https://cosmo-edge.com/automate-technical-diagrams-llm-mermaid-plantuml-cicd/
3. Mermaid, “Mermaid — Diagramming and charting tool,” 2026. https://mermaid.js.org/
4. OpenAI, “Models — Codex,” May 2026. https://developers.openai.com/codex/models
5. Kroki, “Kroki — Creates diagrams from textual descriptions,” 2026. https://kroki.io/
6. OpenAI, “AGENTS.md — Codex,” May 2026. https://developers.openai.com/codex/agents-md
7. Simon Brown, “The C4 model for visualising software architecture,” 2026. https://c4model.com/
8. OpenAI, “Codex Changelog — Codex-Spark research preview,” May 2026. https://developers.openai.com/codex/changelog
9. OpenAI, “Subagents — Codex,” May 2026. https://developers.openai.com/codex/subagents
10. SkillsMP, “likec4 — Agent Skill by schup,” 2026. https://skillsmp.com/skills/schup-likec4-skill-skill-md
11. GitHub, “C4-PlantUML — PlantUML C4 Model macros,” 2026. https://github.com/plantuml-stdlib/C4-PlantUML
12. Structurizr, “DSL — Structurizr,” 2026. https://docs.structurizr.com/dsl
13. SkillsLLM, “Structurizr MCP Server,” 2026. https://skillsllm.com/skill/structurizr
14. OpenAI, “Hooks — Codex,” May 2026. https://developers.openai.com/codex/hooks
15. OpenAI, “Best practices — Codex,” May 2026. https://developers.openai.com/codex/learn/best-practices