Codex CLI for Flutter and Dart Teams: MCP Server, DCM, and Agent-Driven Cross-Platform Development
Flutter’s widget-based architecture, Dart’s strong type system, and the framework’s rapid feedback loop (hot reload, hot restart) make it unusually well-suited for agentic development. The Dart and Flutter MCP server — shipping with Dart 3.9+ — gives Codex CLI direct access to the analyser, formatter, test runner, pub.dev search, and runtime introspection, closing the gap between what the model generates and what actually runs on-device 1. This article covers the full setup for Flutter teams adopting Codex CLI, from MCP configuration through testing workflows to CI integration.
Why Flutter Is Agent-Friendly
Flutter shares several properties with frameworks that already perform well under AI coding agents:
- Deterministic compiler feedback. `dart analyze` and `dart format` return structured, machine-parseable output on every save 2.
- Hot reload as a verification loop. The agent can modify widget code and see the result reflected in a running app within seconds, matching the tight feedback cycles that reduce hallucination-driven drift 3.
- Convention-heavy structure. Flutter projects follow predictable directory layouts (`lib/`, `test/`, `pubspec.yaml`), and widget composition follows a tree structure the model can reason about reliably 4.
- First-party MCP support. The official `dart mcp-server` exposes analyser, formatter, test runner, and runtime introspection tools, so no third-party wrappers are required 1.
AGENTS.md Template for Flutter Projects
Place this at the repository root:
# AGENTS.md — Flutter/Dart Project
## Language & Framework
- Dart 3.11+ / Flutter 3.41+
- State management: Riverpod 3.x (prefer `@riverpod` code generation)
- Navigation: GoRouter
- Networking: dio + retrofit
## Architecture
- Feature-first directory structure: `lib/features/<name>/{data,domain,presentation}/`
- Domain layer uses freezed classes for immutable models
- Presentation layer uses ConsumerWidget / ConsumerStatefulWidget
## Conventions
- ALL widget classes in dedicated files — one public widget per file
- Use `const` constructors wherever possible
- Prefer composition over inheritance for widgets
- Named parameters for all widget constructors with >2 parameters
- Barrel files (`index.dart`) per feature, never at `lib/` root
## Testing
- Unit tests in `test/unit/`, widget tests in `test/widget/`, integration in `integration_test/`
- Run `flutter test` after every code change
- Widget tests MUST use `pumpWidget` with `MaterialApp` wrapper
- Golden tests for complex custom widgets — update with `--update-goldens` flag
## Code Quality
- Run `dart analyze` before committing — zero warnings policy
- Run `dart format .` — enforced line length 80
- DCM rules: no unused code, no unused files, cyclomatic complexity <10
## Do NOT
- Import `dart:mirrors` (not supported in AOT)
- Use `dynamic` types unless interfacing with raw JSON
- Add platform-specific code outside `lib/platform/`
- Modify generated files (`*.g.dart`, `*.freezed.dart`) manually
config.toml Setup
Sandbox Configuration
Flutter builds need network access for pub dependency resolution and device communication. Configure two profiles: a build-capable profile for day-to-day development and a suggest-only profile for reviews:
[profiles.flutter]
model = "gpt-5.5"
approval_mode = "auto-edit"
[profiles.flutter.sandbox]
allow_network = true # pub get, device communication
allow_read = [
".",
"$HOME/.pub-cache",
"$HOME/.config/flutter",
"$FLUTTER_ROOT"
]
deny_read = [
".env",
"*.jks", # Android keystore
"ios/Runner/*.p12", # iOS signing
"android/key.properties"
]
[profiles.flutter-review]
model = "gpt-5.3-codex-spark"
approval_mode = "suggest"
Dart and Flutter MCP Server
[profiles.flutter.mcp_servers.dart]
command = "dart"
args = ["mcp-server", "--force-roots-fallback"]
The --force-roots-fallback flag is recommended for Codex CLI because it enables root management tools even when the client does not properly advertise roots support 1. Without it, the MCP server may not discover your project’s pubspec.yaml and analysis options correctly.
DCM MCP Server (Optional)
For teams using Dart Code Metrics, add a second MCP server:
[profiles.flutter.mcp_servers.dcm]
command = "dcm"
args = ["start-mcp-server"]
DCM provides code quality tools covering unused code detection, file-level metrics, cyclomatic complexity analysis, and widget structure auditing 5. The combination of the official Dart MCP server (for analysis, formatting, and testing) and DCM (for quality metrics) gives the agent comprehensive feedback without manual intervention.
The Agent-Driven Feature Development Workflow
sequenceDiagram
participant Dev as Developer
participant Codex as Codex CLI
participant Dart as Dart MCP Server
participant DCM as DCM MCP
participant App as Flutter App
Dev->>Codex: "Add user profile screen with avatar upload"
Codex->>Dart: Search pub.dev for image_picker, cached_network_image
Dart-->>Codex: Package metadata + versions
Codex->>Dart: Add packages to pubspec.yaml
Dart-->>Codex: Dependencies resolved
Codex->>Codex: Generate domain model (freezed)
Codex->>Codex: Generate repository + data source
Codex->>Codex: Generate ProfileScreen widget
Codex->>Dart: Run dart analyze
Dart-->>Codex: 0 issues
Codex->>Dart: Run dart format
Codex->>App: Hot restart
Codex->>DCM: Analyse code quality
DCM-->>Codex: Complexity OK, no unused code
Codex->>Codex: Generate widget tests
Codex->>Dart: Run flutter test
Dart-->>Codex: All tests pass
Codex-->>Dev: Feature complete, 6 files created
The key insight is that the Dart MCP server lets Codex search pub.dev and add packages programmatically rather than hallucinating package names or versions 1. This eliminates one of the most common failure modes in AI-generated Flutter code.
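The same "verify before you depend on it" idea can be sketched outside the MCP server. The script below is not how the Dart MCP server works internally; it is a minimal, independent check, assuming the public pub.dev endpoint `https://pub.dev/api/packages/<name>` and its `latest.version` field. The function names `latestVersion` and `lookUpPackage` are hypothetical.

```dart
import 'dart:convert';
import 'dart:io';

/// Extracts the latest published version from a pub.dev package response.
/// Returns null when the decoded JSON does not have the expected shape.
String? latestVersion(String responseBody) {
  final decoded = jsonDecode(responseBody);
  if (decoded is! Map<String, dynamic>) return null;
  final latest = decoded['latest'];
  if (latest is! Map<String, dynamic>) return null;
  final version = latest['version'];
  return version is String ? version : null;
}

/// Looks a package up on pub.dev; completes with null if the lookup
/// does not return HTTP 200 (for example, the package does not exist).
Future<String?> lookUpPackage(String name) async {
  final client = HttpClient();
  try {
    final request =
        await client.getUrl(Uri.https('pub.dev', '/api/packages/$name'));
    final response = await request.close();
    final body = await response.transform(utf8.decoder).join();
    if (response.statusCode != HttpStatus.ok) return null;
    return latestVersion(body);
  } finally {
    client.close();
  }
}

Future<void> main(List<String> args) async {
  // Usage: dart run check_packages.dart image_picker cached_network_image
  for (final name in args) {
    final version = await lookUpPackage(name);
    print(version == null ? '$name: not found' : '$name: $version');
  }
}
```

Running a check like this against every new entry in pubspec.yaml catches an invented package name before `pub get` does, which is the failure mode the MCP server's search tool is meant to avoid.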
PostToolUse Hooks for Flutter
Hooks provide automatic quality gates after every file write:
[profiles.flutter.hooks.post_tool_use.analyse]
event = "post_tool_use"
tool = "apply_patch"
command = "dart analyze --fatal-infos lib/"
on_fail = "report_to_agent"
[profiles.flutter.hooks.post_tool_use.format]
event = "post_tool_use"
tool = "apply_patch"
command = "dart format --set-exit-if-changed lib/ test/"
on_fail = "report_to_agent"
[profiles.flutter.hooks.post_tool_use.build_runner]
event = "post_tool_use"
tool = "apply_patch"
command = "dart run build_runner build --delete-conflicting-outputs"
on_fail = "report_to_agent"
The build_runner hook is particularly important for Flutter projects using code generation (freezed, json_serializable, riverpod_generator). Without it, the agent may write code that references *.g.dart or *.freezed.dart files that do not exist yet, causing cascading analysis failures 6.
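The same three commands can also be run as a single local script, for example before a commit. The sketch below is an assumption-laden illustration rather than part of Codex CLI: the `Gate` class, `runGates`, and `runProcess` are hypothetical names, and code generation is deliberately moved first so the analyser sees fresh generated files.

```dart
import 'dart:io';

/// One quality gate: a label plus the command line it runs.
class Gate {
  const Gate(this.label, this.executable, this.arguments);

  final String label;
  final String executable;
  final List<String> arguments;
}

/// The three commands from the hook configuration, with code generation
/// moved first so the analyser sees up-to-date generated files.
const gates = <Gate>[
  Gate('build_runner', 'dart',
      ['run', 'build_runner', 'build', '--delete-conflicting-outputs']),
  Gate('analyse', 'dart', ['analyze', '--fatal-infos', 'lib/']),
  Gate('format', 'dart', ['format', '--set-exit-if-changed', 'lib/', 'test/']),
];

/// Runs a gate and completes with its exit code.
typedef GateRunner = Future<int> Function(Gate gate);

/// Runs every gate in order and returns the labels of the ones that failed.
Future<List<String>> runGates(List<Gate> gates, GateRunner run) async {
  final failed = <String>[];
  for (final gate in gates) {
    final code = await run(gate);
    if (code != 0) failed.add(gate.label);
  }
  return failed;
}

/// Default runner: spawns the real process and shares this terminal's stdio.
Future<int> runProcess(Gate gate) async {
  final process = await Process.start(
    gate.executable,
    gate.arguments,
    mode: ProcessStartMode.inheritStdio,
  );
  return process.exitCode;
}

Future<void> main() async {
  final failed = await runGates(gates, runProcess);
  if (failed.isNotEmpty) {
    stderr.writeln('Failed gates: ${failed.join(', ')}');
    exitCode = 1;
  }
}
```

Passing the runner in as a parameter keeps `runGates` testable with a fake that returns canned exit codes, so the ordering and failure reporting can be checked without spawning any process.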
Widget Testing Patterns
Widget tests in Flutter present a specific challenge for coding agents: they require a WidgetsApp or MaterialApp ancestor, proper pump cycles, and finder-based assertions. Encode these patterns in AGENTS.md and let the agent follow them consistently.
The Test Template Pattern
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';

// ProfileScreen, User, and userProvider come from the app under test;
// import them from wherever your project defines them.

void main() {
  testWidgets('ProfileScreen displays user name', (tester) async {
    await tester.pumpWidget(
      ProviderScope(
        overrides: [
          // Replace the real provider with a fixed, already-loaded value.
          userProvider.overrideWithValue(
            AsyncData(User(name: 'Alice', email: 'alice@example.com')),
          ),
        ],
        child: const MaterialApp(home: ProfileScreen()),
      ),
    );
    // Let pending frames and animations finish before asserting.
    await tester.pumpAndSettle();
    expect(find.text('Alice'), findsOneWidget);
  });
}
Add this to your AGENTS.md testing section:
## Widget Test Rules
- ALWAYS wrap test widgets in `ProviderScope` + `MaterialApp`
- ALWAYS call `pumpAndSettle()` after navigation or async operations
- Use `find.byType()` for structural assertions, `find.text()` for content
- Mock providers via `overrideWithValue`, never mock the widget tree itself
- Golden file tests: place reference images in `test/goldens/`
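To show how the rules above fit together, here is a minimal sketch that combines a structural assertion, a content assertion, and a golden comparison. `AvatarBadge` and the `wrap` helper are hypothetical and defined inline so the file stands alone; the golden path assumes the test file lives in `test/widget/` and the reference images in `test/goldens/`.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:flutter_test/flutter_test.dart';

/// Hypothetical widget, defined inline so the example is self-contained.
class AvatarBadge extends StatelessWidget {
  const AvatarBadge({super.key, required this.initials});

  final String initials;

  @override
  Widget build(BuildContext context) => CircleAvatar(child: Text(initials));
}

/// Wraps the widget under test in ProviderScope + MaterialApp, per the rules.
Widget wrap(Widget child) => ProviderScope(
      child: MaterialApp(home: Scaffold(body: Center(child: child))),
    );

void main() {
  testWidgets('AvatarBadge renders its initials', (tester) async {
    await tester.pumpWidget(wrap(const AvatarBadge(initials: 'AL')));
    await tester.pumpAndSettle();

    // Structural assertion: exactly one CircleAvatar in the tree.
    expect(find.byType(CircleAvatar), findsOneWidget);
    // Content assertion: the initials are rendered as text.
    expect(find.text('AL'), findsOneWidget);
  });

  testWidgets('AvatarBadge matches its golden image', (tester) async {
    await tester.pumpWidget(wrap(const AvatarBadge(initials: 'AL')));
    await tester.pumpAndSettle();

    // Path is relative to this test file; assumes it lives in test/widget/.
    await expectLater(
      find.byType(AvatarBadge),
      matchesGoldenFile('../goldens/avatar_badge.png'),
    );
  });
}
```

The golden test fails until a reference image exists, so the first run needs `flutter test --update-goldens` to create it.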
Integration Testing with Patrol
For end-to-end tests on real devices or simulators, Patrol, which builds on Flutter's integration_test package, provides a cleaner interface than the raw API 7. Combined with the iOS Simulator MCP server, Codex can orchestrate the full cycle:
[profiles.flutter.mcp_servers.ios_simulator]
command = "npx"
args = ["-y", "@anthropic/ios-simulator-mcp"]
This enables the agent to create simulators, capture screenshots for visual verification, and mock GPS locations for location-dependent features — all without leaving the Codex session.
Model Selection by Flutter Task
| Task | Recommended Model | Reasoning Effort | Rationale |
|---|---|---|---|
| Widget scaffolding | GPT-5.3-Codex-Spark | Low | Boilerplate-heavy, pattern-matching |
| State management refactor | GPT-5.5 | High | Cross-file reasoning, provider graph |
| Platform channel code | GPT-5.5 | High | Kotlin/Swift interop requires precision |
| Test generation | GPT-5.5 | Medium | Mock setup needs architectural awareness |
| Localisation (ARB files) | GPT-5.3-Codex-Spark | Low | Mechanical translation, structured format |
| Animation/CustomPainter | GPT-5.5 | High | Mathematical reasoning for curves/paths |
| pub.dev package evaluation | GPT-5.5 | Medium | Needs to weigh compatibility and maintenance |
Use Alt+, and Alt+. in the TUI to adjust reasoning effort mid-session as you move between task types 8.
The codex_cli_sdk Dart Package
For teams embedding Codex into Dart tooling (CLI tools, build scripts, or backend services), the codex_cli_sdk package on pub.dev provides a native Dart interface 9:
import 'dart:io';

import 'package:codex_cli_sdk/codex_cli_sdk.dart';

Future<void> main() async {
  // Read the API key from the environment rather than hard-coding it.
  final codex = Codex(apiKey: Platform.environment['OPENAI_API_KEY']!);
  final chat = codex.createNewChat();
  final result = await chat.sendMessage([
    PromptContent.text('Generate a freezed model for UserProfile with '
        'name, email, avatarUrl, and createdAt fields'),
  ]);
  print(result.response);
  // Release the session once the response has been consumed.
  await chat.dispose();
}
The SDK supports streaming responses, structured output via JSON schema, file attachments, and session resumption 9. Note, however, that this is a community package from an unverified publisher, so evaluate it against your security requirements before using it in production pipelines. ⚠️
Headless CI Pipeline
# .github/workflows/codex-flutter-review.yml
name: Codex Flutter PR Review
on:
  pull_request:
    paths: ['lib/**', 'test/**', 'pubspec.yaml']
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2
        with:
          flutter-version: '3.41.5'
      - run: flutter pub get
      - uses: openai/codex-action@v1
        with:
          # Assumes the key is stored as a repository secret with this name.
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          codex-args: >-
            --profile flutter-review
            --approval-mode suggest
          prompt: |
            Review this PR for Flutter best practices:
            1. Check widget composition (prefer small, focused widgets)
            2. Verify const constructors are used where possible
            3. Check for missing null safety patterns
            4. Verify test coverage for new widgets
            5. Flag any platform-specific code outside lib/platform/
Common Pitfalls
| Pitfall | Symptom | Mitigation |
|---|---|---|
| Agent edits `*.g.dart` files | Build runner overwrites changes | Add `*.g.dart`, `*.freezed.dart` to AGENTS.md “Do NOT modify” list |
| Missing `MaterialApp` wrapper in tests | `No MediaQuery widget ancestor` error | Enforce wrapper pattern in AGENTS.md test rules |
| Agent adds packages not on pub.dev | `pub get` fails with 404 | Dart MCP server’s pub.dev search prevents this 1 |
| Platform channel code hallucinated | Runtime `MissingPluginException` | Require agent to check plugin registration in `MainActivity.kt`/`AppDelegate.swift` |
| Hot reload breaks state | Widget tree mismatch after major refactor | Use hot restart instead; add note to AGENTS.md |
| `dynamic` type proliferation | Lost type safety, runtime crashes | AGENTS.md rule: zero `dynamic` except JSON deserialization |
| Code generation not triggered | References to non-existent `.g.dart` | PostToolUse hook runs `build_runner` after every patch 6 |