Codex CLI for Terraform and Infrastructure as Code: The MCP Server, TerraShark, and Agent-Driven IaC Workflows
Infrastructure as code shipped a decade ago as the answer to snowflake servers. In 2026, the question has shifted: who writes the IaC? For a growing number of platform engineering teams, the answer is a coding agent backed by structured skills and live registry data. This article maps the practical integration surface between Codex CLI and the Terraform ecosystem — from HashiCorp’s official MCP server through community skills that suppress hallucinations, to codex exec pipelines that audit drift in CI.
The Hallucination Problem with IaC
LLMs hallucinate Terraform resource types and attribute names at rates that would be merely annoying in application code but catastrophic in infrastructure definitions1. The TerraFormer paper (January 2026) documented that state-of-the-art models “struggle with Terraform synthesis, often hallucinating resource types and attribute names”2. Unguided output tends to fail in predictable ways: hardcoded values, monolithic files, permissive IAM policies, missing encryption, and no automated tests or policy checks3.
The root cause is straightforward. HCL is a domain-specific language with a comparatively small training corpus relative to Python or TypeScript. Provider schemas change with every release — the AWS provider alone exposes over 1,300 resource types — and an LLM trained six months ago will confidently generate attributes that no longer exist4.
Codex CLI cannot solve this alone. What it can do is wire together three layers that collectively ground the agent in current, validated data: the Terraform MCP server for live registry access, structured skills for failure-mode diagnosis, and sandbox-enforced validation pipelines.
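To make the failure mode concrete, the following is a minimal sketch of a post-generation check that compares resource types and attribute names against the JSON printed by `terraform providers schema -json`. The helper names, the `(resource_type, attributes)` input shape, and the tiny `SAMPLE` schema are illustrative assumptions, not part of any tool described here.

```python
import json

# Tiny stand-in for real `terraform providers schema -json` output (illustrative only).
SAMPLE = json.dumps({
    "provider_schemas": {
        "registry.terraform.io/hashicorp/aws": {
            "resource_schemas": {
                "aws_s3_bucket": {
                    "block": {"attributes": {"bucket": {}, "tags": {}}}
                }
            }
        }
    }
})


def load_schema_index(schema_json: str) -> dict[str, set[str]]:
    """Map each resource type to its top-level attribute and nested block names."""
    doc = json.loads(schema_json)
    index: dict[str, set[str]] = {}
    for provider in doc.get("provider_schemas", {}).values():
        for rtype, schema in provider.get("resource_schemas", {}).items():
            block = schema.get("block", {})
            names = set(block.get("attributes", {}))
            names |= set(block.get("block_types", {}))
            index[rtype] = names
    return index


def find_hallucinations(index: dict[str, set[str]],
                        resources: list[tuple[str, list[str]]]) -> list[str]:
    """Report resource types and attributes that the schema does not define.

    `resources` is a hypothetical pre-parsed form of the generated HCL.
    Meta-arguments such as count or lifecycle are not in provider schemas,
    so callers should filter them out before calling this function.
    """
    problems = []
    for rtype, attrs in resources:
        if rtype not in index:
            problems.append(f"unknown resource type: {rtype}")
            continue
        for attr in attrs:
            if attr not in index[rtype]:
                problems.append(f"unknown attribute: {rtype}.{attr}")
    return problems
```

In practice `terraform validate` catches the same class of error. The sketch only shows why a schema that matches the installed provider version is the deciding input.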
Layer 1: The HashiCorp Terraform MCP Server
HashiCorp ships an official MCP server that gives any MCP-compatible client — including Codex CLI — real-time access to the Terraform Registry5. The server exposes tools across five categories:
| Category | Key Tools | Purpose |
|---|---|---|
| Registry | search_providers, get_provider_details, get_latest_provider_version | Live provider documentation lookup |
| Modules | search_modules, get_module_details, get_latest_module_version | Verified module discovery |
| Policy | search_policies, get_policy_details | Sentinel governance policies |
| Workspace | list_workspaces, create_workspace, create_run | HCP Terraform / TFE operations |
| Variables | list_variable_sets, create_workspace_variable | Variable and tag management |
The server supports dual transports — stdio for local coding tools and Streamable HTTP for network deployments — with production security controls including CORS, TLS, and rate limiting6.
Wiring it into Codex CLI
Add the server to your config.toml:
[mcp_servers.terraform]
command = "npx"
args = ["-y", "@hashicorp/terraform-mcp-server"]
[mcp_servers.terraform.env]
# For HCP Terraform workspace operations (optional)
TFC_TOKEN = "env:TFC_TOKEN"
For teams using HCP Terraform or Terraform Enterprise, the workspace management tools unlock agent-driven operations: listing organisations and projects, creating workspaces with variables, triggering runs, and inspecting plan/apply output — all from within a Codex session7.
To scope tool access, use enabled_tools to expose only the registry tools in read-only profiles:
[mcp_servers.terraform]
command = "npx"
args = ["-y", "@hashicorp/terraform-mcp-server"]
enabled_tools = [
"search_providers",
"get_provider_details",
"get_latest_provider_version",
"search_modules",
"get_module_details",
"search_policies"
]
This prevents an agent in suggest mode from accidentally creating workspaces or triggering runs.
Note: The Terraform MCP server is currently in beta5. Test thoroughly before using workspace-mutation tools in production pipelines.
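A read-only allowlist like the one above can be derived rather than hand-maintained. The sketch below filters the tool names listed in this article by prefix and prints an `enabled_tools` array. Treating `create_`, `update_`, and `delete_` as the mutating prefixes is an assumption about naming conventions, not documented server behaviour, so review the result before relying on it.

```python
# Tool names taken from the table in this article.
TOOLS = [
    "search_providers", "get_provider_details", "get_latest_provider_version",
    "search_modules", "get_module_details", "get_latest_module_version",
    "search_policies", "get_policy_details",
    "list_workspaces", "create_workspace", "create_run",
    "list_variable_sets", "create_workspace_variable",
]

# Assumption: mutating tools can be recognised by these prefixes.
MUTATING_PREFIXES = ("create_", "update_", "delete_")


def read_only_tools(tools: list[str]) -> list[str]:
    """Keep only tools whose names do not start with a mutating prefix."""
    return [t for t in tools if not t.startswith(MUTATING_PREFIXES)]


def to_toml_array(key: str, values: list[str]) -> str:
    """Render a TOML array assignment, one value per line."""
    lines = [f"{key} = ["]
    lines += [f'  "{v}",' for v in values]  # TOML permits a trailing comma
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_toml_array("enabled_tools", read_only_tools(TOOLS)))
```

Note that the prefix filter keeps list_workspaces and list_variable_sets, which do not modify anything but still need an HCP Terraform token. Drop them by hand for a registry-only profile.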
Layer 2: TerraShark — Failure-Mode-First IaC Skills
The Terraform MCP server solves the data freshness problem. It does not solve the reasoning problem — an agent can look up the correct attribute names and still produce a monolithic, insecure module with hardcoded credentials.
TerraShark is an open-source skill (MIT-licensed) designed specifically for the ways AI agents fail at infrastructure code8. Its core SKILL.md is a 79-line operational workflow costing approximately 600 tokens on activation — roughly 7x more token-efficient than competing skills9.
The Seven-Step Diagnostic Workflow
TerraShark enforces a structured sequence before generating any HCL:
flowchart TD
A[1. Capture Execution Context] --> B[2. Diagnose Failure Modes]
B --> C[3. Load Relevant References]
C --> D[4. Propose Fixes with Risk Controls]
D --> E[5. Generate Implementation Artefacts]
E --> F[6. Validate Before Finalising]
F --> G[7. Deliver Structured Output Contract]
Each response includes assumptions, remediation choices, tradeoffs, validation steps, and rollback notes8. The skill ships 18 focused reference files loaded on demand — covering state safety, migration playbooks, provider-specific guidance, and compliance mappings for ISO 27001, SOC 2, FedRAMP, GDPR, PCI DSS, and HIPAA9.
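The five parts of that response can be pictured as a small data structure. The sketch below is a hypothetical model: the class and field names are assumptions chosen to mirror the prose, and TerraShark itself defines the contract in SKILL.md rather than in code.

```python
from dataclasses import dataclass, field, fields


@dataclass
class OutputContract:
    """Hypothetical model of the five sections named above."""
    assumptions: list[str] = field(default_factory=list)
    remediation_choices: list[str] = field(default_factory=list)
    tradeoffs: list[str] = field(default_factory=list)
    validation_steps: list[str] = field(default_factory=list)
    rollback_notes: list[str] = field(default_factory=list)


def missing_sections(contract: OutputContract) -> list[str]:
    """Return the names of sections that were left empty."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


if __name__ == "__main__":
    draft = OutputContract(
        assumptions=["Remote state lives in S3"],
        validation_steps=["terraform validate", "terraform plan"],
    )
    # Lists the three sections the draft has not filled in yet.
    print(missing_sections(draft))
```

A reviewer or CI step could use a check of this kind to reject agent responses that skip rollback notes, which is the section most likely to be omitted under time pressure.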
Installing TerraShark for Codex CLI
Clone the skill into your project and reference it from AGENTS.md:
git clone https://github.com/LukasNiessen/terrashark.git .terrashark
Then in your repository’s AGENTS.md:
## Infrastructure as Code
When working with Terraform or OpenTofu files:
- Follow the workflow defined in `.terrashark/SKILL.md`
- Always diagnose failure modes before generating HCL
- Load only the reference files relevant to identified risks
- Every module must include a validation step before finalising
The key insight is that TerraShark teaches the model how to think about infrastructure problems through diagnostic prompts rather than dumping reference material upfront9.
Layer 3: Pulumi Agent Skills
For teams using Pulumi rather than Terraform — or migrating between the two — Pulumi ships a dedicated Agent Skills package that follows the open Agent Skills specification10. Three skill groups are available:
- Migration Skills: Full Terraform-to-Pulumi migration workflow, including state translation, provider version alignment, and iterative pulumi preview convergence10
- Authoring Skills: Covers Pulumi ESC for centralised secrets, OIDC credential setup, environment composition, and program integration10
- Delegation Skills: Invocable via slash commands in Codex CLI sessions10
Install via the universal Agent Skills CLI:
npx skills add pulumi/agent-skills --skill terraform-migration -a codex
npx skills add pulumi/agent-skills --skill authoring -a codex
Once installed, skills activate automatically based on context. Ask the agent to help migrate a Terraform project and it draws on the migration skill’s workflow; debug resource recreation issues and the best practices skill checks for missing aliases10.
Practical Workflow: Agent-Driven Drift Detection in CI
Combining the MCP server, a structured AGENTS.md, and codex exec produces a drift detection pipeline that runs in CI without human intervention:
# .github/workflows/drift-check.yml
name: Terraform Drift Detection
on:
  schedule:
    - cron: '0 6 * * 1-5' # Weekday mornings
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: openai/codex-action@v1
        with:
          codex_api_key: ${{ secrets.OPENAI_API_KEY }}
          approval_mode: full-auto
          sandbox_mode: workspace-write
          prompt: |
            Run terraform init and terraform plan -detailed-exitcode
            for each environment in environments/.
            For any environment with drift (exit code 2):
            1. Identify the drifted resources
            2. Classify each as intentional (tagged manual-override)
               or unintentional
            3. For unintentional drift, generate a remediation PR
            Output a JSON summary with environment, resource_address,
            drift_type, and recommended_action for each finding.
          output_schema: |
            {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "environment": {"type": "string"},
                  "resource_address": {"type": "string"},
                  "drift_type": {"type": "string"},
                  "recommended_action": {"type": "string"}
                }
              }
            }
The structured --output-schema ensures machine-readable output regardless of model verbosity11. The codex-action drops sudo privileges by default and runs in an isolated sandbox, limiting blast radius12.
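Downstream steps still need to interpret two things: the exit code of `terraform plan -detailed-exitcode` (0 means no changes, 1 means error, 2 means changes present) and the JSON summary itself. The sketch below shows one way to handle both. The function names are illustrative, and the validator is deliberately stricter than the schema above, which declares no `required` keys.

```python
import json

REQUIRED_KEYS = {"environment", "resource_address", "drift_type", "recommended_action"}


def classify_plan_exit(code: int) -> str:
    """Interpret the exit code of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "clean"   # plan succeeded, empty diff
    if code == 2:
        return "drift"   # plan succeeded, non-empty diff
    return "error"       # 1, or anything unexpected


def validate_findings(raw: str) -> list[dict]:
    """Parse the agent's JSON summary and reject incomplete findings.

    Assumption: every finding must carry all four keys as strings.
    """
    findings = json.loads(raw)
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array of findings")
    for i, item in enumerate(findings):
        if not isinstance(item, dict):
            raise ValueError(f"finding {i} is not an object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"finding {i} is missing {sorted(missing)}")
        for key in REQUIRED_KEYS:
            if not isinstance(item[key], str):
                raise ValueError(f"finding {i}: {key} must be a string")
    return findings
```

Validating the summary outside the agent keeps the pipeline honest: a malformed finding fails the job instead of silently producing an empty report.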
AGENTS.md Patterns for IaC Repositories
A well-structured AGENTS.md for an infrastructure repository differs materially from one written for application code. IaC guardrails must encode safety invariants that the agent cannot reason about from context alone:
# AGENTS.md — Infrastructure Repository
## Hard Rules
- NEVER use `danger-full-access` sandbox mode
- NEVER hardcode credentials, access keys, or tokens in .tf files
- NEVER create IAM policies with `"Action": "*"` or `"Resource": "*"`
- ALWAYS use variables for environment-specific values
- ALWAYS include `prevent_destroy = true` on stateful resources
## Terraform Conventions
- One module per logical service boundary
- Pin provider versions to minor: `~> 5.80.0`
- Pin module versions to exact: `= 3.2.1`
- Backend configuration lives in `backend.tf`, never inline
- Use `terraform validate` and `tflint` before committing
## Testing Requirements
- Run `terraform validate` on every modified module
- Run `terraform plan` in a throwaway workspace before applying
- Use `checkov` or `tfsec` for static security analysis
## MCP Server Usage
- Use `search_providers` to verify resource types before writing them
- Use `get_provider_details` to confirm attribute names
- Use `search_policies` to find relevant Sentinel policies
- Do NOT use `create_workspace` or `create_run` without explicit approval
Research shows that developer-written AGENTS.md files improve task success rates by approximately 4% and reduce agent-generated bugs by 35–55%13. For infrastructure code — where a single misplaced attribute can expose a database to the internet — that reduction in bugs translates directly to reduced blast radius.
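Hard rules are most useful when something outside the agent also enforces them. As an illustration of the wildcard rule, the sketch below scans an IAM policy document in JSON form for Allow statements whose Action or Resource is a bare `*`. The function names are hypothetical, and dedicated scanners such as checkov or tfsec cover far more cases than this.

```python
import json


def _as_list(value):
    """IAM allows a single string or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_violations(policy_json: str) -> list[str]:
    """Flag Allow statements that use a bare '*' for Action or Resource."""
    policy = json.loads(policy_json)
    violations = []
    for i, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements with wildcards are not the risk here
        if "*" in _as_list(stmt.get("Action")):
            violations.append(f"Statement[{i}]: Action is *")
        if "*" in _as_list(stmt.get("Resource")):
            violations.append(f"Statement[{i}]: Resource is *")
    return violations


if __name__ == "__main__":
    policy = json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "*", "Resource": "*"},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::example-bucket/*"},
        ],
    })
    for line in wildcard_violations(policy):
        print(line)
```

The check only matches a bare `*`. Scoped patterns such as `s3:*` pass through, so treat it as a floor rather than a complete policy review.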
Sandbox Configuration for IaC Workflows
Terraform workflows require careful sandbox tuning. The agent needs to execute terraform init, terraform validate, and terraform plan — each of which requires network access for provider downloads and state backend communication:
[profiles.iac]
approval_policy = "on-request"
sandbox_mode = "workspace-write"
[profiles.iac.sandbox_workspace_write]
network_access = true
writable_roots = [".terraform", "terraform.tfstate.d"]
[profiles.iac.features.network_proxy]
enabled = true
domains = [
"registry.terraform.io",
"releases.hashicorp.com",
"*.amazonaws.com",
"app.terraform.io"
]
The domain allowlist is critical. Without it, a compromised provider or malicious prompt injection could exfiltrate state files, which frequently contain secrets, to an attacker-controlled endpoint14. The workspace-write sandbox confines writes to the working directory plus the paths listed in writable_roots, so the agent cannot modify files elsewhere on the host.
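To reason about what such an allowlist admits, it helps to test hostnames against it offline. The sketch below uses shell-style wildcard matching from Python's standard library. That matching rule is an assumption for illustration, since the proxy's actual semantics may differ.

```python
from fnmatch import fnmatchcase

# Same entries as the allowlist in the profile above.
ALLOWED = [
    "registry.terraform.io",
    "releases.hashicorp.com",
    "*.amazonaws.com",
    "app.terraform.io",
]


def is_allowed(host: str, patterns: list[str] = ALLOWED) -> bool:
    """Return True if the hostname matches any allowlist pattern.

    Assumption: patterns use shell-style wildcards and hostnames are
    compared case-insensitively with any trailing dot removed.
    """
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, p) for p in patterns)


if __name__ == "__main__":
    for candidate in ("registry.terraform.io",
                      "s3.eu-west-1.amazonaws.com",
                      "amazonaws.com.evil.example"):
        print(candidate, is_allowed(candidate))
```

One design point worth noting: a pattern as broad as `*.amazonaws.com` also covers S3 endpoints for buckets the team does not own, so narrowing it to the specific regional endpoints in use reduces the exfiltration surface further.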
Model Selection for IaC Tasks
Not all models handle HCL equally well. GPT-5.5 leads on complex multi-provider configurations with its 400K context window, but for straightforward single-module generation, GPT-5.4 produces comparable results at lower cost15. Codex-Spark is unsuitable for IaC — the latency advantage is irrelevant for plan-heavy workflows, and its smaller context struggles with large state files.
For codex exec pipelines, specify the model explicitly:
codex exec --model gpt-5.5 --full-auto \
"Review the Terraform plan output in plan.txt and identify security risks"
The Token Efficiency Question
Pulumi’s engineering blog raises a valid point: general-purpose programming languages (TypeScript, Python, Go) are more token-efficient for LLMs than HCL because of richer training data and better model familiarity16. Teams starting greenfield infrastructure projects should consider whether Pulumi’s TypeScript/Python/Go SDKs produce more reliable agent output than Terraform’s HCL.
For existing Terraform estates, migration cost almost certainly outweighs the token efficiency gain. The practical answer is to layer grounding tools — the MCP server for live data, TerraShark for diagnostic reasoning, and AGENTS.md for hard guardrails — onto the existing HCL workflow.
What This Stack Cannot Do
Three limitations remain unresolved:
- State file sensitivity: Terraform state frequently contains plaintext secrets. Running terraform show inside an agent session exposes those secrets to the model’s context window and potentially to OpenAI’s API logs. Use remote state with encryption and avoid piping raw state into agent prompts.
- Apply-time side effects: terraform apply creates real infrastructure with real cost. No amount of sandbox configuration makes an unreviewed apply safe. Keep terraform apply behind a human approval gate or a dedicated CI pipeline with plan review.
- Provider plugin trust: terraform init downloads and executes arbitrary provider binaries. The Codex sandbox constrains filesystem access but cannot inspect the behaviour of downloaded Go binaries. Pin provider versions and use a provider mirror for supply chain control. ⚠️
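For the first limitation, one partial mitigation is to redact obviously sensitive fields before any state or plan JSON reaches a prompt. The sketch below walks a parsed JSON document and masks values whose key names look secret. The key pattern and the placeholder string are assumptions, and name-based matching will miss secrets stored under innocuous keys, so it reduces exposure rather than eliminating it.

```python
import re

# Assumption: these key-name fragments indicate a secret value.
SENSITIVE_KEY = re.compile(r"(password|secret|token|private_key|access_key)",
                           re.IGNORECASE)


def redact(value, key: str = ""):
    """Recursively replace values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        # List items inherit the key of the list that contains them.
        return [redact(v, key) for v in value]
    if key and SENSITIVE_KEY.search(key) and value is not None:
        return "<redacted>"
    return value


if __name__ == "__main__":
    resource = {
        "address": "aws_db_instance.main",
        "values": {"username": "app", "password": "hunter2",
                   "tags": {"team": "platform"}},
    }
    print(redact(resource))
```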
Citations
1. TerraShark GitHub repository. “LLMs hallucinate a lot with Terraform.” https://github.com/LukasNiessen/terrashark
2. Ruan, Y. et al. (2026). “TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback.” arXiv:2601.08734. https://arxiv.org/html/2601.08734v1
3. Ramblings, SJ. (2026). “Is Infrastructure as Code Dying? AI Agents vs Terraform & HCL.” https://sjramblings.io/is-infrastructure-as-code-the-next-abstraction-to-fall/
4. Terraform Registry, AWS Provider. Provider exposes 1,300+ resource types across 300+ services. https://registry.terraform.io/providers/hashicorp/aws/latest
5. HashiCorp. “Terraform MCP Server Overview.” HashiCorp Developer Documentation. https://developer.hashicorp.com/terraform/mcp-server
6. HashiCorp. “Deploy the Terraform MCP Server.” HashiCorp Developer Documentation. https://developer.hashicorp.com/terraform/mcp-server/deploy
7. HashiCorp. “Terraform MCP Server Reference.” Full tool listing including workspace and run management. https://developer.hashicorp.com/terraform/mcp-server/reference
8. Niessen, Lukas. “TerraShark: How I Fixed LLM Hallucinations in Terraform Without Burning All My Tokens.” Medium, 2026. https://lukasniessen.medium.com/terrashark-how-i-fixed-llm-hallucinations-in-terraform-without-burning-all-my-tokens-6c52a9910234
9. TerraShark: Terraform Skill website. Feature comparison and token efficiency benchmarks. https://terraformskill.com/
10. Pulumi. “Agent Skills: Best Practices and More for AI Coding Assistants.” Pulumi Blog, 2026. https://www.pulumi.com/blog/pulumi-agent-skills/
11. OpenAI. “Codex CLI Reference: codex exec.” Structured output via --output-schema. https://developers.openai.com/codex/cli/reference
12. OpenAI. “Codex GitHub Action.” Security defaults including sudo removal. https://developers.openai.com/codex/github-action
13. OpenAI. “Custom Instructions with AGENTS.md.” Developer-written files improve success rates by ~4% and reduce bugs 35–55%. https://developers.openai.com/codex/guides/agents-md
14. OpenAI. “Agent Approvals & Security.” Sandbox and network access configuration. https://developers.openai.com/codex/agent-approvals-security
15. OpenAI. “Codex Models.” GPT-5.5 capabilities and context window. https://developers.openai.com/codex/models
16. Pulumi. “Token Efficiency vs Cognitive Efficiency: Choosing IaC for AI Agents.” Pulumi Blog, 2026. https://www.pulumi.com/blog/token-efficiency-vs-cognitive-efficiency-choosing-iac-for-ai-agents/