Codex CLI with Datadog and New Relic: Vendor-Specific Observability for Agent Pipelines

The open-standards observability story for Codex CLI — OpenTelemetry traces, Prometheus metrics, Grafana dashboards — is well documented. But most enterprise teams do not run vanilla OTel backends. They run Datadog or New Relic, with years of investment in dashboards, alert policies, and on-call routing. This article bridges the gap: wiring Codex CLI into vendor-specific observability platforms via their MCP servers so that agent pipelines get the same incident response, cost attribution, and SLA monitoring that production services already enjoy.

Both vendors shipped remote MCP servers in early 2026 — Datadog on 9 March ¹ and New Relic on 24 February ² — and both work with Codex CLI’s Streamable HTTP transport. The result is bidirectional: Codex can query live telemetry to inform its coding decisions, and your existing observability stack can monitor the agent itself.

Architecture Overview

graph LR
    subgraph "Developer Workstation"
        C[Codex CLI]
    end
    subgraph "Remote MCP Servers"
        DD[Datadog MCP<br/>mcp.datadoghq.com]
        NR[New Relic MCP<br/>mcp.newrelic.com]
    end
    subgraph "Observability Backends"
        DDB[(Datadog Platform)]
        NRB[(New Relic Platform)]
    end
    C -->|Streamable HTTP| DD
    C -->|Streamable HTTP| NR
    DD --> DDB
    NR --> NRB

Both servers are remote-hosted — no local binary to install, no Docker container to run. Authentication flows through OAuth 2.0 (Datadog) or API key headers (New Relic), both configurable in config.toml.

Datadog MCP Server Configuration

Datadog’s MCP server exposes 100+ tools across 16 toolsets, from core log and metric queries through APM trace analysis, database monitoring, Kubernetes resource inspection, CI pipeline analytics, and even code execution in a managed sandbox ³.

config.toml

[mcp_servers.datadog]
url = "https://mcp.datadoghq.com/mcp?toolsets=core,apm,alerting,software-delivery"
startup_timeout_sec = 15
tool_timeout_sec = 60
enabled = true

Authenticate via the OAuth flow:

codex mcp login datadog

This opens a browser to complete Datadog’s OAuth handshake. Codex stores the resulting credentials until the token expires ⁴. For CI environments where browser login is impractical, use API key authentication instead:

[mcp_servers.datadog]
url = "https://mcp.datadoghq.com/mcp?toolsets=core,apm"
env_http_headers = { "DD-API-KEY" = "DD_API_KEY", "DD-APPLICATION-KEY" = "DD_APP_KEY" }

Toolset Selection

The ?toolsets= query parameter controls which tool categories load. Loading everything (?toolsets=all) floods the context window with 100+ tool descriptions. A pragmatic default for agent development:

| Toolset | Key Tools | When to Enable |
|---|---|---|
| `core` | `search_datadog_logs`, `get_datadog_metric`, `search_datadog_monitors` | Always |
| `apm` | `apm_explore_trace`, `apm_latency_bottleneck_analysis`, `apm_search_watchdog_stories` | Microservices debugging |
| `alerting` | `validate_datadog_monitor`, `get_monitor_coverage`, `create_datadog_monitor` | Monitor-as-code workflows |
| `software-delivery` | `search_datadog_ci_pipeline_events`, `get_datadog_flaky_tests`, `aggregate_dora_deployments` | CI/CD pipeline analysis |
| `security` | `datadog_secrets_scan`, `search_datadog_security_signals` | Security-sensitive repos |
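
If several projects need different toolset combinations, the URL can be generated rather than hand-edited. The following is a minimal sketch, assuming the server accepts a comma-separated `toolsets` value exactly as in the config above; the helper name `datadog_mcp_url` and the `KNOWN_TOOLSETS` set are illustrative and cover only the toolsets named in this article.

```python
DATADOG_MCP_BASE = "https://mcp.datadoghq.com/mcp"

# Toolset names mentioned in this article; the server may accept others.
KNOWN_TOOLSETS = {"core", "apm", "alerting", "software-delivery", "security"}


def datadog_mcp_url(toolsets):
    """Return the MCP URL with a ?toolsets= query for the given toolset names."""
    cleaned = []
    for name in toolsets:
        name = name.strip().lower()
        if name not in KNOWN_TOOLSETS:
            raise ValueError(f"unknown toolset: {name!r}")
        if name not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one toolset is required")
    # Commas are left unescaped to match the URL form used in config.toml.
    return f"{DATADOG_MCP_BASE}?toolsets={','.join(cleaned)}"


if __name__ == "__main__":
    print(datadog_mcp_url(["core", "apm", "alerting", "software-delivery"]))
```

Rejecting unknown names early is a deliberate choice here: a typo in the query string would otherwise only surface when the agent fails to find a tool.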

Rate Limits

Datadog enforces 50 requests per 10-second burst window and 50,000 monthly tool calls ³. For batch workflows using codex exec, the monthly quota averages out to roughly 1,600 to 1,700 tool calls per day (50,000 divided by 30 or 31 days), so plan accordingly.
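
For scripts that drive many tool calls, it can help to pace requests on the client side. The sketch below works through the daily-budget arithmetic and shows a simple sliding-window limiter using the two figures quoted above. It assumes the server counts requests over a rolling 10-second window, which the documentation cited here does not spell out; `BurstLimiter` and `daily_budget` are illustrative names, not part of Codex CLI or Datadog's tooling.

```python
import time
from collections import deque

BURST_LIMIT = 50          # requests allowed per window (figure from the article)
BURST_WINDOW_SEC = 10.0   # window length in seconds (figure from the article)
MONTHLY_QUOTA = 50_000    # monthly tool calls (figure from the article)


def daily_budget(days_in_month=30):
    """Average whole tool calls per day that fit inside the monthly quota."""
    return MONTHLY_QUOTA // days_in_month


class BurstLimiter:
    """Client-side limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit=BURST_LIMIT, window=BURST_WINDOW_SEC, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

A caller would check `wait_time()`, sleep for that long if it is non-zero, make the request, and then call `record()`. The clock is injectable so the limiter can be exercised in tests without real delays.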

New Relic MCP Server Configuration

New Relic’s MCP server provides 35+ tools with a standout feature: natural language to NRQL conversion ⁵. Where Datadog requires you to know its query syntax, New Relic’s natural_language_to_nrql_query tool lets the agent describe what it wants in plain English and receive executable NRQL.

config.toml

[mcp_servers.newrelic]
url = "https://mcp.newrelic.com/mcp/"
env_http_headers = { "Api-Key" = "NEW_RELIC_API_KEY" }
startup_timeout_sec = 15
tool_timeout_sec = 60
enabled = true

New Relic uses API key authentication (NRAK-prefixed user keys) rather than OAuth ⁵. For EU-region accounts, swap the URL to https://mcp.eu.newrelic.com/mcp/ ⁶.
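
Teams that span both regions may want to pick the endpoint programmatically and catch an obviously wrong key type before starting a session. This is a small sketch under those assumptions: the two URLs are the ones given above, while the function names and the `"us"`/`"eu"` region labels are illustrative. The prefix check only confirms the key looks like a user key; it says nothing about whether the key is valid.

```python
NEW_RELIC_MCP_URLS = {
    "us": "https://mcp.newrelic.com/mcp/",
    "eu": "https://mcp.eu.newrelic.com/mcp/",
}


def newrelic_mcp_url(region="us"):
    """Return the New Relic MCP endpoint for a region label ("us" or "eu")."""
    key = region.strip().lower()
    if key not in NEW_RELIC_MCP_URLS:
        raise ValueError(f"unsupported region: {region!r}")
    return NEW_RELIC_MCP_URLS[key]


def looks_like_user_key(api_key):
    """Cheap prefix check only; it does not prove the key is valid."""
    return api_key.startswith("NRAK")
```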

Key Tool Categories

New Relic organises these tools across six categories ⁵:

Discovery: get_entity, search_entity_with_tag, list_related_entities — map your service topology
Data Access: execute_nrql_query, natural_language_to_nrql_query, list_recent_logs, query_logs — the query engine
Alerting: list_alert_policies, list_alert_conditions — audit alert coverage
Incident Response: search_incident, list_change_events, analyze_deployment_impact — post-deploy verification
Performance: analyze_golden_metrics, analyze_transactions, analyze_kafka_metrics, analyze_threads — deep diagnostics
Dashboards: get_dashboard, list_dashboards — read existing visualisations

Composing Both Servers

There is no reason to choose one. Many organisations run Datadog for infrastructure and APM whilst using New Relic for application-level analytics and synthetic monitoring. Codex CLI supports multiple MCP servers concurrently:

[mcp_servers.datadog]
url = "https://mcp.datadoghq.com/mcp?toolsets=core,apm"
startup_timeout_sec = 15
tool_timeout_sec = 60

[mcp_servers.newrelic]
url = "https://mcp.newrelic.com/mcp/"
env_http_headers = { "Api-Key" = "NEW_RELIC_API_KEY" }
startup_timeout_sec = 15
tool_timeout_sec = 60

When both servers are active, the agent can cross-reference signals — for example, correlating a Datadog APM trace with New Relic error groups to build a complete incident picture.

Workflow Patterns

1. Incident-Driven Debugging

The most immediate use case: point Codex at a production incident and let it pull live telemetry.

codex "Service checkout-api has elevated error rates.
Use Datadog to search recent logs for errors in checkout-api,
then check New Relic golden metrics for the last hour.
Identify the root cause and suggest a fix."

The agent calls search_datadog_logs with service and error-level filters, then analyze_golden_metrics on the New Relic side to compare throughput and response time trends. With both signal sources, it can distinguish between a code regression (error rate spike without throughput change) and a load issue (throughput spike preceding errors).
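
The regression-versus-load distinction can be made concrete with a small heuristic. The sketch below is one possible reading of it, not what the agent actually computes: it takes two equal-length time series (oldest sample first), treats the first few samples as the baseline, and compares when each series first exceeds that baseline. The `baseline_points` and `spike_ratio` defaults are arbitrary assumptions, and the function names are illustrative.

```python
from statistics import mean


def first_spike(series, baseline_points, spike_ratio):
    """Index of the first sample above baseline * spike_ratio, or None."""
    baseline = mean(series[:baseline_points])
    for i in range(baseline_points, len(series)):
        if series[i] > baseline * spike_ratio:
            return i
    return None


def classify_incident(error_rate, throughput, baseline_points=3, spike_ratio=1.5):
    """Label an incident as a load issue or a code regression.

    A throughput spike at or before the error spike suggests load;
    an error spike with no earlier throughput spike suggests a regression.
    """
    if len(error_rate) != len(throughput):
        raise ValueError("series must have the same length")
    if len(error_rate) <= baseline_points:
        raise ValueError("need more samples than baseline_points")
    err_at = first_spike(error_rate, baseline_points, spike_ratio)
    load_at = first_spike(throughput, baseline_points, spike_ratio)
    if err_at is None:
        return "no-error-spike"
    if load_at is not None and load_at <= err_at:
        return "load-issue"
    return "code-regression"
```

Note that a throughput rise that only appears after errors begin (for example, client retries) is classified as a regression here, which is a simplifying assumption.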

2. Post-Deployment Verification

After a codex exec batch run that modifies multiple services, verify the deployment’s observability impact:

codex "I just deployed commit abc123 to the payments service.
Use New Relic analyze_deployment_impact to check for regressions.
Then use Datadog to search for any new monitor alerts in the last 15 minutes.
If anything looks wrong, draft a rollback plan."

3. Monitor-as-Code with Validation

Datadog’s validate_datadog_monitor tool lets Codex check monitor definitions before applying them, and get_monitor_coverage identifies gaps:

codex "Check monitor coverage for the order-service in Datadog.
For any gaps, generate monitor definitions in Terraform HCL format
using the Datadog provider. Validate each monitor before writing."

4. CI Pipeline Investigation

When a CI pipeline fails intermittently, the software-delivery toolset provides deep visibility:

codex "Use Datadog to find flaky tests in the main branch CI pipeline
for the last 7 days. For the top 3 flakiest tests,
search for related error tracking issues and suggest fixes."

AGENTS.md Addendum

For projects using vendor observability, add context to your AGENTS.md:

## Observability

- **Primary APM**: Datadog — all services emit traces via dd-trace-py/dd-trace-js
- **Synthetic monitoring**: New Relic — synthetic monitors cover all public endpoints
- **Alert routing**: Datadog monitors → PagerDuty → #incidents Slack channel
- **Key dashboards**: "Service Overview" (Datadog), "User Impact" (New Relic)

When investigating production issues:
1. Check Datadog monitors and recent alerts first
2. Use New Relic golden metrics for baseline comparison
3. Cross-reference Datadog APM traces with New Relic error groups
4. Never create or modify monitors without validating first

Security Considerations

Token scope: Datadog’s OAuth flow requests mcp_read and mcp_write permissions ⁴. For read-only agent workflows, request only mcp_read. New Relic API keys should use the minimum required role — User rather than Admin where possible.

Credential storage: Both DD_API_KEY / DD_APP_KEY and NEW_RELIC_API_KEY must live in environment variables, not in config.toml directly. Use env_http_headers to reference them safely.

Approval gating: For tools that modify state (Datadog’s create_datadog_monitor, execute_datadog_workflow), set explicit approval and restrict enabled_tools to the read-only and validation tools your workflows need:

[mcp_servers.datadog]
url = "https://mcp.datadoghq.com/mcp?toolsets=core,apm,alerting"
approval_mode = "approve"
enabled_tools = ["search_datadog_logs", "get_datadog_metric", "search_datadog_monitors", "validate_datadog_monitor"]
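
When the tool list grows, the allowlist can be derived rather than maintained by hand. The following sketch splits tool names into read-only and state-changing groups using a name-prefix heuristic and renders an `enabled_tools` line. The prefix list is an assumption of this example, not a classification published by either vendor, and it errs on the side of caution: a read-only tool such as New Relic's execute_nrql_query would be flagged and need manual review.

```python
# Name prefixes treated as state-changing. This is a naming heuristic and an
# assumption of this sketch, not a vendor-provided classification.
MUTATING_PREFIXES = ("create_", "update_", "delete_", "execute_")


def split_tools(tool_names):
    """Return (read_only, mutating) lists, preserving input order."""
    read_only, mutating = [], []
    for name in tool_names:
        if name.startswith(MUTATING_PREFIXES):
            mutating.append(name)
        else:
            read_only.append(name)
    return read_only, mutating


def enabled_tools_line(tool_names):
    """Render a TOML enabled_tools line containing only the read-only tools."""
    read_only, _ = split_tools(tool_names)
    quoted = ", ".join(f'"{name}"' for name in read_only)
    return f"enabled_tools = [{quoted}]"


if __name__ == "__main__":
    tools = [
        "search_datadog_logs",
        "create_datadog_monitor",
        "validate_datadog_monitor",
        "execute_datadog_workflow",
    ]
    print(enabled_tools_line(tools))
```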

Network access: Both servers require outbound HTTPS from the sandbox. Codex CLI’s full-auto mode enables network access by default; in suggest or auto-edit modes, ensure your sandbox policy permits connections to mcp.datadoghq.com and mcp.newrelic.com.

Model Selection

Both servers return structured data that benefits from strong reasoning. For incident investigation workflows involving multiple tool calls and cross-referencing:

o3: Best for complex multi-step investigations requiring synthesis across both platforms
o4-mini: Suitable for single-vendor queries and straightforward log searches
gpt-5.5: ⚠️ Strong reasoning but higher latency and cost; reserve for critical incidents

Limitations

Context budget: Loading both servers simultaneously consumes significant context with tool descriptions. Use enabled_tools to whitelist only the tools your workflows need ⁷.
Rate limits: Datadog’s 50-request/10-second burst limit can throttle aggressive batch workflows. New Relic’s rate limits are account-tier dependent ⁵.
Write operations: Both servers support creating monitors, dashboards, and alerts, but Codex’s sandbox does not roll back vendor-side changes. Gate writes behind approval_mode = "approve".
Training data lag: Neither o3 nor o4-mini has training data covering the February and March 2026 MCP server launches. The agent relies entirely on tool descriptions for correct usage, which generally works well but can produce suboptimal queries without AGENTS.md guidance.
No webhook integration: Neither MCP server supports push-based notifications. The agent must poll for new incidents rather than receiving real-time alerts.
GovCloud: Datadog’s MCP server is not available on GovCloud sites (ddog-gov.com) ³.
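
Because neither server pushes notifications, any "watch for new incidents" behaviour has to be built as a polling loop around the agent. The sketch below shows the general shape under stated assumptions: `fetch_ids` is a hypothetical callable that returns the incident identifiers currently visible (for example, by wrapping a codex exec invocation and parsing its output), and `on_new` is whatever should happen the first time an identifier is seen. The interval and poll count are arbitrary defaults and should be chosen with the rate limits above in mind.

```python
import time


def poll_for_new(fetch_ids, on_new, interval_sec=30.0, max_polls=10, sleep=time.sleep):
    """Call fetch_ids() up to max_polls times and pass unseen ids to on_new().

    fetch_ids: hypothetical callable returning an iterable of incident ids.
    on_new:    callable invoked once per id, the first time that id is seen.
    Returns the set of all ids seen.
    """
    seen = set()
    for attempt in range(max_polls):
        for incident_id in fetch_ids():
            if incident_id not in seen:
                seen.add(incident_id)
                on_new(incident_id)
        # Sleep between polls, but not after the final one.
        if attempt < max_polls - 1:
            sleep(interval_sec)
    return seen
```

Tracking seen identifiers keeps the loop idempotent: an incident that stays open across several polls triggers `on_new` only once.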

Citations

¹ Datadog Launches MCP Server: Press Release, Datadog Investor Relations, March 2026
² New Relic Launches Agentic AI Monitoring and MCP Server, BigDATAwire, February 2026
³ Datadog MCP Server Documentation, Datadog Docs, 2026
⁴ Set Up the Datadog MCP Server, Datadog Docs, 2026
⁵ New Relic MCP Server Review: 35 Tools, Free Tier, ChatForest, 2026
⁶ Set up New Relic MCP, New Relic Documentation, 2026
⁷ Model Context Protocol: Codex CLI, OpenAI Developers, 2026