Memory Lifecycle Management: Create, Consolidate, Clean, Delete in Codex CLI

Since v0.100.0, Codex CLI has shipped a persistent memory system that retains facts, preferences, and project context across sessions[1]. What started as a simple key–value note store has matured into a two-phase extraction-and-consolidation pipeline backed by SQLite, with configurable retention, diff-based forgetting, and programmatic deletion. This article walks through every stage of that lifecycle — create, consolidate, clean, delete — and explores the configuration surface and data-governance implications for enterprise teams.

Why Agent Memory Matters

Stateless agents force developers to repeat context: “use pnpm, not npm”; “our API always returns a meta field”; “run tests with pytest -x --tb=short”. Codex CLI’s memory system eliminates that friction by persisting observations from past sessions and injecting them into the model’s developer instructions at startup[2]. The payoff compounds across long-running projects where architectural decisions, tool preferences, and codebase conventions accumulate over weeks.

Architecture at a Glance

```mermaid
flowchart LR
    A[Session Rollouts<br/>.jsonl files] -->|Startup scan| B[Phase 1<br/>Extraction]
    B -->|StageOneOutput| C[(SQLite<br/>state DB)]
    C -->|Incremental diff| D[Phase 2<br/>Consolidation]
    D --> E[memory_summary.md]
    D --> F[MEMORY.md]
    D --> G[rollout_summaries/]
    D --> H[skills/]
    E -->|Injected at startup| I[New Session<br/>Context Window]
```

The pipeline comprises two asynchronous phases that run at session startup, a set of on-disk artefacts under ~/.codex/memory/, and a SQLite state database that tracks job ownership and watermarks[3].

Phase 1 — Extraction

Phase 1 identifies stale threads — those updated since their last memory extraction — and summarises each one using the lightweight gpt-5.4-mini model[3]. The output is a StageOneOutput struct containing:

| Field | Purpose |
|---|---|
| raw_memory | Detailed markdown capturing decisions, preferences, and lessons |
| rollout_summary | Compact recap suitable for the rollout_summaries/ directory |
| rollout_slug | Optional human-readable filename stem |

Results land in the stage1_outputs table in SQLite. Concurrency is governed by CONCURRENCY_LIMIT (default 8), ensuring the startup scan does not saturate API quotas[3].
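Conceptually, the cap behaves like a semaphore around the per-thread extraction calls. A minimal Python sketch, where summarise_thread is a hypothetical stand-in for the real Phase 1 model call:

```python
import asyncio

CONCURRENCY_LIMIT = 8  # default cap on simultaneous extraction calls

async def summarise_thread(thread_id: str) -> dict:
    """Hypothetical stand-in for the Phase 1 model call."""
    await asyncio.sleep(0)  # placeholder for the real API round-trip
    return {"thread_id": thread_id, "raw_memory": f"notes for {thread_id}"}

async def run_phase1(thread_ids: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(CONCURRENCY_LIMIT)

    async def bounded(tid: str) -> dict:
        async with sem:  # at most CONCURRENCY_LIMIT calls in flight
            return await summarise_thread(tid)

    # gather preserves input order even though calls overlap
    return await asyncio.gather(*(bounded(t) for t in thread_ids))

results = asyncio.run(run_phase1([f"t{i}" for i in range(20)]))
```

The real implementation is Rust, but the shape is the same: a fixed-size permit pool bounding how many extractions run at once.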

Selection Criteria

Not every thread qualifies. Phase 1 scans the threads table for active threads within max_rollout_age_days and filters out those idle for less than min_rollout_idle_hours (recommended >12 h)[4]. This prevents in-progress sessions from generating premature memories.
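The selection logic can be sketched as a single SQL query. The schema below is a simplified assumption, not the real threads table:

```python
import sqlite3
import time

# Hypothetical schema; the real threads table differs in detail.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id TEXT, updated_at REAL, last_extracted_at REAL)")
now = time.time()
conn.executemany("INSERT INTO threads VALUES (?, ?, ?)", [
    ("fresh", now - 3600, 0.0),           # idle only 1 h -> excluded
    ("stale", now - 48 * 3600, 0.0),      # idle 48 h -> selected
    ("ancient", now - 120 * 86400, 0.0),  # older than 90 days -> excluded
])

MAX_ROLLOUT_AGE_DAYS = 90
MIN_ROLLOUT_IDLE_HOURS = 12

rows = conn.execute(
    """SELECT id FROM threads
       WHERE updated_at > last_extracted_at  -- stale: updated since last extraction
         AND updated_at >= ?                 -- within max_rollout_age_days
         AND updated_at <= ?                 -- idle for at least min_rollout_idle_hours
    """,
    (now - MAX_ROLLOUT_AGE_DAYS * 86400, now - MIN_ROLLOUT_IDLE_HOURS * 3600),
).fetchall()
```

Only the thread that is both recent enough and quiet enough survives the filter.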

Phase 2 — Global Consolidation

Phase 2 is the heavy lift. It claims a global lock via try_claim_global_phase2_job (with a one-hour lease expiry) and spawns a dedicated sub-agent sourced as SubAgentSource::MemoryConsolidation[3]. The consolidation model is gpt-5.3-codex running at medium reasoning effort — stronger than Phase 1’s mini model, because merging and deduplicating memories requires judgement[5].

Incremental Diff Labels

Rather than reprocessing every memory from scratch, Phase 2 tracks an input_watermark and classifies each Stage 1 output with a diff label[5]:

| Label | Meaning |
|---|---|
| added | In the current selection but absent from the previous Phase 2 baseline |
| retained | Present in both the current selection and the prior snapshot, unchanged |
| removed | Was in the prior baseline but no longer in the current top-N |

The consolidation agent receives these labels and adjusts the on-disk artefacts accordingly — merging new facts, preserving stable ones, and forgetting removed evidence.
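The three labels fall out of simple set arithmetic over thread IDs. A sketch, assuming the baseline and current selections are available as sets:

```python
def diff_labels(previous: set[str], current: set[str]) -> dict[str, str]:
    """Classify each Stage 1 output relative to the prior Phase 2 baseline."""
    labels = {}
    for tid in current - previous:
        labels[tid] = "added"      # new since the last consolidation
    for tid in current & previous:
        labels[tid] = "retained"   # stable across both snapshots
    for tid in previous - current:
        labels[tid] = "removed"    # fell out of the top-N; evidence can be forgotten
    return labels

labels = diff_labels(previous={"a", "b", "c"}, current={"b", "c", "d"})
# "d" is added, "b" and "c" are retained, "a" is removed
```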

On-Disk Artefacts

Phase 2 maintains five artefact types under codex_home:

```
~/.codex/memory/
├── memory_summary.md          # Navigational summary injected into prompts
├── MEMORY.md                  # Searchable registry of aggregated insights
├── raw_memories.md            # Temporary merge of Phase 1 outputs (input to Phase 2)
├── rollout_summaries/
│   └── <thread_id>-<ts>-<slug>.md
└── skills/
    └── <name>/SKILL.md        # Reusable procedures and scripts
```

Rollout summary filenames are generated by rollout_summary_file_stem, combining ThreadId, a timestamp fragment, and the optional rollout_slug[3].

Context Injection — The Read Path

At session start, memory_summary.md is injected into the model’s developer instructions, truncated to MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT (5,000 tokens)[3]. The agent is instructed to cite specific files and line ranges using <oai-mem-citation> blocks, creating a retrieval-augmented generation loop where the model can reference evidence from past sessions.
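A rough sketch of the truncation step, using whitespace tokens as a stand-in for the real tokenizer:

```python
TOKEN_LIMIT = 5_000  # MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT

def truncate_to_token_budget(text: str, limit: int = TOKEN_LIMIT) -> str:
    """Illustrative only: whitespace splitting approximates real tokenisation."""
    tokens = text.split()
    if len(tokens) <= limit:
        return text
    # keep the head of the summary; the tail is the least navigational part
    return " ".join(tokens[:limit]) + " …[truncated]"
```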

Each citation triggers a call to record_stage1_output_usage, incrementing usage_count and updating last_usage in the database[3]. This usage tracking feeds directly into the selection ranking for the next consolidation pass:

usage_count DESC → last_usage DESC → source_updated_at DESC

Frequently cited memories rise to the top; neglected ones gradually age out[5].
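That ranking maps directly onto a SQL ORDER BY clause. A sketch against a simplified stage1_outputs schema (column names beyond those cited in the text are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stage1_outputs (
    thread_id TEXT, usage_count INTEGER, last_usage REAL, source_updated_at REAL)""")
conn.executemany("INSERT INTO stage1_outputs VALUES (?, ?, ?, ?)", [
    ("hot",  9, 300.0, 100.0),   # most-cited: wins on usage_count
    ("warm", 2, 500.0, 200.0),   # wins the tie on source_updated_at
    ("tied", 2, 500.0, 150.0),   # loses the final tie-break
])

MAX_RAW_MEMORIES_FOR_GLOBAL = 200  # top-N cap fed into Phase 2

ranked = [r[0] for r in conn.execute(
    """SELECT thread_id FROM stage1_outputs
       ORDER BY usage_count DESC, last_usage DESC, source_updated_at DESC
       LIMIT ?""",
    (MAX_RAW_MEMORIES_FOR_GLOBAL,),
)]
```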

Cleaning — Diff-Based Forgetting

Introduced in v0.106.0, diff-based forgetting is the mechanism that keeps the memory store lean[1]. During Phase 2, memories labelled removed trigger deletion of the corresponding evidence from rollout_summaries/ and MEMORY.md. The consolidation agent produces a forgetting pass that “deletes only the evidence supported by removed thread IDs”[6].

Stale Output Pruning

Phase 1 also performs housekeeping: stage1_outputs older than max_unused_days are pruned in batches of PRUNE_BATCH_SIZE[3]. This prevents the SQLite database from growing without bound on machines with years of session history.
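Batched deletion avoids holding a long write lock on a large backlog. A sketch, with the PRUNE_BATCH_SIZE value chosen arbitrarily for illustration:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stage1_outputs (id INTEGER PRIMARY KEY, last_usage REAL)")
now = time.time()
conn.executemany("INSERT INTO stage1_outputs (last_usage) VALUES (?)",
                 [(now - 90 * 86400,)] * 7 + [(now,)])  # 7 stale rows, 1 fresh

MAX_UNUSED_DAYS = 60
PRUNE_BATCH_SIZE = 3  # illustrative value; the real constant is not documented here
cutoff = now - MAX_UNUSED_DAYS * 86400

while True:
    # delete in small batches so a huge backlog never locks the DB for long
    cur = conn.execute(
        """DELETE FROM stage1_outputs WHERE id IN (
               SELECT id FROM stage1_outputs WHERE last_usage < ? LIMIT ?)""",
        (cutoff, PRUNE_BATCH_SIZE),
    )
    if cur.rowcount == 0:
        break

remaining = conn.execute("SELECT COUNT(*) FROM stage1_outputs").fetchone()[0]
```

The `WHERE id IN (SELECT … LIMIT ?)` pattern works on stock SQLite builds, which do not enable `DELETE … LIMIT` by default.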

Polluted Thread Handling

When a session uses a disqualifying tool (one that produces non-deterministic or sensitive output), the thread can be marked as polluted. If the thread had selected_for_phase2 = 1, the system immediately enqueues a new global Phase 2 job so the forgetting pass can remove it[6].
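In outline, with a hypothetical mark_polluted helper standing in for the real bookkeeping:

```python
def mark_polluted(thread: dict, enqueue_phase2) -> None:
    """Sketch: flag a thread and trigger re-consolidation if it was already merged."""
    thread["polluted"] = True
    if thread.get("selected_for_phase2") == 1:
        # the forgetting pass must run again to purge this thread's evidence
        enqueue_phase2()

jobs = []
mark_polluted({"id": "t1", "selected_for_phase2": 1}, lambda: jobs.append("phase2"))
# a thread never selected for Phase 2 needs no new consolidation job
mark_polluted({"id": "t2", "selected_for_phase2": 0}, lambda: jobs.append("phase2"))
```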

Deletion — User-Controlled and Programmatic

Interactive Deletion

The /m_drop <query> slash command removes a memory matching the query from the active store[1]. This is the simplest deletion path — useful for correcting a stale preference or removing an outdated architectural fact.

Full Reset

For a clean slate — when switching between unrelated projects, for instance — codex debug clear-memories wipes all memory artefacts and resets the SQLite state[1]. Introduced in v0.107.0, this addresses the common complaint that memories from Project A would bleed into Project B.

Disabling the Read Path

Three methods exist to suppress memory injection without deleting the underlying data[7]:

```shell
# Command-line flag
codex --no-project-doc

# Environment variable
export CODEX_DISABLE_PROJECT_DOC=1
```

```toml
# config.toml
[project_docs]
enabled = false
```

For the memories pipeline itself, the memories.use_memories and memories.generate_memories booleans in config.toml control read and write paths independently[4]:

```toml
[memories]
use_memories = true        # Inject memories into prompts
generate_memories = true   # Run Phase 1/2 pipeline at startup
```

Setting generate_memories = false while keeping use_memories = true freezes the memory store — the agent still reads existing memories but stops creating new ones.

Configuration Reference

The full [memories] section in ~/.codex/config.toml exposes nine tunables[4]:

```toml
[memories]
use_memories = true
generate_memories = true
max_rollout_age_days = 90
min_rollout_idle_hours = 12
max_rollouts_per_startup = 5000
max_unused_days = 60
max_raw_memories_for_global = 200
phase_1_model = "gpt-5.4-mini"
phase_2_model = "gpt-5.3-codex"
```

| Key | Default | Purpose |
|---|---|---|
| max_rollout_age_days | 90 | Threads older than this are excluded from Phase 1 |
| min_rollout_idle_hours | 12 | Minimum idle time before a thread qualifies |
| max_rollouts_per_startup | 5000 | Cap on Phase 1 candidates per session start |
| max_unused_days | 60 | Stale outputs pruned after this many days |
| max_raw_memories_for_global | 200 | Top-N memories fed into Phase 2 consolidation |
| phase_1_model | gpt-5.4-mini | Lightweight extraction model |
| phase_2_model | gpt-5.3-codex | Consolidation model with medium reasoning |

Enterprise teams will want to tune max_unused_days and max_raw_memories_for_global to balance recall depth against startup latency.

Memory vs AGENTS.md — When to Use Which

Codex CLI offers two persistence mechanisms, and conflating them is a common mistake:

| Dimension | Memories | AGENTS.md |
|---|---|---|
| Scope | Personal, per-user | Project or directory, version-controlled |
| Creation | Automatic from sessions | Manual, authored by developers |
| Sharing | Not shared | Committed to the repository |
| Hierarchy | Flat (single user store) | Three-level merge: global → project → directory[7] |
| Best for | Personal preferences, tool choices | Team conventions, architecture docs |

Use memories for “always use pytest” and AGENTS.md for “this repo’s API layer lives in src/api/ and uses FastAPI”.

Enterprise Data Governance Considerations

For organisations deploying Codex CLI at scale, the memory system raises several governance questions:

  1. Data residency: Memories are stored locally under codex_home. In cloud-task mode, they reside on the ephemeral runner — ensure your task infrastructure handles cleanup.

  2. Secret leakage: Since v0.101.0, automatic secret sanitisation prevents credentials from being written to memory files[1]. However, teams should audit raw_memories.md periodically, especially in regulated environments.

  3. Retention policies: Map max_unused_days and max_rollout_age_days to your organisation’s data retention requirements. A 60-day default may be too long — or too short — depending on compliance posture.

  4. Cross-project contamination: Without per-project memory partitioning, memories from one codebase can influence another. Use codex debug clear-memories when rotating between sensitive projects, or set generate_memories = false for short-lived tasks.

Comparison with Claude Code

Claude Code takes a fundamentally different approach: it has no built-in persistent memory system equivalent to Codex’s two-phase pipeline[5]. Instead, Claude Code relies on CLAUDE.md instruction files (analogous to AGENTS.md) and a three-tier context compaction strategy (tool result trimming → cache-friendly strategies → structured summaries)[8]. The absence of automatic cross-session memory means Claude Code users must manually curate their instruction files — more control, less automation.

Conclusion

Codex CLI’s memory lifecycle — create via Phase 1 extraction, consolidate via Phase 2’s diff-aware merging, clean through forgetting passes and stale pruning, delete via /m_drop or clear-memories — represents the most sophisticated agent memory system in any terminal-based coding tool today. The nine configuration tunables give enterprise teams the knobs they need to balance recall, latency, and data governance. As the pipeline continues to evolve, expect tighter integration with the TUI and finer-grained per-project memory partitioning.

Citations

  1. Blake Crosley — Codex CLI: The Definitive Technical Reference — Memory system overview, version history, slash commands, and data governance enhancements.

  2. Codex CLI Features — OpenAI Developers — Official feature documentation including conversation resumption and context persistence.

  3. DeepWiki — Memory System (openai/codex) — Two-phase pipeline architecture, SQLite storage, context injection, and source file references.

  4. Mintlify — Codex Configuration Reference — Full [memories] configuration section with all tunables.

  5. Zylos Research — OpenAI Codex CLI Architecture and Multi-Runtime Agent Patterns — Phase 2 consolidation model, diff labels, and Claude Code comparison.

  6. Justin3go — Shedding Heavy Memories: Context Compaction in Codex, Claude Code, and OpenCode — Diff-based forgetting, polluted thread handling, and compaction strategies.

  7. Mintlify — Codex Memory & Project Docs — AGENTS.md hierarchy, disabling memory, and project docs configuration.

  8. OpenAI Developers — Codex CLI Changelog — Release notes for v0.119.0 and v0.120.0.