May 24, 2026

OpenAI Codex CLI Review 2026: Free With ChatGPT Plus, GPT-5.5 Under the Hood — But Can It Beat Claude Code?

By AICoderScope Team · 12 min read

Most developers paying $20 a month for ChatGPT Plus don’t realize they’re also paying for a terminal coding agent with 75,600 GitHub stars. Codex CLI ships inside the Plus subscription at no extra charge.

Four million developers run it weekly. Thirty-two million npm installs in the past twelve months. GPT-5.5, OpenAI’s current flagship (released April 23, 2026), powers it with a 400,000-token context window and 88.7% on SWE-bench Verified.

The real question: is it good enough to replace Aider or Claude Code, or is this one of those “technically free but practically underpowered” situations?

Short answer: for well-defined, self-contained bulk tasks — “refactor this module,” “write tests for these five functions,” “implement this spec” — Codex CLI is fast, autonomous, and free if you’re already on ChatGPT Plus. For interactive pair programming and complex architectural decisions, Claude Code still has the edge on task focus and context discipline.

What Codex CLI actually is

Codex CLI (@openai/codex on npm) is an open-source terminal agent that reads your local codebase and executes changes directly. Install with:

npm install -g @openai/codex

Node 22 is required. A common install mistake is running npm i -g codex instead of npm i -g @openai/codex: the unscoped name is a different package, and npm reports no error, so the failure is silent.
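A quick preflight sketch for the Node requirement above. It parses the string that node --version prints (for example v22.3.0) and compares the major version. The helper names node_major, meets_codex_requirement, and installed_node_version are made up for this example, and the threshold of 22 comes from this review rather than from the package metadata.

```python
import shutil
import subprocess
from typing import Optional

REQUIRED_NODE_MAJOR = 22  # per this review; confirm against the package's own docs


def node_major(version_text: str) -> int:
    """Extract the major version from `node --version` output such as 'v22.3.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def meets_codex_requirement(version_text: str) -> bool:
    """True when the reported Node version is at least the required major."""
    return node_major(version_text) >= REQUIRED_NODE_MAJOR


def installed_node_version() -> Optional[str]:
    """Return the local `node --version` output, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    found = installed_node_version()
    if found is None:
        print("node not found on PATH")
    else:
        verdict = "ok" if meets_codex_requirement(found) else "too old"
        print(found.strip(), verdict)
```

Running a check like this before the install step catches an outdated Node early. It does not detect the wrong-package mistake.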

The CLI reads AGENTS.md for project instructions — the same open standard used by 60,000+ open-source projects and supported by Aider, OpenHands, and GitHub Copilot. If you’re also using Claude Code, you’ll need a separate CLAUDE.md; the two formats are incompatible, and dual-tool setups require two config files.

Four surfaces ship under the Codex brand:

CLI — local terminal agent (this review’s focus)
IDE extensions — VS Code and JetBrains (added January 2026)
Codex macOS app — parallel agents command center, launched February 2, 2026
ChatGPT web — accessible at chatgpt.com/codex

The CLI and IDE extensions run as “Codex Local” — your filesystem, your machine, your environment. The macOS app enables “Codex Cloud” — sandboxed cloud container tasks that keep running when your laptop is closed.

Pricing: what your ChatGPT plan actually gets you

OpenAI switched Codex to token-based billing on April 2, 2026 (extended to Enterprise plans April 23). Before that, billing was message-based. The change matters if you’re running long autonomous sessions.

Plan	Monthly cost	Codex allocation	Context window
ChatGPT Plus	$20/mo	Included (rolling 5-hour window caps)	400K tokens
ChatGPT Pro	$200/mo	20× Plus (ongoing)	400K tokens
ChatGPT Business	$30/user/mo	Credits-based	400K tokens
BYOK (own API key)	API list price	Unlimited	Varies

The rolling 5-hour window on Plus is the catch. Heavy users who run multiple full-auto sessions per day — the use case Codex handles best — will hit it. The window resets continuously rather than daily, but it’s a real ceiling for production-grade autonomous coding workloads.
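The difference between a rolling window and a daily reset is easier to see in code. The sketch below is a generic sliding-window counter, not OpenAI’s metering logic: the usage unit, the cap of 100, and the class name RollingWindowCap are all hypothetical. Only the 5-hour window length comes from the plan description above.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window described above


class RollingWindowCap:
    """Generic sliding-window limiter; the cap and its units are illustrative only."""

    def __init__(self, cap, window_seconds=WINDOW_SECONDS):
        self.cap = cap
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp_seconds, amount) pairs, oldest first

    def _expire(self, now):
        # Usage older than the window stops counting, one event at a time.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def used(self, now):
        """Total usage still inside the window at time `now`."""
        self._expire(now)
        return sum(amount for _, amount in self._events)

    def try_spend(self, now, amount):
        """Record the usage and return True only if it fits under the cap."""
        if self.used(now) + amount > self.cap:
            return False
        self._events.append((now, amount))
        return True


limiter = RollingWindowCap(cap=100)
first = limiter.try_spend(now=0, amount=60)          # fits under the cap
second = limiter.try_spend(now=3600, amount=60)      # rejected: 60 + 60 > 100
third = limiter.try_spend(now=5 * 3600, amount=60)   # fits: the first 60 has aged out
print(first, second, third)
```

Under a daily reset, the rejected request would stay blocked until midnight. Under a rolling window, capacity returns as each earlier burst ages past the five-hour mark.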

Token billing rates for GPT-5.3-Codex (the documented rate card): 43.75 credits per 1M input tokens, 4.375 credits per 1M cached input tokens, 350 credits per 1M output tokens. GPT-5.5 uses fewer tokens per task than GPT-5.4 for most workloads, per OpenAI’s launch notes, so effective cost is lower than those rates imply.
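To make the rate card concrete, here is a small estimator using the three GPT-5.3-Codex rates quoted above. The function name session_credits and the token counts are hypothetical. The sketch also assumes that cached input is counted separately from uncached input, which the rate card as quoted here does not spell out.

```python
# Credits per 1M tokens for GPT-5.3-Codex, as quoted in this review.
CREDITS_PER_MILLION = {
    "input": 43.75,
    "cached_input": 4.375,
    "output": 350.0,
}


def session_credits(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate credits for one session.

    Assumption: `input_tokens` counts only uncached input, so cached tokens
    are billed once, at the cached rate.
    """
    usage = {
        "input": input_tokens,
        "cached_input": cached_input_tokens,
        "output": output_tokens,
    }
    return sum(tokens * CREDITS_PER_MILLION[kind] / 1_000_000 for kind, tokens in usage.items())


# Hypothetical long session: 2M fresh input, 6M cached input, 150K output.
print(session_credits(2_000_000, 6_000_000, 150_000))
```

For that made-up session the total is 166.25 credits: 87.5 for fresh input, 26.25 for cached input, and 52.5 for output. Output is under 2% of the tokens but nearly a third of the bill, which is why long autonomous sessions that generate a lot of code are the ones to watch.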

Pro at $200/month gets 20× the Plus allocation on an ongoing basis. A promotional 25× window ran through May 31, 2026; the standard rate post-promo is 20×.

For cost-sensitive teams: BYOK via ~/.codex/config.toml routes to OpenAI-compatible endpoints. codex-mini-latest costs $1.50 input / $6 output per million tokens — substantially cheaper than full GPT-5.5 API pricing when lighter tasks don’t need frontier reasoning. You can also point BYOK at other OpenAI-compatible providers. To compare GPT, Claude, Gemini, and DeepSeek API costs side-by-side, use the AI API Cost Calculator.

Overage credits are purchasable when plan limits are hit, without requiring a plan upgrade.

Three approval modes — where Codex is different

This is what separates Codex CLI from Aider and Claude Code. Three discrete autonomy levels:

Suggest mode (default): Every file edit and shell command requires your explicit approval before execution. Use this when working in production codebases where you want to review everything.

Auto-edit mode: File changes apply automatically; Codex stops to confirm before running shell commands. The right setting for feature development on a clean branch where you trust the edits but want to gate execution.

Full-auto mode (--full-auto): No confirmations. Codex executes within the sandbox boundary without pausing. This flag combines --approval-mode never and --sandbox workspace-write as a single shortcut (added in v0.1.2). Use it for well-scoped isolated tasks where friction is the enemy.

Sandbox modes control blast radius separately from approval policy:

workspace-write (default with full-auto): reads files, edits within the working directory, runs routine local commands. Edits outside the workspace or network access still require approval.
danger-full-access: removes all filesystem and network restrictions. Warranted only when a task explicitly needs to reach external services.
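The interaction between approval mode and sandbox mode can be sketched as a small decision table. This models the behavior as this review describes it, not Codex’s actual implementation: the types ApprovalMode, Sandbox, and Action and the function requires_approval are hypothetical. Applying the sandbox check in every approval mode is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ApprovalMode(Enum):
    SUGGEST = "suggest"
    AUTO_EDIT = "auto-edit"
    FULL_AUTO = "full-auto"


class Sandbox(Enum):
    WORKSPACE_WRITE = "workspace-write"
    DANGER_FULL_ACCESS = "danger-full-access"


@dataclass(frozen=True)
class Action:
    kind: str                      # "edit" or "shell"
    inside_workspace: bool = True  # does it stay within the working directory?
    needs_network: bool = False


def requires_approval(mode: ApprovalMode, sandbox: Sandbox, action: Action) -> bool:
    """Decision table as described in this review (illustrative only)."""
    # The sandbox gates anything that leaves the workspace or touches the network.
    escapes_sandbox = not action.inside_workspace or action.needs_network
    if sandbox is Sandbox.WORKSPACE_WRITE and escapes_sandbox:
        return True
    if mode is ApprovalMode.SUGGEST:
        return True  # every edit and command is confirmed
    if mode is ApprovalMode.AUTO_EDIT:
        return action.kind == "shell"  # edits apply, commands pause
    return False  # full-auto: nothing inside the sandbox pauses


fetch = Action(kind="shell", needs_network=True)
print(requires_approval(ApprovalMode.FULL_AUTO, Sandbox.WORKSPACE_WRITE, fetch))
print(requires_approval(ApprovalMode.FULL_AUTO, Sandbox.DANGER_FULL_ACCESS, fetch))
```

The sketch shows that the two settings are independent axes. Full-auto removes the routine confirmations, but a network call is still gated unless the sandbox itself is loosened.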

Claude Code’s default behavior is more conversational — it asks clarifying questions before acting, which catches more edge cases in complex work but slows down straightforward tasks. Codex’s three-mode system lets you set the autonomy level per task type, which is the right design for agentic workflows where some tasks should run unattended and others shouldn’t.

GPT-5.5 and the benchmarks

GPT-5.5 launched April 23, 2026, and is the current default model in Codex. It brings a 400,000-token context window — double the usable context in Cursor Pro — and a Fast mode at 1.5× standard generation speed (at 2.5× the credit cost).
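Whether Fast mode is worth it comes down to simple arithmetic on the two multipliers above. The sketch below assumes the whole run time scales with generation speed, which ignores time spent running commands and tests, so the time saved is an upper bound. The function name fast_mode_tradeoff and the sample task (30 minutes, 40 credits) are hypothetical.

```python
FAST_SPEEDUP = 1.5      # generation speed multiplier quoted above
FAST_COST_FACTOR = 2.5  # credit cost multiplier quoted above


def fast_mode_tradeoff(standard_minutes: float, standard_credits: float) -> dict:
    """Compare standard and Fast mode, assuming run time is all generation time."""
    fast_minutes = standard_minutes / FAST_SPEEDUP
    fast_credits = standard_credits * FAST_COST_FACTOR
    minutes_saved = standard_minutes - fast_minutes
    extra_credits = fast_credits - standard_credits
    return {
        "fast_minutes": fast_minutes,
        "fast_credits": fast_credits,
        "minutes_saved": minutes_saved,
        "extra_credits_per_minute_saved": extra_credits / minutes_saved,
    }


print(fast_mode_tradeoff(standard_minutes=30, standard_credits=40))
```

For the sample task, Fast mode finishes in 20 minutes instead of 30 and costs 100 credits instead of 40, or 6 extra credits per minute saved. That trade makes more sense when you are waiting on the result than for unattended batch work.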

Benchmark scores:

Model	SWE-bench Verified	Terminal-Bench 2.0
GPT-5.5 (via Codex)	88.7%	82.7% (#1)
Claude Opus 4.7	87.6%	Not ranked
GPT-5.3-Codex	~85%	~77.3%

Required caveat: SWE-bench Verified and Terminal-Bench 2.0 use different harnesses, different problem sets, and different agentic scaffolds. GPT-5.5’s 88.7% is OpenAI-reported at launch. Claude Opus 4.7’s 87.6% is Anthropic-reported. Neither number has been independently replicated on identical infrastructure. Treat the 1.1-point gap as noise, not signal.

Terminal-Bench 2.0 is worth paying attention to because it’s explicitly designed for CLI agents — multi-step command-line workflows requiring planning, tool coordination, and iteration across turns. It’s the more relevant benchmark for evaluating Codex CLI specifically, and GPT-5.5’s 82.7% is currently the highest score on that leaderboard.

Practically: both GPT-5.5 via Codex and Claude Code via Opus 4.7 are operating at the frontier. The benchmark gap between them is smaller than the gap in task-type fit. Codex handles bulk autonomous execution better; Claude Code handles interactive refinement and ambiguous requirements better.

What actually breaks

No model selection control: Within ChatGPT subscriptions, OpenAI routes to model variants internally based on task complexity and repository size. You cannot specify “use GPT-5.5 Thinking for this one.” For granular model control, BYOK direct API access is the only path.

Stale context in cloud tasks: Each Codex Cloud task (via the macOS app) starts fresh with a snapshot of your repo at the time of dispatch. If your main branch is changing fast — multiple engineers merging simultaneously — the resulting PR may conflict with work that landed while Codex was running. Design async workflows around feature branches with clear, stable merge points.

Sandbox ≠ CI environment: The workspace-write sandbox doesn’t replicate Docker dependencies, specific environment variables, or integration test infrastructure. Tests that pass locally in Codex’s sandbox can fail in CI. Codex is not a substitute for running your actual test suite.

Windows is experimental: macOS and Linux get full native support. Windows ships with an AppContainer-based sandbox that restricts filesystem writes and blocks network by default, but the official designation is experimental. WSL2 is the documented recommendation for Windows users who need Linux-native behavior.

OpenAI model lock-in via subscription path: The ChatGPT Plus/Pro path gives you OpenAI models only. No Claude, no Gemini, no Mistral via the standard subscription surface. BYOK unlocks other providers, but then you’re paying API token rates rather than subscription flat rates.

Codex app is macOS-only: The parallel agents command center launched February 2, 2026, on macOS. Linux and Windows users are waiting.

MCP and AGENTS.md

Codex CLI supports MCP natively — both stdio (local child process) and Streamable HTTP (remote server with optional OAuth). Configure servers in ~/.codex/config.toml or manage with codex mcp CLI commands.

If you’re running MCP servers for Claude Code, you can point Codex at the same infrastructure. The two agents can share MCP servers, but each needs those servers registered in its own configuration (for Codex, ~/.codex/config.toml), and project instructions still live in separate files (CLAUDE.md for Claude Code, AGENTS.md for Codex). To generate a ready-to-paste MCP config for Claude Code or Cursor, use the MCP Server Config Generator.

Where Claude Code still leads: CLAUDE.md’s layered settings system supports project-level, user-level, and local-scope configurations, along with hooks for auto-formatting, blocking destructive Bash commands, and custom automation. If your team has invested in CLAUDE.md workflow automation, Codex CLI’s AGENTS.md doesn’t currently replicate that depth. AGENTS.md is simpler by design — widely adopted across the open-source ecosystem precisely because it’s easy to pick up.

The Codex macOS app

The macOS app is Codex’s “command center” for parallel cloud agents. Fire off multiple concurrent tasks — each running in its own cloud-sandboxed container — and monitor progress in parallel. Cloud environment tasks run independently of your local machine.

This architecture is different from Claude Code’s subagent model. Claude Code agents run within your local context and bill against your subscription; Codex Cloud tasks run in OpenAI’s infrastructure with persistent container sessions. For bulk overnight workloads — a batch of issue fixes, parallel feature implementations across repos — the Codex Cloud pattern is genuinely useful.

The constraint: each task starts without knowledge of previous sessions. There’s no learning from past PR review feedback built in. If continuity across tasks matters for your workflow, Claude Code’s session context handles that better.

Estimated cloud task cost for reference: a 2-hour session with a 16 GB container runs approximately $4.59 at current API rates, independent of your subscription allocation.

Codex CLI vs Claude Code vs Aider

Feature	Codex CLI	Claude Code	Aider
Subscription cost	Free with ChatGPT Plus ($20/mo)	Pro $20/mo standalone	Free (BYOK only)
Model (flagship)	GPT-5.5	Claude Opus 4.7	Any BYOK
SWE-bench Verified	88.7% (GPT-5.5)	87.6% (Opus 4.7)	Model-dependent
Context window	400K tokens	200K tokens	Model-dependent
Config standard	AGENTS.md	CLAUDE.md	.aider.conf
Approval modes	3 (suggest/auto-edit/full-auto)	Interactive default	--auto-commits flag
MCP support	Yes (stdio + HTTP)	Yes	Limited
Windows support	Experimental	Full	Full
Cloud agents	Yes (macOS app)	No	No
Model choice	OpenAI stack (or BYOK)	Anthropic stack	Any BYOK
Open source	Yes (MIT)	No	Yes (Apache 2.0)

For a direct Claude Code vs Cursor workflow comparison, see Cursor vs Claude Code 2026. For xAI’s parallel-agent entry to the CLI coding market with 8 simultaneous worktree agents, see the Grok Build CLI review 2026.

Honest take

Already on ChatGPT Plus? Install Codex CLI today. You’re paying for it and not using it. Run npm install -g @openai/codex, start in suggest mode on a branch, and move to full-auto once you’ve calibrated to the output quality.

Choosing between Codex CLI and Claude Code Pro at the same $20/month? Codex CLI wins on raw context window (400K vs 200K) and the bulk-autonomous-task use case, with benchmark scores at rough parity. Claude Code wins on interactive task discipline, CLAUDE.md workflow automation depth, and handling ambiguous requirements without going off the rails.

The answer for most teams: run both. Claude Code for architectural planning and complex refactors where you want the back-and-forth; Codex CLI in full-auto for well-scoped implementation tasks that don’t need supervision. Both can commit to the same branch. The combined cost is $40/month — the same as Cursor Teams per seat.

The one case where Codex CLI is clearly the wrong choice: teams that have invested in CLAUDE.md hooks, layered settings, and custom slash commands. That ecosystem doesn’t translate to AGENTS.md without rebuilding it.

If you’re already on ChatGPT Plus, the barrier to trying Codex CLI is one npm command.

Last updated May 24, 2026. Pricing and features change frequently; verify current state before purchasing.