Jun 22, 2026

MiMo Code Review 2026: Xiaomi's Open-Source Claude Code Challenger and Whether the 200-Step Benchmark Claims Hold Up

By AICoderScope Team

Tags: mimo, claude-code, opencode, review, ai, terminal

TL;DR: MiMo Code is Xiaomi’s MIT-licensed fork of OpenCode that adds a persistent cross-session memory system and bundles free, limited-time access to a 1.02-trillion-parameter model. Xiaomi’s own numbers say it beats Claude Code on three benchmarks, but those scores are self-reported, compare against Claude Code on Sonnet 4.6 (not Opus 4.8), and appear on no independent leaderboard. The memory architecture is the real reason to try it.

	MiMo Code	Claude Code	OpenCode
Best for	Ultra-long agentic tasks (200+ steps), free frontier-model trial	Anthropic-stack teams wanting the strongest model + polish	BYOK / local-model workflows, full provider freedom
Price / Cost	$0 (MIT); free MiMo-V2.5-Pro via “MiMo Auto” for now	$20/mo Pro, $100/mo Max, or API billing	$0 core (BYOK); $10/mo Go tier
The catch	Bundled model is cloud-only and time-limited; benchmarks unaudited	Model lock to Anthropic	Configuration overhead; default routes to a third-party model

Honest take: Try MiMo Code for the memory system and the free 1T-model window, not because it “beats Claude Code” — that claim leans on a benchmark setup Xiaomi designed and graded itself. For day-to-day work where you control the model, Claude Code and OpenCode are still the safer picks.

What MiMo Code actually is

Strip away the launch-day headlines and MiMo Code is a fork of OpenCode, the MIT-licensed terminal coding agent. Xiaomi’s MiMo team kept the parts that made OpenCode good — multiple model providers, the TUI, language-server (LSP) integration, MCP support, and the plugin system — and bolted on the features they think win long tasks.

The additions, straight from the repository README:

Persistent memory: project memory, checkpoints, notes, and task tracking that survive across sessions
Intelligent context reconstruction with explicit token budgeting
Subagent orchestration with parallel execution
Goal and stop-condition evaluation via an independent judge model
Compose mode for specs-driven development (the same idea AWS Kiro built its whole product around — see our Kiro review)
Voice input (needs a MiMo login or compatible provider)
“Dream & distill” — a background pass that extracts reusable knowledge and turns repeated workflows into shortcuts
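To make the "independent judge" item concrete, here is a minimal Python sketch of an agent loop whose stop condition is decided by a separate evaluator rather than by the agent itself. It is an illustration of the general pattern only, not MiMo Code's implementation: `Verdict`, `run_until_judged_done`, and the `step` and `judge` callables are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    done: bool
    reason: str


def run_until_judged_done(
    step: Callable[[str, List[str]], str],
    judge: Callable[[str, List[str]], Verdict],
    goal: str,
    max_steps: int = 200,
) -> Tuple[List[str], Verdict]:
    """Run agent steps until a separate judge says the goal is met.

    `step` produces the next action log entry; `judge` sees the goal and
    the full transcript and returns a Verdict. Both are stand-ins for
    model calls in a real harness (assumption for illustration).
    """
    transcript: List[str] = []
    for _ in range(max_steps):
        transcript.append(step(goal, transcript))
        verdict = judge(goal, transcript)
        if verdict.done:
            return transcript, verdict
    # The step budget acts as a hard stop if the judge never approves.
    return transcript, Verdict(False, "step budget exhausted")


# Stub callables so the sketch runs without any model behind it.
def fake_step(goal: str, transcript: List[str]) -> str:
    return f"step {len(transcript) + 1} toward: {goal}"


def fake_judge(goal: str, transcript: List[str]) -> Verdict:
    return Verdict(len(transcript) >= 3, "three steps completed")


log, result = run_until_judged_done(fake_step, fake_judge, "rename module")
print(len(log), result.done, result.reason)
```

The point of the design is that the worker cannot declare itself finished; a second evaluator has to agree, and a step budget bounds the loop if it never does.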

Xiaomi open-sourced v0.1.0 on June 10, 2026; v0.1.1 followed on June 15. The source code is MIT-licensed, though the bundled model and hosted service carry separate Use Restrictions and fall under Xiaomi’s MiMo Terms of Service. Read those before you wire it into anything commercial.

The benchmark claims — and the asterisks that matter

Here’s the comparison Xiaomi published. The Claude Code column is Claude Code running Claude Sonnet 4.6; Xiaomi says it also ran MiMo-V2.5-Pro inside both harnesses, which is the basis for the framing quoted after the table.

Benchmark	MiMo Code	Claude Code (Sonnet 4.6)	Gap
SWE-bench Verified	82%	79%	+3 pts
SWE-bench Pro	62%	55%	+7 pts
Terminal-Bench 2	73%	69%	+4 pts

Xiaomi’s headline framing: run the same MiMo-V2.5-Pro inside both harnesses and MiMo Code still comes out roughly five points ahead on each benchmark — their argument that “the agent architecture matters as much as the model.” They also ran a double-blind A/B with 576 developers: under 200 execution steps the two tools split about 50/50, but past 200 steps MiMo Code reportedly won more than 65% of head-to-head matchups.
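The "roughly five points" figure is easy to sanity-check against the published table. The short Python sketch below uses only the self-reported scores quoted above and computes the per-benchmark gap and the mean; the variable names are ours.

```python
# Self-reported scores from Xiaomi's table: (MiMo Code, Claude Code on Sonnet 4.6)
scores = {
    "SWE-bench Verified": (82, 79),
    "SWE-bench Pro": (62, 55),
    "Terminal-Bench 2": (73, 69),
}

# Gap in percentage points for each benchmark.
gaps = {name: mimo - claude for name, (mimo, claude) in scores.items()}
mean_gap = sum(gaps.values()) / len(gaps)

for name, gap in gaps.items():
    print(f"{name}: +{gap} pts")
print(f"mean gap: {mean_gap:.2f} pts")
```

On these figures the gap ranges from 3 to 7 points, so "about five" describes the average rather than each benchmark individually.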

Now the parts the press release skips:

These are self-reported. MiMo Code does not appear on the official Terminal-Bench 2.0 leaderboard or any independent SWE-bench board. Xiaomi designed the test harness, picked the configuration, and graded the runs.

The Claude Code comparison is on Sonnet 4.6, not Opus 4.8. That matters a lot. On the verified Terminal-Bench 2.1 leaderboard, Claude Code paired with Opus 4.8 scores 78.9%, nearly ten points above the 69% Xiaomi attributes to “Claude Code” (with the caveat that 2.1 is a newer revision of the benchmark than the one Xiaomi reports against). In other words, Xiaomi benchmarked against Claude Code’s mid-tier model and reported it as Claude Code’s ceiling.

The independent #1 is something else entirely. On Terminal-Bench 2.0, Codex CLI with GPT-5.5 holds the top spot at 82.0% (83.4% on the 2.1 update) — above MiMo Code’s self-reported 73%. If you want the agent that wins audited benchmarks today, the honest answer is still the Codex CLI / Claude Code tier, not MiMo Code.
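For readers who want the Terminal-Bench arithmetic in one place, here is a small Python sketch that lines up the self-reported and audited figures quoted in this section. It assumes scores from Terminal-Bench 2.0 and 2.1 are roughly comparable, which is a simplification; the dictionary labels are ours.

```python
# Figures quoted in this article. Self-reported numbers come from Xiaomi;
# audited numbers come from the public leaderboards cited above.
self_reported = {
    "MiMo Code (MiMo-V2.5-Pro)": 73.0,
    "Claude Code (Sonnet 4.6)": 69.0,
}
audited = {
    "Claude Code + Opus 4.8 (TB 2.1)": 78.9,
    "Codex CLI + GPT-5.5 (TB 2.0)": 82.0,
}

mimo = self_reported["MiMo Code (MiMo-V2.5-Pro)"]
sonnet = self_reported["Claude Code (Sonnet 4.6)"]
opus = audited["Claude Code + Opus 4.8 (TB 2.1)"]
codex = audited["Codex CLI + GPT-5.5 (TB 2.0)"]

# How far Xiaomi's "Claude Code" baseline sits below the Opus-backed score.
baseline_shortfall = round(opus - sonnet, 1)
# How far MiMo Code's claimed score sits below each audited entry.
behind_opus = round(opus - mimo, 1)
behind_codex = round(codex - mimo, 1)

print(f"Opus 4.8 vs Xiaomi's Claude Code baseline: +{baseline_shortfall} pts")
print(f"MiMo Code trails Claude Code + Opus 4.8 by {behind_opus} pts")
print(f"MiMo Code trails Codex CLI + GPT-5.5 by {behind_codex} pts")
```

Because the runs come from different benchmark revisions and different graders, treat these differences as rough orientation rather than a ranking.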

None of this makes the numbers fake. Vendor benchmarks are a starting point, not a verdict. But “beats Claude Code” is doing a lot of work in the headlines, and the fine print deflates most of it.

Installing it

MiMo Code is terminal-native on Linux, macOS, and WSL. Two install paths:

# macOS / Linux — official installer
curl -fsSL https://mimo.xiaomi.com/install | bash

# Windows (via WSL) or any npm environment
npm install -g @mimo-ai/cli

Confirm it landed:

$ mimo --version
mimo 0.1.1

$ mimo
# launches the TUI; on first run it prompts you to either log in
# to MiMo or configure your own provider (OpenAI-compatible, Anthropic, etc.)

The zero-configuration path is MiMo Auto — an anonymous channel that routes to MiMo-V2.5-Pro free of charge “for a limited time.” You can start coding without an API key, which is the single biggest reason to try it this month: it’s a free trial of a 1T-parameter frontier-class model with no signup friction. When that window closes, you fall back to bring-your-own-key, exactly like upstream OpenCode.

The model under the hood — and why you can’t run it locally

The bundled model, MiMo-V2.5-Pro, is a sparse mixture-of-experts design:

Spec	Value
Total parameters	1.02 trillion
Active parameters	42 billion
Context window	1M tokens
Attention	Hybrid SWA + global, 6:1 ratio, 128-token window
Pre-training	27T tokens, FP8 mixed precision

The 42B active parameters mean inference compute is closer to a 42B dense model — but you still need enough memory to hold the full 1.02T-parameter weights resident. That puts genuine local inference firmly out of reach for any single-workstation setup. There is no 8GB, 24GB, or even 96GB VRAM tier that runs this thing; you’re looking at a multi-GPU server. For the overwhelming majority of readers, MiMo-V2.5-Pro is a cloud model you access through MiMo Auto, full stop.
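A back-of-the-envelope calculation shows why. The Python sketch below estimates the memory needed just to hold the weights at a few common precisions, using the parameter counts from the spec table. It ignores KV cache, activations, and runtime overhead, so real requirements are higher; the function name and the list of precisions are our own choices for illustration.

```python
TOTAL_PARAMS = 1.02e12   # total parameters, from the spec table
ACTIVE_PARAMS = 42e9     # parameters active per token, from the spec table
LARGEST_CARD_GB = 96     # the largest single-card tier mentioned above


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    total_gb = weight_memory_gb(TOTAL_PARAMS, bits)
    active_gb = weight_memory_gb(ACTIVE_PARAMS, bits)
    cards = total_gb / LARGEST_CARD_GB
    print(
        f"{label}: full weights ~{total_gb:,.0f} GB "
        f"(~{cards:.1f}x a {LARGEST_CARD_GB} GB card), "
        f"active slice ~{active_gb:,.0f} GB"
    )
```

Even an aggressive 4-bit quantization leaves roughly 510 GB of weights, more than five times a 96 GB card. The active slice is small, but which experts are active changes from token to token, so the full set has to stay resident.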

If your actual goal is local, private, AI coding — the most common request we get — MiMo Code isn’t the tool. Keep MiMo Code’s harness if you like it, but point it at a small local model instead, the same way you would with OpenCode. Our OpenCode + Ollama setup guide covers that workflow, and runaihome.com’s local-model-by-VRAM guide tells you which quantized coder fits your card. For FOSS-first tooling more broadly, aifoss.dev tracks the open-weight coding stack.

Where it actually shines: long tasks that would lose context

The persistent-memory pitch isn’t marketing fluff — it targets a real failure mode. Run any agent through a 150-step refactor and you watch it forget decisions it made an hour earlier: it re-reads files it already understood, re-derives the project’s conventions, and occasionally undoes its own work because the relevant context fell out of the window.

I hit exactly this on a long migration with a stock OpenCode setup last month: around step 90 the agent “rediscovered” a config file it had already rewritten and proposed reverting it. MiMo Code’s checkpoint-and-notes system is built to stop that: it persists task state and project memory outside the live context window, then reconstructs only what’s relevant on each turn. Whether it’s worth switching tools for depends entirely on how long your tasks run. If your typical session is a 20-minute bug fix, you will never notice the difference, and Xiaomi’s self-reported 50/50 split under 200 steps is consistent with that. If you routinely hand an agent multi-hour, multi-file jobs and walk away, the memory layer is the one feature here you can’t get cleanly from Claude Code today.
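The general idea of persisting notes outside the context window and rebuilding a budgeted slice each turn can be sketched in a few lines of Python. This is a toy illustration of the pattern, not MiMo Code's actual design: the `Note` structure, the tag-overlap relevance score, the JSON file format, and the four-characters-per-token estimate are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class Note:
    step: int          # agent step at which the note was recorded
    text: str          # the decision or fact worth remembering
    tags: List[str]    # coarse topics used for relevance matching


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def save_notes(path: Path, notes: List[Note]) -> None:
    """Persist notes to disk so they survive the session."""
    path.write_text(json.dumps([asdict(n) for n in notes]))


def load_notes(path: Path) -> List[Note]:
    return [Note(**item) for item in json.loads(path.read_text())]


def reconstruct_context(
    notes: List[Note], query_tags: List[str], budget_tokens: int
) -> List[Note]:
    """Pick relevant notes, best match and most recent first, within budget."""
    wanted = set(query_tags)
    scored = [(len(wanted & set(n.tags)), n.step, n) for n in notes]
    relevant = [item for item in scored if item[0] > 0]
    relevant.sort(key=lambda item: (item[0], item[1]), reverse=True)

    picked: List[Note] = []
    used = 0
    for _, _, note in relevant:
        cost = estimate_tokens(note.text)
        if used + cost > budget_tokens:
            continue  # skip notes that would blow the budget
        picked.append(note)
        used += cost
    # Return in chronological order so the agent reads decisions in sequence.
    return sorted(picked, key=lambda n: n.step)


notes = [
    Note(12, "Rewrote config/app.yaml to the v2 schema; do not revert.", ["config"]),
    Note(40, "Project uses snake_case for module names.", ["style"]),
    Note(88, "config/app.yaml now loads secrets from env vars.", ["config", "secrets"]),
]
for note in reconstruct_context(notes, ["config"], budget_tokens=100):
    print(note.step, note.text)
```

With a note like the one at step 12 available on every turn that touches configuration, an agent at step 90 has an explicit record that the file was already rewritten, which is the failure described above. A real system would need a better relevance signal than tag overlap and a real tokenizer for the budget.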

A real gotcha to plan for: on WSL, clipboard integration needs a workaround (the README flags it), so copy-paste of generated snippets out of the TUI isn’t seamless on Windows. The fix is to run inside Windows Terminal with WSL clipboard sharing enabled, or pipe output to a file (mimo run "..." > out.diff) and open it in your editor rather than copying from the TUI buffer.

Where it breaks

The free model is temporary. Build a workflow around MiMo Auto and you’re building on a trapdoor. Xiaomi hasn’t published a price for what comes after the free window. Plan to bring your own key.
v0.1.x is early. A fork that’s two point-releases old inherits OpenCode’s fast, occasionally buggy release cadence and adds new surface area (memory, subagents, judge model) that hasn’t been hardened in the wild.
Benchmarks aren’t independently verified. Until MiMo Code shows up on tbench.ai or a neutral SWE-bench run, treat the “beats Claude Code” line as a hypothesis.
Cloud-only flagship model. If privacy or air-gapped work is the requirement, the headline model is a non-starter and you’re back to local models in a generic harness.

Verdict: who should actually install it

Install MiMo Code this month if you want a no-cost trial of a frontier-class 1T model and you run genuinely long agentic tasks where context loss is your real pain. The memory architecture is the differentiator, and the free MiMo Auto window makes the cost of trying it zero.

Don’t switch your daily driver to it yet. For most developers, Claude Code on Opus 4.8 or the Codex CLI on GPT-5.5 deliver better-audited results with more polish, and OpenCode already gives you the same fork-able base with full model freedom. MiMo Code is a serious, well-engineered entry — just not the Claude Code killer the launch headlines sold.

FAQ

Is MiMo Code free? The harness is MIT-licensed and free. The bundled MiMo-V2.5-Pro model is free through the “MiMo Auto” channel for a limited time; Xiaomi has not announced post-trial pricing. Bring-your-own-key works like upstream OpenCode.

Does MiMo Code really beat Claude Code? On Xiaomi’s own self-reported benchmarks, yes — by 3 to 7 points on SWE-bench Verified, SWE-bench Pro, and Terminal-Bench 2. But those tests pit MiMo Code against Claude Code running Sonnet 4.6, not Opus 4.8, and don’t appear on any independent leaderboard. On the audited Terminal-Bench 2.0 board, Codex CLI with GPT-5.5 leads at 82.0%, ahead of MiMo Code’s claimed 73%.

Can I run MiMo-V2.5-Pro locally? Not realistically. It’s a 1.02T-parameter MoE; even at 42B active params you need enough memory for the full weights, which means a multi-GPU server. For local coding, point the harness at a smaller model — see our OpenCode + Ollama guide.

What platforms does MiMo Code support? Linux, macOS, and WSL. Windows users go through WSL and should expect a clipboard workaround in the TUI.

Is it based on OpenCode? Yes. It’s a fork that keeps OpenCode’s providers, TUI, LSP, MCP, and plugins, and adds persistent memory, subagent orchestration, a judge model, Compose mode, and “dream & distill.”


Last updated June 22, 2026. Pricing and features change frequently; verify current state before purchasing.
