Microsoft MAI-Code-1-Flash in GitHub Copilot 2026: 137B/5B MoE, Sub-Second Latency, and Whether It Beats Claude Haiku 4.5 on Your Bill
TL;DR: Microsoft’s first in-house coding model, MAI-Code-1-Flash, hit general availability for Copilot Business and Enterprise on June 26, 2026. It is a 137B-total / 5B-active sparse MoE built for speed, and Microsoft’s own numbers put it ahead of Claude Haiku 4.5 on every coding benchmark they ran — while using up to 60% fewer tokens. For the cheap-and-fast lane of your Copilot stack, it is now the default to beat. It is not a replacement for a frontier model on hard multi-file refactors.
| | MAI-Code-1-Flash | Claude Haiku 4.5 | A frontier model (Sonnet 4.6 / GPT-5.5) |
|---|---|---|---|
| Best for | Fast inline edits, autocomplete, small functions, cheap agent loops | Same lane, slightly weaker on agentic tasks | Deep multi-file refactors, architecture, planning |
| Copilot token price (in / out per 1M) | $0.75 / $4.50 | $1.00 / $5.00 | Several times higher |
| SWE-Bench Pro | 51.2 | 35.2 | Higher, at much higher cost |
| The catch | 5B active params — shallow on complex reasoning | Pricier and slower for the same tier | Burns AI Credits fast under usage billing |
Honest take: If your daily driver is the cheap-tier model in Copilot’s picker, switch it to MAI-Code-1-Flash today — it is faster, scores higher, and bills lower than Claude Haiku 4.5. Keep a frontier model one click away for the hard problems; this is a workhorse, not a brain.
Microsoft has been licensing other people’s models inside GitHub Copilot for years. MAI-Code-1-Flash is the first coding model it trained end-to-end itself, and the timing is not subtle: the Business and Enterprise GA lands less than four weeks into Copilot’s usage-based billing era, where every token you spend shows up on a bill. A model that is both cheaper per token and uses fewer tokens is a direct answer to the billing backlash that followed the June 1 AI Credits switch.
What MAI-Code-1-Flash actually is
It is a sparse Mixture-of-Experts transformer: 137 billion total parameters, but only 5 billion active per token. That split is the whole design. The 5B active count is why it responds fast and bills cheap — the model only lights up a small slice of its weights for any given request. The 137B total is why it still has enough knowledge to score respectably on real benchmarks instead of collapsing the way a dense 5B model would.
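To make the total-versus-active split concrete, here is a minimal sketch of top-k expert routing, the general mechanism sparse MoE models use. Microsoft has not published the expert count, the value of k, or the router design, so the eight gate scores and `k=2` below are illustrative assumptions, and `top_k_route` is a hypothetical helper rather than anything from the model itself. Only the 137B and 5B figures come from the article.

```python
import math

TOTAL_PARAMS_B = 137  # total parameters in billions, from the article
ACTIVE_PARAMS_B = 5   # active parameters per token in billions, from the article


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's weights used for a single token."""
    return active_b / total_b


def top_k_route(gate_scores: list[float], k: int) -> dict[int, float]:
    """Keep the k highest-scoring experts and softmax-normalise their weights.

    Generic top-k gating; the real router is not public (assumption).
    """
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    exps = {i: math.exp(gate_scores[i]) for i in top}
    norm = sum(exps.values())
    return {i: weight / norm for i, weight in exps.items()}


print(f"{active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%} of weights active per token")

# Hypothetical gate scores for 8 experts; only the top 2 run for this token.
routes = top_k_route([0.1, 2.0, -1.0, 1.5, 0.3, 0.0, -0.5, 0.9], k=2)
print(routes)
```

The fraction is a rough guide only: in most MoE designs the active count also includes shared layers such as attention, so it does not map cleanly to "experts used divided by experts available."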
Context window is 256K tokens, which is generous for a model in this class and enough to hold a meaningful slice of a repo in an agent loop. Microsoft says it was trained from March to May 2026 on “clean and appropriately licensed data,” and — the more interesting claim — trained directly against the GitHub Copilot harness developers use in production, not against synthetic benchmark suites. Whether that holds up in your codebase is the open question, but it is at least a different bet than “train on everything and hope.”
The other architectural note worth knowing is adaptive thinking. The model allocates almost no reasoning budget to a simple autocomplete and expands to multi-step reasoning only when the task looks like a refactor or an architecture question. That is how it avoids the latency tax of always-on chain-of-thought while keeping quality up on the harder requests. In practice it means tab completion stays instant and the agent does not stall on trivial edits.
The benchmark numbers (and the asterisk)
Here are Microsoft’s published results, MAI-Code-1-Flash vs Claude Haiku 4.5:
| Benchmark | MAI-Code-1-Flash | Claude Haiku 4.5 | Delta |
|---|---|---|---|
| SWE-Bench Verified | 71.6 | 66.6 | +5.0 |
| SWE-Bench Pro | 51.2 | 35.2 | +16.0 |
| Terminal Bench 2 | 54.8 | 41.6 | +13.2 |
| Token use (SWE-Bench Verified) | up to 60% fewer | baseline | — |
The asterisk: these are Microsoft’s own numbers, comparing its new model against one specific competitor it chose. That is not the same as an independent leaderboard, and the comparison is pointedly against Claude Haiku 4.5, Anthropic’s small, cheap model, not against Sonnet 4.6 or any frontier tier. The honest reading is “MAI-Code-1-Flash is now the strongest model in the budget lane,” not “MAI-Code-1-Flash beats Claude.” On the public Terminal Bench 2 leaderboard, frontier models like GPT-5.5 sit far above 54.8. Keep the comparison in its weight class.
That said, +16 points on SWE-Bench Pro over the model it is replacing is a large gap for a same-tier swap, and the token-efficiency claim is the part that actually shows up on your invoice.
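The point gaps hide how different the three results are in relative terms. The sketch below recomputes the Delta column from the published scores and adds the relative gain over Claude Haiku 4.5; `gains` is a hypothetical helper, and the scores are copied from the table above, not measured independently.

```python
# (MAI-Code-1-Flash, Claude Haiku 4.5) scores from Microsoft's published table.
SCORES = {
    "SWE-Bench Verified": (71.6, 66.6),
    "SWE-Bench Pro": (51.2, 35.2),
    "Terminal Bench 2": (54.8, 41.6),
}


def gains(mai: float, haiku: float) -> tuple[float, float]:
    """Return (absolute point gap, relative gain in percent), rounded to 0.1."""
    absolute = round(mai - haiku, 1)
    relative = round((mai / haiku - 1) * 100, 1)
    return absolute, relative


for name, (mai, haiku) in SCORES.items():
    absolute, relative = gains(mai, haiku)
    print(f"{name}: +{absolute} points, {relative}% relative gain")
```

Read this way, the SWE-Bench Verified lead is a single-digit percentage, while the SWE-Bench Pro lead is close to half again the baseline score.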
The billing angle: why “60% fewer tokens” matters now
Before June 1, 2026, the model you picked in Copilot barely affected your bill — you paid a flat Premium Request multiplier. After the usage-based billing switch, you burn AI Credits against each model’s per-token list price, and the model choice is now a line item.
Verified Copilot list prices, per 1M tokens:
| Model | Input | Cached input | Output |
|---|---|---|---|
| MAI-Code-1-Flash | $0.75 | $0.075 | $4.50 |
| Claude Haiku 4.5 | $1.00 | — | $5.00 |
MAI-Code-1-Flash is 25% cheaper on input and 10% cheaper on output before you count token efficiency. Stack the “up to 60% fewer tokens” claim on top and the gap compounds.
A worked example. Take a typical agentic edit session — roughly 50K input tokens (the model reads your files) and 8K output tokens (it writes the diff):
- Claude Haiku 4.5: 50K × $1.00/M + 8K × $5.00/M = $0.050 + $0.040 = $0.090/session
- MAI-Code-1-Flash, same token count: 50K × $0.75/M + 8K × $4.50/M = $0.0375 + $0.036 = $0.074/session
- MAI-Code-1-Flash, if it hits the 60%-fewer-tokens claim: ~20K × $0.75/M + ~3.2K × $4.50/M ≈ $0.029/session
At a few hundred sessions a month, the difference between $0.09 and $0.03 per session is the difference between staying inside your credit allotment and getting an overage bill. The 60% figure is benchmark-derived, not a guarantee for your repo: treat $0.074 as the per-session cost you can count on from list prices alone, and anything below it as upside.
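The arithmetic above can be reproduced with a few lines of Python, which makes it easy to plug in your own session sizes. This is a sketch under stated assumptions: the prices are the list prices from the table above, `session_cost` and `token_scale` are hypothetical names, cached-input pricing is ignored, and 300 sessions stands in for "a few hundred."

```python
# USD per 1M tokens, copied from the Copilot list-price table above.
PRICES = {
    "MAI-Code-1-Flash": {"input": 0.75, "output": 4.50},
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
}


def session_cost(model: str, input_tokens: int, output_tokens: int,
                 token_scale: float = 1.0) -> float:
    """Cost in USD of one session at list price.

    token_scale shrinks both token counts; 0.4 models the "up to 60% fewer
    tokens" claim, which is a best case and not a guarantee.
    """
    price = PRICES[model]
    scaled_in = input_tokens * token_scale
    scaled_out = output_tokens * token_scale
    return (scaled_in * price["input"] + scaled_out * price["output"]) / 1_000_000


SESSIONS_PER_MONTH = 300  # assumption standing in for "a few hundred"

scenarios = {
    "Claude Haiku 4.5": session_cost("Claude Haiku 4.5", 50_000, 8_000),
    "MAI-Code-1-Flash, same tokens": session_cost("MAI-Code-1-Flash", 50_000, 8_000),
    "MAI-Code-1-Flash, 60% fewer": session_cost("MAI-Code-1-Flash", 50_000, 8_000, 0.4),
}

for label, cost in scenarios.items():
    print(f"{label}: ${cost:.4f}/session, ${cost * SESSIONS_PER_MONTH:.2f}/month")
```

At 300 sessions the three scenarios come to roughly $27, $22, and $9 a month, so most of the potential saving depends on the token-efficiency claim holding in your own repo rather than on the list-price gap.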
On the legacy request-based billing that annual Pro and Pro+ plans can still be on, MAI-Code-1-Flash carries a 0.33 model multiplier — a promotional rate at the time of writing. Multipliers are a legacy-billing concept and do not apply under the new usage-based system; if you are on usage billing, the per-token prices above are what count.
Availability: who can actually pick it today
This rolled out in stages, and the staging matters because “GA” did not mean “everyone, everywhere” at any single point:
- May 29, 2026 — initial rollout to Copilot Free, Pro, Pro+, and Max, VS Code first.
- June 18, 2026 — expanded to more surfaces: Copilot CLI, the Copilot cloud agent, the GitHub Copilot app, Copilot Chat on GitHub, Visual Studio, GitHub Mobile, JetBrains IDEs, Eclipse, and Xcode.
- June 26, 2026 — general availability for Copilot Business and Copilot Enterprise.
The Business/Enterprise GA has a gotcha that tripped up a lot of teams: an admin has to flip the policy on first.
The problem you’ll actually hit: “it’s not in my model picker”
The single most common complaint in the GitHub community thread is developers on Business or Enterprise plans who read the GA announcement, open VS Code, and find no MAI-Code-1-Flash in the dropdown. Three things cause it, in order of likelihood:
- Admin policy not enabled. For Copilot Business and Enterprise, an administrator must enable the MAI-Code-1-Flash policy in Copilot settings before any seat can select it. This is the big one — GA for the plan does not auto-enable the model for the org.
- Gradual rollout. Microsoft confirmed it is still rolling the model out gradually even within enabled plans. If the policy is on and you still don’t see it, you may simply be in a later wave.
- Stale client. Older VS Code or extension builds won’t surface it. Update both.
The fix sequence:
# 1. Confirm your Copilot extension and VS Code are current
code --version
# In VS Code: Extensions panel → GitHub Copilot / Copilot Chat → Update
# 2. Org admins (Business/Enterprise):
# github.com → Org Settings → Copilot → Policies
# → enable "MAI-Code-1-Flash" → Save
# 3. Reload the window so the picker re-fetches the model list
# Command Palette (Cmd/Ctrl+Shift+P) → "Developer: Reload Window"
Then select it in the model picker dropdown in Copilot Chat or the inline edit menu. If it is there, you will see it sit near the top of the list as a fast/low-cost option.
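If you are checking several machines for the stale-client cause, the version check can be scripted. The sketch below parses the output of `code --list-extensions --show-versions`, which prints one `publisher.name@version` entry per line, and keeps only the Copilot extensions. The function names are hypothetical, the sample listing uses placeholder version numbers, and the script assumes the `code` CLI is on your PATH. It does not know which minimum version surfaces MAI-Code-1-Flash, so compare the result against the current Marketplace release yourself.

```python
import shutil
import subprocess


def copilot_extension_versions(listing: str) -> dict[str, str]:
    """Extract Copilot extension IDs and versions from a `publisher.name@version` listing."""
    versions = {}
    for line in listing.splitlines():
        ext_id, sep, version = line.strip().partition("@")
        if sep and "copilot" in ext_id.lower():
            versions[ext_id] = version
    return versions


def installed_copilot_extensions() -> dict[str, str]:
    """Query the local VS Code CLI; returns {} if `code` is not on PATH."""
    code_path = shutil.which("code")
    if code_path is None:
        return {}
    result = subprocess.run(
        [code_path, "--list-extensions", "--show-versions"],
        capture_output=True, text=True, check=False,
    )
    return copilot_extension_versions(result.stdout)


# Placeholder listing for illustration; the version numbers are made up.
sample = "github.copilot@1.2.3\nms-python.python@4.5.6\ngithub.copilot-chat@0.9.0\n"
print(copilot_extension_versions(sample))
```

Call `installed_copilot_extensions()` on a machine with VS Code installed to get the real versions. An empty result means either the CLI is missing from PATH or no Copilot extension is installed.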
Where it fits — and where it doesn’t
A 5B-active model is excellent at exactly what its name implies: flash work. Inline completions, single-function generation, small targeted edits, fast agentic loops where you want many cheap iterations rather than one expensive deep think. The adaptive-thinking design means it does not embarrass itself on a medium-complexity refactor, but it is not the model you reach for when you need it to reason across twelve files and hold an architecture in its head. That is still frontier-model territory, and you will pay frontier-model credits for it.
The smart move under usage billing is a two-model setup: MAI-Code-1-Flash as your default for the 80% of requests that are routine, and a frontier model — Claude Sonnet 4.6 or GPT-5.5 — held in reserve for the hard 20%. That is exactly the kind of cost discipline that the Copilot Max tier analysis and the Copilot agent-mode deep dive both point at: the expensive part of agentic coding is reaching for a big model on small problems.
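As a rough illustration of that two-model rule, here is a minimal routing sketch. Copilot’s model picker is a manual choice and this is not a Copilot API: `Task`, `pick_model`, the task kinds, and the three-file threshold are all hypothetical, meant only to show the decision rule in a form you could adapt for a team guideline or an internal script.

```python
from dataclasses import dataclass

FAST_MODEL = "MAI-Code-1-Flash"
FRONTIER_MODEL = "frontier model"  # e.g. Claude Sonnet 4.6 or GPT-5.5


@dataclass
class Task:
    kind: str           # "completion", "edit", "refactor", or "architecture"
    files_touched: int  # how many files the change is expected to span


def pick_model(task: Task, max_fast_files: int = 3) -> str:
    """Send routine work to the fast model and escalate only the hard cases.

    The threshold is an illustrative assumption, not a measured cutoff.
    """
    if task.kind == "architecture" or task.files_touched > max_fast_files:
        return FRONTIER_MODEL
    return FAST_MODEL


for task in (Task("completion", 1), Task("refactor", 2), Task("refactor", 12)):
    print(f"{task.kind} across {task.files_touched} file(s) -> {pick_model(task)}")
```

The design choice is to default to the cheap model and require a reason to escalate, which matches the cost logic above: a wrong guess on a small task costs one cheap retry, while a frontier model on every request costs frontier credits every time.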
If you want to take the cheap lane all the way to zero, that is a different article — running a local coding model on your own hardware costs nothing per token but needs a GPU. We covered the hardware tiers for that on our sister site, runaihome.com. But for developers who are committed to the Copilot ecosystem and just want their cheap-tier model to be the best and cheapest available, MAI-Code-1-Flash is the answer as of late June 2026.
FAQ
Is MAI-Code-1-Flash free? It is available on the free Copilot tier, but “available” and “free to run” are different things under usage-based billing. Its tokens count against your plan’s AI Credit allotment like any other model. It is the cheapest credible coding model in the picker, not a $0 one.
How many parameters does it have? 137 billion total, 5 billion active per token — a sparse Mixture-of-Experts design. The 5B active count is what makes it fast and cheap; the 137B total is what keeps its benchmark scores up.
Does it really beat Claude Haiku 4.5? On Microsoft’s own benchmarks, yes: SWE-Bench Verified 71.6 vs 66.6, SWE-Bench Pro 51.2 vs 35.2, Terminal Bench 2 54.8 vs 41.6. These are first-party numbers against one chosen competitor in the same budget tier — not an independent leaderboard, and not a comparison against frontier models.
Why can’t I see it in my model picker? On Business or Enterprise plans, an org admin must enable the MAI-Code-1-Flash policy first. Otherwise, check that your VS Code and Copilot extension are current, and note that Microsoft is still rolling it out gradually even on enabled plans.
Should I use it for big refactors? No. It is built for fast, small, cheap work. For multi-file refactors and architecture-level reasoning, switch to a frontier model and accept the higher credit cost for that task only.
What’s the context window? 256K tokens — enough to hold a substantial chunk of a repository in an agent loop.
Sources
- Introducing MAI-Code-1-Flash — Microsoft AI
- MAI-Code-1-Flash for Copilot Business and Copilot Enterprise — GitHub Changelog (June 26, 2026)
- MAI-Code-1-Flash available on more Copilot surfaces — GitHub Changelog (June 18, 2026)
- MAI-Code-1-Flash is now available for GitHub Copilot — community Discussion #197306
- Models and pricing for GitHub Copilot — GitHub Docs
- Microsoft AI on X — benchmark numbers vs Claude Haiku 4.5
Last updated June 27, 2026. Pricing and features change frequently; verify current state before purchasing.