Kimi K2.7 Code Review 2026: 1T Open-Weight Coding Model as a Cursor and Cline Backend
TL;DR: Kimi K2.7 Code (released June 12) is the cheapest credible agentic coding model on the market — $0.95/$4 per million tokens, open weights, Modified MIT license. It burns ~30% fewer thinking tokens than K2.6 and beats Opus 4.8 on tool-use benchmarks. But “open weight” does not mean “run it on your 4090”: this is a 1-trillion-parameter model, so for almost everyone the real product is the API.
| Kimi K2.7 Code | Claude Fable 5 | Kimi K2.6 | |
|---|---|---|---|
| Best for | Cheap agentic coding via API, tool-heavy workflows | The hardest long-horizon refactors | Same niche, slightly cheaper |
| Price (in / out per M) | $0.95 / $4.00 | $10 / $50 | $0.60 / ~$2.50 |
| SWE-bench Pro | Not published at launch | 80.3% | 58.6% |
| The catch | Always-on thinking mode; not self-hostable on consumer GPUs | 10× the token cost | Older; uses ~30% more reasoning tokens |
Honest take: If you already route Cline or Claude Code to a cheap API backend, swap in K2.7 Code today — it is the best price-to-capability ratio you can get and the tool-calling is genuinely strong. If you came here hoping to run it locally for privacy, stop now and read the Cline + LM Studio guide instead — a 32B dense model on your own box is the realistic local play.
Moonshot AI dropped Kimi K2.7 Code on June 12, 2026, pushing the full weights to Hugging Face the same day. It is their fifth major model in under a year, and the naming makes the positioning obvious: this is the coding-specialized member of the K2 family, tuned for long-horizon, agentic software work rather than chat.
I have spent the day pointing Cline and Claude Code at it through the Moonshot API. What follows is what the spec sheet says, what the benchmarks actually support, the part nobody wants to say out loud about self-hosting a trillion-parameter model, and the exact config that gets it running as your coding backend.
What actually shipped
The headline numbers are real and worth getting straight:
- 1 trillion total parameters, Mixture-of-Experts, with 32B active per token across 384 experts. You get frontier-scale capacity but pay roughly 32B-model compute per forward pass.
- 256K-token context window — enough to hold a mid-size repo plus a long agent trajectory without aggressive truncation.
- Modified MIT license — commercial use, fine-tuning, and redistribution allowed. This is the genuinely open part.
- Native INT4 quantization baked in via quantization-aware training (QAT), not a lossy afterthought. Moonshot reports roughly 2× faster inference and ~50% lower memory versus the bf16 weights with negligible quality loss.
- Always-on thinking mode. K2.7 Code never runs in a non-reasoning mode; it preserves full reasoning content across multi-turn conversations. The upside is better tool decisions; the catch is you cannot turn the reasoning tokens off to save money.
The model is multimodal (text, image, video) with a MoonViT vision encoder, but for coding that is mostly a footnote — you will feed it diffs and stack traces, not screenshots.
The benchmark picture — read this carefully
Here is where the marketing and the verifiable record diverge, so I will separate them.
What Moonshot published are gains relative to K2.6, not absolute frontier comparisons: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on the multi-language MLS Bench Lite, alongside the ~30% cut in reasoning-token usage on equivalent tasks. Those are first-party numbers on first-party benchmarks. Treat the percentages as “K2.7 is meaningfully better than K2.6 at Moonshot’s own evals,” not as a head-to-head win over Fable 5.
What independent signal exists so far: K2.7 Code posts 81.1% on MCPMark Verified, a benchmark that measures correct tool invocation through the Model Context Protocol — ahead of Claude Opus 4.8’s 76.4%. That is the one cross-vendor number I would actually lean on, and it matches my hands-on experience: K2.7’s tool-calling discipline is its strongest feature. It rarely hallucinates a function signature and recovers cleanly when a tool returns an error.
What Moonshot did not publish at launch is a SWE-bench Verified or SWE-bench Pro score. That is a conspicuous gap. For reference, the predecessor K2.6 scored 58.6% on SWE-bench Pro back in April, which led the open-weight pack at the time. Until independent SWE-bench Pro numbers land for K2.7, anyone telling you it “beats Claude on coding” is extrapolating.
| Model | SWE-bench Pro | Tool use (MCPMark) | Price in / out per M | Open weights |
|---|---|---|---|---|
| Kimi K2.7 Code | Not published | 81.1% | $0.95 / $4.00 | Yes (Modified MIT) |
| Kimi K2.6 | 58.6% | ~76% | $0.60 / ~$2.50 | Yes |
| Claude Fable 5 | 80.3% | n/a | $10 / $50 | No |
| Claude Opus 4.8 | 69.2% | 76.4% | $5 / $25 | No |
The story the table tells: K2.7 Code is not chasing Fable 5 on the hardest end-to-end SWE-bench tasks. It is competing on dollars per solved task and on agentic reliability, and on those axes it is excellent. At roughly one-tenth of Fable 5’s output price, it does not need to win on raw capability to be the rational default for high-volume agent runs.
”But can I run it locally?” — the honest answer is mostly no
This is the question that brings most people to an open-weight model, so let me be blunt. A 1-trillion-parameter MoE is not a single-GPU model, and INT4 does not change that conclusion.
Even quantized, the weights occupy roughly 500GB of VRAM. The realistic verified configurations for serving K2.7-class models are things like 8× H200, or aggregate VRAM around 640GB — i.e. a multi-GPU server, not a workstation. The “24GB is enough for INT4” claims floating around launch-day blog posts are wrong; they are confusing this model with a small dense model. A trillion parameters at 4 bits is half a terabyte no matter how you slice it.
If your goal is genuine local, air-gapped inference for privacy, K2.7 Code is the wrong tool. The right tool is a dense 14B–32B coding model on your own hardware. For that path, our Cline + LM Studio setup and the hardware tiers in runaihome’s best local AI models by VRAM breakdown are where you should be. If you genuinely want to self-host K2.7 at scale, you are renting cloud GPUs — and at that point the math almost never beats Moonshot’s own API.
The supported serving stacks, for the record, are vLLM, SGLang, and Docker Model Runner. They work. The question is whether you can afford 8 datacenter GPUs to use them.
The practical path: API and OpenRouter
For 95% of readers, K2.7 Code is an API. Pricing:
- Moonshot API: $0.95 / M input, $4.00 / M output, $0.19 / M on cache hits. Model id:
kimi-k2.7-code. - OpenRouter: same
$0.95 / $4.00, model idmoonshotai/kimi-k2.7-code.
The cache-hit price is the underrated number. Agentic coding replays a large, stable system prompt and repo context on every turn; at $0.19/M, a cached prefix makes long sessions dramatically cheaper than the headline rate suggests.
One honest note on the trend: K2.6 was $0.60/M input, so K2.7 is a ~58% price bump at the input tier. Moonshot is charging more for the better model. It is still cheap — Fable 5 input is $10/M — but it is not the rock-bottom price K2.6 set.
Wiring K2.7 Code into Cline
The Moonshot API is both OpenAI- and Anthropic-compatible, which is why a base-URL swap is all you need. For Cline, use the OpenAI-compatible endpoint:
Provider: OpenAI Compatible
Base URL: https://api.moonshot.ai/v1
API Key: <your Moonshot key from platform.moonshot.ai>
Model ID: kimi-k2.7-code
Drop those into Cline’s API settings, reload the extension, and the model picker should show kimi-k2.7-code as available. A working first turn looks like this: ask it to “add a /health route that returns {status: 'ok'} and a test for it,” and you should see Cline stream a short reasoning block, then a write_to_file tool call for the route, then a second one for the test — no manual approval loops if you have auto-approve on for file writes.
Wiring it into Claude Code
Because the API speaks Anthropic’s protocol, Claude Code works with environment variables — no plugin needed:
export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your Moonshot key>"
export ANTHROPIC_MODEL="kimi-k2.7-code"
export ANTHROPIC_DEFAULT_OPUS_MODEL="kimi-k2.7-code"
export ANTHROPIC_DEFAULT_SONNET_MODEL="kimi-k2.7-code"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="kimi-k2.7-code"
export CLAUDE_CODE_SUBAGENT_MODEL="kimi-k2.7-code"
export ENABLE_TOOL_SEARCH=false
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=262144
On Windows PowerShell the pattern is $env:ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic" and so on. Launch claude after exporting and it routes every tier — main model, subagents, the lot — to K2.7 Code.
The gotcha I hit: if you skip CLAUDE_CODE_AUTO_COMPACT_WINDOW=262144, Claude Code keeps the default Anthropic compaction threshold and starts auto-compacting your context far earlier than K2.7’s real 256K window requires — you lose repo context you actually paid to load. Setting it to the model’s true token ceiling (262144) is what makes the 256K window usable. The second one to set is ENABLE_TOOL_SEARCH=false; leaving it on caused intermittent tool-routing weirdness in my testing because the deferred-tool flow assumes Anthropic-side behavior the Moonshot endpoint doesn’t replicate.
Using it through Cursor
Cursor is fussier about third-party models than Cline or Claude Code. You add the OpenAI-compatible endpoint under Settings → Models → OpenAI API Key → Override Base URL, set the base URL to https://api.moonshot.ai/v1, your Moonshot key, and add kimi-k2.7-code as a custom model. Two caveats from experience: this routes Chat and Cmd-K but not Cursor’s proprietary Tab autocomplete, which stays on Cursor’s own model regardless. And Cursor’s agent features lean on Anthropic-specific tool formats, so K2.7 behaves more reliably in Cline or Claude Code than inside Cursor’s Agent mode. If your whole reason for switching is the agent, use Cline.
Where it breaks
The always-on thinking mode is a double-edged sword. It is why tool use is so clean, but it also means every trivial request — “rename this variable” — still spins up a reasoning pass you pay output tokens for. There is no “fast, no-think” tier. For tiny edits, a cheaper non-reasoning model or local autocomplete is more economical.
The missing SWE-bench Pro number is the other thing I would not gloss over. K2.6 had a real, independently-cited 58.6%. K2.7 launched on relative gains and one tool-use benchmark. The model feels strong in practice, but if your buying decision hinges on end-to-end task completion versus Fable 5, wait a week for independent SWE-bench results rather than trusting launch-day percentages.
And to repeat the point that matters most: this is not a privacy play. Routing through Moonshot’s API sends your code to Moonshot’s servers in the same way Anthropic or OpenAI would. Open weights give you the option of self-hosting, but the option costs a GPU server.
Verdict: who should actually pay for it
Pay for K2.7 Code via API if you run agentic coding at volume and your bill matters — indie hackers, solo devs burning through Cline or Claude Code sessions, anyone who looked at a Fable 5 token bill and winced. The tool-calling reliability plus the $0.19/M cache rate make it the strongest cheap backend available right now, comfortably ahead of where K2.6 and DeepSeek V4-Flash sit.
Stay on Claude Fable 5 if you are doing the hard 20% — multi-file refactors across a large codebase where one wrong edit costs an hour. The capability gap on the toughest SWE-bench tasks is real and worth $50/M output for those jobs.
Skip it entirely if you wanted local privacy. Half a terabyte of VRAM is not a workflow; a 32B dense model on your own box is.
FAQ
Is Kimi K2.7 Code free? The weights are free to download under a Modified MIT license, but using it through the Moonshot API costs $0.95/M input and $4/M output. There is no free Moonshot API tier for K2.7 Code at launch (K2.6 has a free OpenRouter tier; K2.7 Code does not).
Can I run Kimi K2.7 Code on a single RTX 4090? No. Even at INT4 the weights are roughly 500GB. Realistic serving needs multi-GPU server hardware (on the order of 8× H200). For single-GPU local coding, use a dense 14B–32B model instead.
What’s the model id?
kimi-k2.7-code on the Moonshot API, moonshotai/kimi-k2.7-code on OpenRouter.
Does it work with Claude Code?
Yes — the Moonshot API is Anthropic-compatible. Set ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic, your auth token, and point the model env vars at kimi-k2.7-code. Also set CLAUDE_CODE_AUTO_COMPACT_WINDOW=262144 to use the full 256K context.
How much better is it than K2.6? Moonshot reports +21.8% on Kimi Code Bench v2, +11% on Program Bench, +31.5% on MLS Bench Lite, and ~30% fewer thinking tokens. Independent SWE-bench Pro numbers had not been published at launch.
Is it better than Claude Fable 5 for coding? On price, overwhelmingly — it is about one-tenth the cost. On the hardest end-to-end tasks, Fable 5 (80.3% SWE-bench Pro) is still ahead; K2.7’s comparable score wasn’t public at launch.
Sources
- Kimi K2.7 Code — OpenRouter model page (pricing, model id, context)
- Kimi AI releases open-source K2.7 Code with 1 trillion parameters — CryptoBriefing
- Moonshot AI Launches Kimi-K2.7-Code with 1 Trillion Parameters — KuCoin
- Use Kimi K2.7 Code in Claude Code / Cline / RooCode — Kimi API Platform docs
- Best AI Model for Coding (June 2026): ranked by SWE-bench Pro and cost — MorphLLM
- Deploy Kimi K2.6 on GPU Cloud (VRAM and serving requirements) — Spheron
- Kimi K2.6 Tech Blog: Advancing Open-Source Coding — Moonshot AI
Last updated June 13, 2026. Pricing and benchmark numbers change frequently and several K2.7 figures are Moonshot-reported pending independent verification; confirm current state before purchasing.
Was this article helpful?
Thanks for the feedback — it helps improve future articles.