Jun 13, 2026

Kimi K2.7 Code Review 2026: 1T Open-Weight Coding Model as a Cursor and Cline Backend

By AICoderScope Team · 12 min read

kimilocal-llmclinecursorreviewai

TL;DR: Kimi K2.7 Code (released June 12) is the cheapest credible agentic coding model on the market — $0.95/$4 per million tokens, open weights, Modified MIT license. It burns ~30% fewer thinking tokens than K2.6 and beats Opus 4.8 on tool-use benchmarks. But “open weight” does not mean “run it on your 4090”: this is a 1-trillion-parameter model, so for almost everyone the real product is the API.

	Kimi K2.7 Code	Claude Fable 5	Kimi K2.6
Best for	Cheap agentic coding via API, tool-heavy workflows	The hardest long-horizon refactors	Same niche, slightly cheaper
Price (in / out per M)	$0.95 / $4.00	$10 / $50	$0.60 / ~$2.50
SWE-bench Pro	Not published at launch	80.3%	58.6%
The catch	Always-on thinking mode; not self-hostable on consumer GPUs	10× the token cost	Older; uses ~30% more reasoning tokens

Honest take: If you already route Cline or Claude Code to a cheap API backend, swap in K2.7 Code today — it is the best price-to-capability ratio you can get and the tool-calling is genuinely strong. If you came here hoping to run it locally for privacy, stop now and read the Cline + LM Studio guide instead — a 32B dense model on your own box is the realistic local play.

Moonshot AI dropped Kimi K2.7 Code on June 12, 2026, pushing the full weights to Hugging Face the same day. It is their fifth major model in under a year, and the naming makes the positioning obvious: this is the coding-specialized member of the K2 family, tuned for long-horizon, agentic software work rather than chat.

I have spent the day pointing Cline and Claude Code at it through the Moonshot API. What follows is what the spec sheet says, what the benchmarks actually support, the part nobody wants to say out loud about self-hosting a trillion-parameter model, and the exact config that gets it running as your coding backend.

What actually shipped

The headline numbers are real and worth getting straight:

1 trillion total parameters, Mixture-of-Experts, with 32B active per token across 384 experts. You get frontier-scale capacity but pay roughly 32B-model compute per forward pass.
256K-token context window — enough to hold a mid-size repo plus a long agent trajectory without aggressive truncation.
Modified MIT license — commercial use, fine-tuning, and redistribution allowed. This is the genuinely open part.
Native INT4 quantization baked in via quantization-aware training (QAT), not a lossy afterthought. Moonshot reports roughly 2× faster inference and ~50% lower memory versus the bf16 weights with negligible quality loss.
Always-on thinking mode. K2.7 Code never runs in a non-reasoning mode; it preserves full reasoning content across multi-turn conversations. The upside is better tool decisions; the catch is you cannot turn the reasoning tokens off to save money.

The model is multimodal (text, image, video) with a MoonViT vision encoder, but for coding that is mostly a footnote — you will feed it diffs and stack traces, not screenshots.

The benchmark picture — read this carefully

Here is where the marketing and the verifiable record diverge, so I will separate them.

What Moonshot published are gains relative to K2.6, not absolute frontier comparisons: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on the multi-language MLS Bench Lite, alongside the ~30% cut in reasoning-token usage on equivalent tasks. Those are first-party numbers on first-party benchmarks. Treat the percentages as “K2.7 is meaningfully better than K2.6 at Moonshot’s own evals,” not as a head-to-head win over Fable 5.

What independent signal exists so far: K2.7 Code posts 81.1% on MCPMark Verified, a benchmark that measures correct tool invocation through the Model Context Protocol — ahead of Claude Opus 4.8’s 76.4%. That is the one cross-vendor number I would actually lean on, and it matches my hands-on experience: K2.7’s tool-calling discipline is its strongest feature. It rarely hallucinates a function signature and recovers cleanly when a tool returns an error.

What Moonshot did not publish at launch is a SWE-bench Verified or SWE-bench Pro score. That is a conspicuous gap. For reference, the predecessor K2.6 scored 58.6% on SWE-bench Pro back in April, which led the open-weight pack at the time. Until independent SWE-bench Pro numbers land for K2.7, anyone telling you it “beats Claude on coding” is extrapolating.

Model	SWE-bench Pro	Tool use (MCPMark)	Price in / out per M	Open weights
Kimi K2.7 Code	Not published	81.1%	$0.95 / $4.00	Yes (Modified MIT)
Kimi K2.6	58.6%	~76%	$0.60 / ~$2.50	Yes
Claude Fable 5	80.3%	n/a	$10 / $50	No
Claude Opus 4.8	69.2%	76.4%	$5 / $25	No

The story the table tells: K2.7 Code is not chasing Fable 5 on the hardest end-to-end SWE-bench tasks. It is competing on dollars per solved task and on agentic reliability, and on those axes it is excellent. At roughly one-tenth of Fable 5’s output price, it does not need to win on raw capability to be the rational default for high-volume agent runs.

”But can I run it locally?” — the honest answer is mostly no

This is the question that brings most people to an open-weight model, so let me be blunt. A 1-trillion-parameter MoE is not a single-GPU model, and INT4 does not change that conclusion.

Even quantized, the weights occupy roughly 500GB of VRAM. The realistic verified configurations for serving K2.7-class models are things like 8× H200, or aggregate VRAM around 640GB — i.e. a multi-GPU server, not a workstation. The “24GB is enough for INT4” claims floating around launch-day blog posts are wrong; they are confusing this model with a small dense model. A trillion parameters at 4 bits is half a terabyte no matter how you slice it.

If your goal is genuine local, air-gapped inference for privacy, K2.7 Code is the wrong tool. The right tool is a dense 14B–32B coding model on your own hardware. For that path, our Cline + LM Studio setup and the hardware tiers in runaihome’s best local AI models by VRAM breakdown are where you should be. If you genuinely want to self-host K2.7 at scale, you are renting cloud GPUs — and at that point the math almost never beats Moonshot’s own API.

The supported serving stacks, for the record, are vLLM, SGLang, and Docker Model Runner. They work. The question is whether you can afford 8 datacenter GPUs to use them.

The practical path: API and OpenRouter

For 95% of readers, K2.7 Code is an API. Pricing:

Moonshot API: $0.95 / M input, $4.00 / M output, $0.19 / M on cache hits. Model id: kimi-k2.7-code.
OpenRouter: same $0.95 / $4.00, model id moonshotai/kimi-k2.7-code.

The cache-hit price is the underrated number. Agentic coding replays a large, stable system prompt and repo context on every turn; at $0.19/M, a cached prefix makes long sessions dramatically cheaper than the headline rate suggests.

One honest note on the trend: K2.6 was $0.60/M input, so K2.7 is a ~58% price bump at the input tier. Moonshot is charging more for the better model. It is still cheap — Fable 5 input is $10/M — but it is not the rock-bottom price K2.6 set.

Wiring K2.7 Code into Cline

The Moonshot API is both OpenAI- and Anthropic-compatible, which is why a base-URL swap is all you need. For Cline, use the OpenAI-compatible endpoint:

Provider:  OpenAI Compatible
Base URL:  https://api.moonshot.ai/v1
API Key:   <your Moonshot key from platform.moonshot.ai>
Model ID:  kimi-k2.7-code

Drop those into Cline’s API settings, reload the extension, and the model picker should show kimi-k2.7-code as available. A working first turn looks like this: ask it to “add a /health route that returns {status: 'ok'} and a test for it,” and you should see Cline stream a short reasoning block, then a write_to_file tool call for the route, then a second one for the test — no manual approval loops if you have auto-approve on for file writes.

Wiring it into Claude Code

Because the API speaks Anthropic’s protocol, Claude Code works with environment variables — no plugin needed:

export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your Moonshot key>"
export ANTHROPIC_MODEL="kimi-k2.7-code"
export ANTHROPIC_DEFAULT_OPUS_MODEL="kimi-k2.7-code"
export ANTHROPIC_DEFAULT_SONNET_MODEL="kimi-k2.7-code"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="kimi-k2.7-code"
export CLAUDE_CODE_SUBAGENT_MODEL="kimi-k2.7-code"
export ENABLE_TOOL_SEARCH=false
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=262144

On Windows PowerShell the pattern is $env:ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic" and so on. Launch claude after exporting and it routes every tier — main model, subagents, the lot — to K2.7 Code.

The gotcha I hit: if you skip CLAUDE_CODE_AUTO_COMPACT_WINDOW=262144, Claude Code keeps the default Anthropic compaction threshold and starts auto-compacting your context far earlier than K2.7’s real 256K window requires — you lose repo context you actually paid to load. Setting it to the model’s true token ceiling (262144) is what makes the 256K window usable. The second one to set is ENABLE_TOOL_SEARCH=false; leaving it on caused intermittent tool-routing weirdness in my testing because the deferred-tool flow assumes Anthropic-side behavior the Moonshot endpoint doesn’t replicate.

Using it through Cursor

Cursor is fussier about third-party models than Cline or Claude Code. You add the OpenAI-compatible endpoint under Settings → Models → OpenAI API Key → Override Base URL, set the base URL to https://api.moonshot.ai/v1, your Moonshot key, and add kimi-k2.7-code as a custom model. Two caveats from experience: this routes Chat and Cmd-K but not Cursor’s proprietary Tab autocomplete, which stays on Cursor’s own model regardless. And Cursor’s agent features lean on Anthropic-specific tool formats, so K2.7 behaves more reliably in Cline or Claude Code than inside Cursor’s Agent mode. If your whole reason for switching is the agent, use Cline.

Where it breaks

The always-on thinking mode is a double-edged sword. It is why tool use is so clean, but it also means every trivial request — “rename this variable” — still spins up a reasoning pass you pay output tokens for. There is no “fast, no-think” tier. For tiny edits, a cheaper non-reasoning model or local autocomplete is more economical.

The missing SWE-bench Pro number is the other thing I would not gloss over. K2.6 had a real, independently-cited 58.6%. K2.7 launched on relative gains and one tool-use benchmark. The model feels strong in practice, but if your buying decision hinges on end-to-end task completion versus Fable 5, wait a week for independent SWE-bench results rather than trusting launch-day percentages.

And to repeat the point that matters most: this is not a privacy play. Routing through Moonshot’s API sends your code to Moonshot’s servers in the same way Anthropic or OpenAI would. Open weights give you the option of self-hosting, but the option costs a GPU server.

Verdict: who should actually pay for it

Pay for K2.7 Code via API if you run agentic coding at volume and your bill matters — indie hackers, solo devs burning through Cline or Claude Code sessions, anyone who looked at a Fable 5 token bill and winced. The tool-calling reliability plus the $0.19/M cache rate make it the strongest cheap backend available right now, comfortably ahead of where K2.6 and DeepSeek V4-Flash sit.

Stay on Claude Fable 5 if you are doing the hard 20% — multi-file refactors across a large codebase where one wrong edit costs an hour. The capability gap on the toughest SWE-bench tasks is real and worth $50/M output for those jobs.

Skip it entirely if you wanted local privacy. Half a terabyte of VRAM is not a workflow; a 32B dense model on your own box is.

FAQ

Is Kimi K2.7 Code free? The weights are free to download under a Modified MIT license, but using it through the Moonshot API costs $0.95/M input and $4/M output. There is no free Moonshot API tier for K2.7 Code at launch (K2.6 has a free OpenRouter tier; K2.7 Code does not).

Can I run Kimi K2.7 Code on a single RTX 4090? No. Even at INT4 the weights are roughly 500GB. Realistic serving needs multi-GPU server hardware (on the order of 8× H200). For single-GPU local coding, use a dense 14B–32B model instead.

What’s the model id? kimi-k2.7-code on the Moonshot API, moonshotai/kimi-k2.7-code on OpenRouter.

Does it work with Claude Code? Yes — the Moonshot API is Anthropic-compatible. Set ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic, your auth token, and point the model env vars at kimi-k2.7-code. Also set CLAUDE_CODE_AUTO_COMPACT_WINDOW=262144 to use the full 256K context.

How much better is it than K2.6? Moonshot reports +21.8% on Kimi Code Bench v2, +11% on Program Bench, +31.5% on MLS Bench Lite, and ~30% fewer thinking tokens. Independent SWE-bench Pro numbers had not been published at launch.

Is it better than Claude Fable 5 for coding? On price, overwhelmingly — it is about one-tenth the cost. On the hardest end-to-end tasks, Fable 5 (80.3% SWE-bench Pro) is still ahead; K2.7’s comparable score wasn’t public at launch.

Sources

Last updated June 13, 2026. Pricing and benchmark numbers change frequently and several K2.7 figures are Moonshot-reported pending independent verification; confirm current state before purchasing.

Was this article helpful?