Kimi K2.6 in Cursor and Cline in 2026: free-tier setup via OpenRouter, the temperature fix, and when to drop GPT-5.5

kimicursorclineopenroutersetup-guidepricingvswindsurf

TL;DR: Kimi K2.6 has a free tier on OpenRouter — 21.3B tokens per week at $0 — and ties GPT-5.5 on SWE-Bench Pro at 58.6% while costing 7× less on the paid tier. Cursor takes a ten-minute configure. Cline has a temperature bug with the direct Moonshot API; routing through OpenRouter avoids it entirely. Windsurf (now Devin Desktop) doesn’t support the model natively.

K2.6 free (OpenRouter)K2.6 paid (OpenRouter)GPT-5.5
Best forEvaluation, personal projectsHigh-volume agent sessionsTasks above 262K tokens or needing vision
Input / 1M tokens$0$0.684$5.00
Output / 1M tokens$0$3.42$30.00
Context window262K262K1M
SWE-Bench Pro58.6%58.6%58.6%
The catchShared pool; queues at peak hoursRoute through OpenRouter, not direct Moonshot7× pricier input, 8.8× pricier output

Honest take: If you’re paying for GPT-5.5 as a Cline or Cursor Chat backend and your work stays under 262K tokens, K2.6 via OpenRouter matches the quality at a fraction of the cost. Start with the free tier this week — if you hit the ceiling, the paid tier at $0.684/M is still 7× cheaper than what you’re spending now.


The free-tier math you should run before anything else

OpenRouter maintains a capacity pool for Kimi K2.6 at model ID moonshotai/kimi-k2.6:free. The limit is 21.3B tokens per week — shared across all free users, so availability fluctuates at peak hours, but in off-peak windows the throughput is fast enough for production use.

What 21.3B tokens per week actually means for one developer: a typical Cline agentic session processing 10 files with 8 tool calls burns roughly 50,000 input tokens and 8,000 output tokens (about 58K total). At that rate, the weekly pool would theoretically cover hundreds of thousands of sessions — you’re sharing it with other developers, but a developer burning 5–10 sessions per day (around 290K–580K tokens) sits comfortably within any realistic individual allocation.

The paid tier (moonshotai/kimi-k2.6) costs $0.684/M input and $3.42/M output on OpenRouter. That same 10-file session costs approximately $0.034. The same session on GPT-5.5 ($5.00/$30.00 per million) costs approximately $0.49 — 14× more for the same benchmark score.

Monthly comparison for a developer running 200 sessions per month (roughly 10 per working day):

K2.6 freeK2.6 paid (OpenRouter)GPT-5.5
Monthly spend$0$6.80$98
Sessions covered200 (if pool available)UnlimitedUnlimited

If the free pool is unavailable during a crunch session, switching to the paid tier is a toggle in your provider settings — no re-authentication required.


What the SWE-Bench number actually means right now

Kimi K2.6 launched April 20, 2026 as the first open-weight model to beat GPT-5.4 on SWE-Bench Pro, scoring 58.6%. That was the top open-weight result at release.

Six weeks later, the leaderboard looks different. Claude Mythos Preview leads at 77.8%, Claude Opus 4.8 sits at 69.2%, and Claude Opus 4.7 at 64.3%. K2.6 is ranked 7th at 58.6%, tied with GPT-5.5. For a full breakdown of the model architecture and the launch-day benchmark context, see the Kimi K2.6 review.

The tie with GPT-5.5 is the number that matters for this article. Both resolve GitHub-style software engineering issues at the same measured rate. The difference is entirely cost and context length: GPT-5.5 has a 1M-token window versus K2.6’s 262K, but charges $5.00/M input versus $0.684/M. For tasks that fit in 262K tokens — which covers most single-repository coding work — there is no quality reason to pay the GPT-5.5 premium.

The one benchmark where GPT-5.5 stretches ahead: SWE-bench Verified (the simpler variant of the benchmark), where Claude Opus 4.7 scores 87.6% versus K2.6’s 80.2%. For complex, multi-file orchestrations that require near-perfect instruction following across dozens of tool calls, Claude Opus 4.7 or 4.8 is still the better choice — but neither Claude variant is what we’re comparing here. Against GPT-5.5 specifically, K2.6 holds even.


Setting up Cursor with K2.6

Cursor’s Chat panel, Cmd+K, and Agent mode all route through the OpenAI API format and accept any compatible endpoint via the base URL override. OpenRouter exposes one.

What you need first: An OpenRouter account and API key. Sign up at openrouter.ai, navigate to API Keys, and generate a key. Free accounts work for the free tier model.

In Cursor (version 0.50+, tested June 2026):

  1. Open SettingsModels
  2. In the OpenAI API Key field, paste your OpenRouter API key
  3. In the Override OpenAI Base URL field, enter exactly:
    https://openrouter.ai/api/v1
  4. Scroll up and click + Add Custom Model
  5. Enter the model ID. Free tier:
    moonshotai/kimi-k2.6:free
    Paid tier (no shared queue):
    moonshotai/kimi-k2.6
  6. Press Enter, then click Verify

Expected output after clicking Verify:

Model verification successful
moonshotai/kimi-k2.6:free — available

If Verify hangs for more than 10 seconds on the free tier, the shared pool is at capacity. Retry in 30 seconds or switch to the paid model ID — the verification itself costs a trivial number of tokens.

Once verified, the model appears in the Chat panel dropdown under Custom. Switch to it for long refactor sessions where you want to load large context. Switch back to Claude Sonnet or GPT-4o for shorter, precision-critical tasks where established models have more tuning.

What the override doesn’t touch: Cursor’s Tab autocomplete runs on Cursor’s proprietary infrastructure and is completely unaffected by the base URL override. The custom model setting covers Chat, Cmd+K, and Agent mode only. If Tab completions are your primary value from Cursor, this configuration doesn’t reduce your Cursor Pro spend — it only substitutes the API calls that would otherwise hit OpenAI or Anthropic directly.


Setting up Cline with K2.6 — and fixing the temperature error

The direct route to Kimi K2.6 in Cline is the Moonshot API endpoint: api.moonshot.ai/v1 with model ID kimi-k2.6. It looks like it should work — Kimi’s API is OpenAI-compatible, Cline supports OpenAI-compatible providers. In practice, it fails:

POST https://api.moonshot.ai/v1/chat/completions
Status: 400 Bad Request
{"error": "invalid temperature: only 1 is allowed for this model"}

Kimi K2.6’s Moonshot endpoint requires temperature: 1. Cline’s internal default for code tasks sends 0 or a lower float, and there’s no per-provider temperature override in Cline’s current settings (tracked in GitHub issue #10544). The request fails before any code generation happens.

The fix is routing through OpenRouter instead of hitting Moonshot directly. OpenRouter remaps temperature values to what each upstream provider accepts — K2.6 on OpenRouter receives temperature 1 regardless of what the client sends.

Cline setup via OpenRouter:

  1. Open VS Code with the Cline extension installed (v3.x, June 2026)
  2. Click the Cline icon in the left sidebar → click the provider dropdown at the top of the panel
  3. Select OpenAI Compatible
  4. Set Base URL: https://openrouter.ai/api/v1
  5. Set API Key: your OpenRouter API key
  6. Set Model ID: moonshotai/kimi-k2.6 (or :free for the free tier)
  7. Click Save

Test immediately with a simple task to confirm the connection:

List the top-level files in the current workspace

The first response includes a list_files tool call followed by results. If you see a 401, the API key is wrong. If you see model_not_found, check the model ID — it’s kimi-k2.6 with a period, not kimi-k2-6 with hyphens.

One practical note on the 262K context: Cline users have reported that K2.6 maintains consistent behavior across long sessions better than shorter-context models that need to truncate. A refactor session touching 80 files and 120K+ tokens of context stays coherent without needing Cline’s context-summarization mode to kick in. That’s a genuine workflow advantage over 128K-window alternatives.


Windsurf (Devin Desktop): what works and what doesn’t

Windsurf rebranded as Devin Desktop in early June 2026 (covered in the rebrand breakdown). The BYOK system is unchanged through the rebrand: Devin Desktop’s native BYOK only accepts Anthropic API keys. You can bring your own Claude key; you cannot point the native integration at OpenRouter or Moonshot directly.

Roo Code was the standard workaround for custom model endpoints in VS Code-based editors. It shut down in May 2026 — the team relaunched as Roomote with a different product direction.

Three paths for Windsurf/Devin Desktop users who need K2.6:

Install Cline inside Windsurf. Windsurf is a VS Code fork and supports VS Code extensions. Install the Cline extension directly in Windsurf, configure it to use OpenRouter as shown above. Cline runs as a panel within the editor — you get Cline’s agentic capabilities with K2.6 while Windsurf’s Cascade continues on its own model. Two AI systems in the same editor is unusual but it works; several developers in the Windsurf Discord run exactly this configuration.

Use the ACP protocol path (future). Devin Desktop ships with Agent Client Protocol support, which lets external ACP-compatible agents run inside the editor. As of June 7, 2026, no ACP wrapper for K2.6 exists in the public registry, but this becomes the cleaner path if one ships.

Switch to VS Code + Cline for K2.6 work. VS Code with Cline configured to OpenRouter gives the same agentic experience as Cascade for most tasks. If your Devin Desktop use case is primarily autonomous coding rather than the rebrand-specific Agent Command Center features, this is the lowest-friction way to access K2.6 without fighting the BYOK limitation.


Where K2.6 beats GPT-5.5, where it doesn’t

The benchmark tie holds across a specific range of tasks: code generation, single-file and multi-file refactors, repository-scoped issue resolution, and test generation — all within a 200K-token context. Outside that range:

K2.6 has the edge:

  • Any session where you’d otherwise throttle token usage to control GPT-5.5 costs — the 7× input cost difference means you can run more complete context without budget pressure
  • Projects where loading the full codebase into a single context window matters — 262K covers most mid-sized repositories without chunking
  • The free tier makes evaluation genuinely free: run your actual most complex Cline session on moonshotai/kimi-k2.6:free before spending anything

GPT-5.5 holds advantages:

  • Very long context sessions above 262K tokens (GPT-5.5’s 1M window is the differentiator for genuinely large monorepos)
  • Multimodal tasks where image inputs are central — K2.6 has a vision encoder but GPT-5.5’s vision performance is more consistently benchmarked in coding tool integrations
  • Workflows where existing prompts and system messages are tuned to OpenAI response formatting (tool-call response structure differences can cause subtle mismatches in some Cursor Agent templates)

The practical test: take your current most expensive Cline session from the past week — the one with the most files and tool calls — and replay it with K2.6 on the free tier. If the output quality is indistinguishable, you’ve found your answer on cost.


FAQ

Does K2.6 work with GitHub Copilot BYOK? Yes. GitHub Copilot’s BYOK accepts OpenAI-compatible endpoints including Moonshot’s. Use api.moonshot.ai/v1 as the base URL with your Moonshot API key, or route through OpenRouter. The temperature issue described above may affect GitHub Copilot’s Chat as well — OpenRouter is the safer path.

Can I run K2.6 locally? Technically yes — the weights are on Hugging Face. Practically no: the model weighs 594 GB at INT4 quantization, requiring 8× H100 GPUs minimum. Not a workstation option. Cloudflare Workers AI serves K2.6 for serverless inference if you need a managed path without managing your own cluster.

Does OpenRouter log my code on the free tier? OpenRouter’s privacy policy states that requests are not used for model training. For codebases with strict IP or compliance requirements, the paid Moonshot API with explicit data processing terms is cleaner than any shared free tier.

Will the free tier disappear? OpenRouter’s free model tiers depend on capacity agreements with upstream providers. Moonshot has offered free tiers since K2.5 launched. There’s no announced end date, but it’s not guaranteed — treat it as a trial path rather than a permanent cost structure.

What’s the right model ID, paid or free, for Cursor? Start with moonshotai/kimi-k2.6:free. Switch to moonshotai/kimi-k2.6 if the free pool queues consistently during your working hours. Both use the same model weights; the only difference is routing priority and cost.


Sources

Last updated June 7, 2026. Pricing and model rankings change frequently; verify current state before purchasing.

Was this article helpful?