Jul 3, 2026

Grok Code Fast 1 as your Cursor and Cline backend in 2026: $0.20/$1.50 per million tokens, 256K context, and the BYOK setup that works

By AICoderScope Team · 11 min read


TL;DR: Grok Code Fast 1 is xAI’s cheap agentic-coding model, and at $0.20 input / $1.50 output per million tokens it is one of the least expensive backends you can point Cursor or Cline at. It wires in through the OpenAI-compatible endpoint at https://api.x.ai/v1 in about ten minutes. The catch: it carries a 256K context window (not the 1M you get from Sonnet 5 or Grok 4.x) and its 70.8% SWE-bench Verified score is from xAI’s own harness, so treat it as a fast, cheap workhorse — not your hardest-problem model.

	Grok Code Fast 1	Claude Sonnet 5	GPT-5.6 Terra
Best for	High-volume Cline/Cursor agent loops on a tight budget	Long-context refactors, max instruction fidelity	Mid-tier reasoning at moderate cost
Input / Output per 1M tokens	$0.20 / $1.50	$2.00 / $10.00 (intro) → $3.00 / $15.00	$2.50 / $15.00
Cached input per 1M	$0.02	$0.20 (intro) / $0.30	not published for preview
Context window	256K tokens	1M tokens	not published for preview
Coding benchmark	70.8% SWE-bench Verified (xAI harness)	63.2% SWE-bench Pro / 57% CursorBench	Sol Ultra 91.9% Terminal-Bench 2.1 (preview)
Availability	GA, self-serve API	GA everywhere	Limited preview (~20 orgs)

Honest take: Set Grok Code Fast 1 as your default Cline model for the grind: test writing, boilerplate, mechanical refactors, agent loops that chew through tokens. Keep Claude Sonnet 5 on speed-dial for the multi-file refactor that has to land right the first time. With input priced at one-tenth of Sonnet 5’s intro rate ($0.20 vs. $2.00 per million tokens), the savings on routine work pay for that second opinion.

First, clearing up what this model actually is

If you found this because a July 2026 roundup called Grok Code Fast 1 a hot new release with a free trial across every major editor, that framing is stale. The model shipped on August 28, 2025 (it ran quietly the week before under the codename “sonic”), and the free launch period with Cursor, Cline, GitHub Copilot, Roo Code, Kilo Code, opencode, and Windsurf ended months ago. What’s actually true in mid-2026 is more useful: the model never went away, xAI has kept it on small checkpoint updates rather than a version bump, and it has quietly settled in as one of the cheapest agentic coding backends you can buy. That is the story worth your time — not a trial that expired.

So this is not a launch-hype piece. It’s a setup guide plus the cost math for wiring a genuinely cheap model into the two editors most developers already run.

What you’ll be able to do after this guide

Point Cursor’s Chat and Agent modes at grok-code-fast-1 through your own xAI key
Run Cline entirely on Grok Code Fast 1 with the correct OpenAI-compatible config
Know the exact dollar cost of a real coding session, and when the 256K context window will bite you

Getting an xAI key (5 minutes, no waitlist)

Head to console.x.ai and sign up with an email or your X account. There’s no approval queue and no Premium subscription requirement; sign-up is self-serve. Once you’re in, open API Keys in the left sidebar (or go straight to console.x.ai/team/default/api-keys) and create one. xAI keys start with xai- followed by a long alphanumeric string. New accounts get $25 in promotional credits, which at Grok Code Fast 1 rates is a lot of runway: at the roughly $0.03 per agent session estimated in the cost section below, that covers about 800 sessions before you spend a real dollar.

Confirm the key works before you touch any editor config. This is the single most common failure point, so verify it in isolation:

curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "model": "grok-code-fast-1",
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
    "stream": false
  }'

Expected output (trimmed):

{
  "id": "chatcmpl-...",
  "model": "grok-code-fast-1",
  "choices": [
    { "message": { "role": "assistant", "content": "ready" }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 14, "completion_tokens": 2, "total_tokens": 16 }
}

If that returns ready, the endpoint, key, and model name are all correct and the rest is just plumbing.
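If you’d rather script the check than eyeball curl output, here is a minimal Python sketch of the same request using only the standard library. The helper names (build_payload, response_is_ready, verify_key) are made up for this example, and the response shape is assumed to match the trimmed output shown above.

```python
import json
import os
import urllib.request

XAI_URL = "https://api.x.ai/v1/chat/completions"  # endpoint from the curl call above
MODEL_ID = "grok-code-fast-1"


def build_payload(prompt: str = "Reply with the single word: ready") -> dict:
    """Build the same request body the curl command sends."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def response_is_ready(body: dict) -> bool:
    """Return True if a parsed response names the model and contains 'ready'."""
    choices = body.get("choices") or []
    if not choices:
        return False
    content = choices[0].get("message", {}).get("content") or ""
    model = str(body.get("model", ""))
    return model.startswith(MODEL_ID) and "ready" in content.lower()


def verify_key() -> bool:
    """Hypothetical helper: sends one real (billable) request using $XAI_API_KEY."""
    request = urllib.request.Request(
        XAI_URL,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response_is_ready(json.load(response))
```

Nothing runs on import. Call verify_key() from a shell where XAI_API_KEY is exported; a bad key should surface as an HTTP error raised by urlopen rather than a False return.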

Wiring it into Cursor

Cursor speaks the OpenAI API format, so Grok Code Fast 1 drops in as a custom model. Open Settings → Models. Scroll to the OpenAI API key section and:

Toggle on Override OpenAI Base URL and set it to https://api.x.ai/v1
Paste your xai-... key into the API key field
Under model names, add a custom model called exactly grok-code-fast-1
Click Verify — Cursor pings the endpoint and confirms the key

Now select grok-code-fast-1 from the model picker in Chat or Agent mode. It handles the multi-step agent loop well, which is what the model was tuned for.

One thing that trips people up every time: overriding the base URL routes Chat and Agent through xAI, but Cursor’s Tab autocomplete still runs on Cursor’s own in-house models. That’s by design — Tab is a proprietary Cursor feature, not an API call you can redirect. So a Cursor Pro subscription still earns its keep for autocomplete even when every chat token is going to xAI. This is the same limitation that applies to DeepSeek, Codestral, and every other BYOK backend, and it’s covered in more depth in our DeepSeek V4-Flash backend guide.

Wiring it into Cline

Cline is the cleaner integration because it was a launch partner and treats any OpenAI-compatible endpoint as a first-class option. In the Cline settings panel:

Set API Provider to OpenAI Compatible
Base URL: https://api.x.ai/v1
API Key: your xai-... key
Model ID: grok-code-fast-1

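Save, and Cline runs entirely on Grok. If you’d rather keep the key out of the extension’s settings, you can export the same values as environment variables; whether Cline picks them up depends on your Cline version, so confirm in the settings panel that the base URL and key took effect: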

export OPENAI_BASE_URL="https://api.x.ai/v1"
export OPENAI_API_KEY="xai-your-key-here"
# then set Model ID to grok-code-fast-1 in the Cline model field

Prefer OpenRouter for consolidated billing across models? Grok Code Fast 1 is exposed there as x-ai/grok-code-fast-1 on https://openrouter.ai/api/v1 — set that base URL, use your OpenRouter key, and use the x-ai/grok-code-fast-1 model ID instead. OpenRouter adds a small markup but lets one key and one balance cover Grok alongside every other model you test.
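To keep the two routes straight, the sketch below records the direct xAI and OpenRouter values side by side. It is only a lookup table for your own notes, not a Cline API: the BackendConfig class, the cline_fields helper, and the OPENROUTER_API_KEY variable name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendConfig:
    base_url: str
    model_id: str
    key_env_var: str  # hypothetical name of the env var holding the key


# Base URLs and model IDs are the ones quoted in this guide.
BACKENDS = {
    "xai": BackendConfig(
        "https://api.x.ai/v1", "grok-code-fast-1", "XAI_API_KEY"
    ),
    "openrouter": BackendConfig(
        "https://openrouter.ai/api/v1", "x-ai/grok-code-fast-1", "OPENROUTER_API_KEY"
    ),
}


def cline_fields(provider: str) -> dict:
    """Return the values to enter in Cline's OpenAI Compatible form."""
    config = BACKENDS[provider]
    return {
        "API Provider": "OpenAI Compatible",
        "Base URL": config.base_url,
        "Model ID": config.model_id,
    }
```

The point of the split is that the model ID changes with the route: the x-ai/ prefix belongs to OpenRouter only, and sending it to api.x.ai is an easy way to get a model-not-found error.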

The one gotcha that actually costs you: the 256K context ceiling

Here’s the problem you’ll hit on a real project. Grok Code Fast 1’s context window is 256K tokens. That sounds enormous until you point an agent at a large repository and let it read files, run tools, and accumulate a conversation. Agentic loops are context-hungry: each tool result, each file the model reads, each reasoning trace gets appended to the running context. On a big monorepo an Agent-mode session can push past 256K faster than you’d expect, and when it does, the oldest context — often the original instructions or the file you actually care about — gets silently truncated. The model then “forgets” what it was doing mid-task.

The fix is workflow, not config. Keep Grok Code Fast 1 sessions scoped: point it at a directory or a handful of files rather than “the whole repo,” start a fresh task per logical unit of work, and let Cline’s checkpoint feature snapshot progress so a truncated session doesn’t cost you the whole run. If you genuinely need a model to hold a 500K-token codebase in its head for a single reasoning pass, that’s the job for Claude Sonnet 5 or Grok 4.x with their 1M windows, not this model. Grok Code Fast 1 is a scalpel for scoped, high-frequency work, and used that way the 256K ceiling rarely bites.
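One way to sanity-check a scope before starting a session is a rough token estimate over the files you plan to hand the agent. The sketch below assumes about four characters per token, which is a common rule of thumb and not xAI’s actual tokenizer, and it reserves a hypothetical 64K tokens for tool results and conversation growth.

```python
CONTEXT_WINDOW = 256_000  # tokens, the figure quoted in this guide
CHARS_PER_TOKEN = 4       # assumption: rough heuristic, not xAI's tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count by ceiling-dividing character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_budget(files: dict[str, str], reserve: int = 64_000) -> tuple[bool, int]:
    """Check whether file contents fit the window minus a reserve.

    `files` maps a path to its text; `reserve` is headroom (an assumed
    value) for tool output and the growing conversation.
    """
    total = sum(estimate_tokens(body) for body in files.values())
    return total <= CONTEXT_WINDOW - reserve, total
```

Treat the result as a warning light rather than a measurement: code tends to tokenize less efficiently than prose, so a scope that only barely fits is worth trimming.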

A second, smaller quirk: the API returns visible reasoning traces in its responses. That’s a feature — you can watch how it’s approaching a problem — but those trace tokens count as output tokens, so they show up on your bill. At $1.50/M output they’re cheap, but it’s worth knowing why a “short” answer sometimes reports more completion tokens than the visible text suggests.
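To see what a single response actually cost, you can price its usage block directly. This sketch uses only the prompt_tokens and completion_tokens fields shown in the expected output earlier, and it follows this guide’s statement that trace tokens are counted as completion tokens; the function name is made up for the example.

```python
INPUT_RATE_PER_M = 0.20   # USD per million input tokens, from this guide
OUTPUT_RATE_PER_M = 1.50  # USD per million output tokens, from this guide


def response_cost(usage: dict) -> dict:
    """Split one response's cost into input and output parts, in USD."""
    input_cost = usage.get("prompt_tokens", 0) * INPUT_RATE_PER_M / 1_000_000
    output_cost = usage.get("completion_tokens", 0) * OUTPUT_RATE_PER_M / 1_000_000
    return {
        "input_usd": input_cost,
        "output_usd": output_cost,
        "total_usd": input_cost + output_cost,
    }
```

If completion_tokens looks high for a short visible answer, the difference is most likely the reasoning trace described above.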

The real cost math

Benchmarks are noisy; your invoice isn’t. Take a representative Cline agent task — refactor a module, roughly 1,000 lines changed. A session like that runs somewhere around 60K input tokens (the files it reads, plus tool results and accumulated context) and 12K output tokens (the diffs, explanations, and reasoning traces it writes back). Here’s what that one session costs across current backends:

Backend	Input rate	Output rate	~Cost per session
Grok Code Fast 1	$0.20/M	$1.50/M	~$0.030
DeepSeek V4-Flash	$0.14/M	$0.28/M	~$0.012
Codestral 2	$0.30/M	$0.90/M	~$0.029
GLM 5.2	$1.40/M	$4.40/M	~$0.137
Claude Sonnet 5 (intro)	$2.00/M	$10.00/M	~$0.240
Claude Sonnet 5 (standard, after Aug 31)	$3.00/M	$15.00/M	~$0.360

Grok Code Fast 1 lands at roughly three cents a session, about 8× below Claude Sonnet 5 at intro pricing and 12× below it at standard pricing on the same work. Prompt caching sweetens it further: on repeated runs over the same codebase, cached input drops to $0.02/M (a 90% discount), which matters a lot for agent loops that re-read the same files across many turns. DeepSeek V4-Flash is still cheaper on paper, but Grok’s caching and its tuning specifically for agentic tool-use loops make the two closer in practice than the raw rates suggest.
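The per-session figures in the table are simple arithmetic, and it is worth rerunning them with your own token counts. The sketch below reproduces the table from the rates quoted above for a 60K-input, 12K-output session; the cached_fraction parameter and the 80% value in the usage note are illustrative assumptions, not measured cache hit rates.

```python
# USD per million tokens (input, output), copied from the table above.
RATES = {
    "Grok Code Fast 1": (0.20, 1.50),
    "DeepSeek V4-Flash": (0.14, 0.28),
    "Codestral 2": (0.30, 0.90),
    "GLM 5.2": (1.40, 4.40),
    "Claude Sonnet 5 (intro)": (2.00, 10.00),
    "Claude Sonnet 5 (standard)": (3.00, 15.00),
}


def session_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,
    output_rate: float,
    cached_fraction: float = 0.0,  # assumed share of input served from cache
    cached_rate: float = 0.0,      # USD per million cached input tokens
) -> float:
    """Return the USD cost of one session at the given per-million rates."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    return (
        fresh * input_rate + cached * cached_rate + output_tokens * output_rate
    ) / 1_000_000


def cost_table(input_tokens: int = 60_000, output_tokens: int = 12_000) -> dict:
    """Compute the uncached per-session cost for every backend in RATES."""
    return {
        name: round(session_cost(input_tokens, output_tokens, inp, out), 3)
        for name, (inp, out) in RATES.items()
    }
```

As a hypothetical example, session_cost(60_000, 12_000, 0.20, 1.50, cached_fraction=0.8, cached_rate=0.02) comes to about $0.021, so even heavy caching trims only around a cent per session because output tokens dominate the bill at these rates.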

The honest limit: Grok Code Fast 1’s 70.8% on SWE-bench Verified comes from xAI’s internal harness, and internal-harness numbers tend to run optimistic versus independent, uniform evaluations. Its Artificial Analysis Intelligence Index score of 29 puts it above average but well below frontier reasoning models. So it will chew through routine work fast and cheap, and it will occasionally flail on the genuinely hard multi-step problem where a frontier model would grind through. That’s the trade you’re buying at this price.

Where it fits — and where it doesn’t

Set Grok Code Fast 1 as your everyday Cline backend if you run high volumes of scoped agentic work: generating tests, mechanical refactors, boilerplate, small function generation, tight iterate-and-run loops. At three cents a session and ~90–190 tokens per second of throughput, it feels interactive and the bill barely moves. It’s the right default for cost-capped teams and indie hackers burning tokens all day.

Reach for a frontier model instead when the task needs a 1M-token context in a single pass, involves vision or screenshots (Grok Code Fast 1 is text-only for coding), or is a make-or-break refactor where one wrong edit costs more than the API savings. For that tier, our Claude Sonnet 5 coding review and the GPT-5.6 Sol, Terra, and Luna pricing breakdown cover the current options. And if you’d rather run inference locally and pay $0 per token, our sister site walks through the hardware at runaihome.com’s local AI models by VRAM guide.

FAQ

Is Grok Code Fast 1 still free in Cursor and Cline? No. The free launch period with those editors ran in 2025 and has ended. Today it’s paid via your own xAI key at $0.20/M input and $1.50/M output. Those rates make it cheap, not free — with $25 in promo credits on signup as a cushion.

What’s the exact model ID? grok-code-fast-1 on the xAI API (https://api.x.ai/v1), or x-ai/grok-code-fast-1 on OpenRouter. It’s also aliased internally as grok-build-0.1.

Does BYOK Grok give me Cursor Tab autocomplete? No. Overriding the base URL routes Chat and Agent to xAI, but Cursor’s Tab completion always runs on Cursor’s own models. That’s true for every custom backend, not just Grok.

How does 256K context compare to alternatives? It’s roughly a quarter of the 1M window offered by Claude Sonnet 5 and Grok 4.x. Fine for scoped tasks; a real constraint for whole-repo reasoning in one pass. Scope your sessions and it’s rarely an issue.

Is the 70.8% SWE-bench score independently verified? It’s from xAI’s internal harness, not a uniform third-party evaluation, so read it as an upper bound. Independent harnesses typically report lower numbers for most models. Judge it on your own tasks before trusting it with critical work.


Last updated July 3, 2026. Pricing and features change frequently; verify current state before purchasing.
