AI Coding Tools on Windows AI PCs in 2026: Cursor, Claude Code, and Copilot on RTX Spark and Copilot+ Devices

local-llmcursorclaude-codecopilotwindowsrtx-sparkhardwarereview

TL;DR: A Copilot+ PC’s 40-TOPS NPU does not run your coding agent — Recall and Live Captions, yes; Cursor’s agent or Claude Code, no. The first Windows machine that genuinely runs a 120B coding model on-device is NVIDIA’s RTX Spark, announced at Computex 2026, but it ships fall 2026 at unannounced prices. Until then, the cloud still wins for most developers.

Copilot+ PC (NPU)RTX Spark (fall 2026)Cloud coding (any laptop)
Best forOffice AI, on-device transcriptionPrivacy-first local 120B agentsAlmost everyone, today
Runs a real coding agent locally?NoYes — 120B, up to 1M contextCloud-side, instant
Cost$900+ laptop you’d buy anywayDGX Spark proxy $3,999–$4,699; RTX Spark TBD$20/mo (Cursor/Claude Code)
The catchNPU is for OS features, not agentsUnreleased; ~27–44 tok/s on coding modelsCode leaves your machine

Honest take: Don’t buy a Copilot+ laptop expecting it to run Cursor or Claude Code offline — that’s not what the NPU is for. If on-device coding matters to you, wait for an RTX Spark machine in hand and judge it on benchmarks, not the 1-petaflop marketing line. For everyone else, a $20/month cloud agent on whatever laptop you own is still faster and cheaper.

The confusion this article exists to clear up

Every Computex cycle produces a wave of “the agentic PC is here” headlines, and June 2026 was the loudest yet. NVIDIA and Microsoft used the show to pitch Windows as an “agentic AI platform,” and the natural reader question followed: if I buy one of these AI PCs, can I finally run my coding agent locally and stop paying $20 a month?

The answer depends entirely on which “AI PC” you mean, because two very different products are being marketed with overlapping language. A Copilot+ PC is a mainstream laptop with a neural processing unit. An RTX Spark machine is a Grace Blackwell superchip with 128GB of unified memory. They are not the same class of device, and only one of them runs a coding model worth the name. Conflating them is the single most expensive mistake a developer can make at the checkout page this year.

What a Copilot+ PC’s NPU actually does (and doesn’t)

To carry the Copilot+ badge, a PC needs an NPU rated at 40+ TOPS, plus 16GB RAM, a 256GB SSD, and Windows 11 24H2 or newer. Current qualifying silicon: Intel’s Core Ultra 200V at 48 TOPS, AMD’s Ryzen AI 300 series at 50 TOPS, and Qualcomm’s Snapdragon X Elite at 45 TOPS.

Here’s the part the marketing blurs. That NPU exists to run a specific, fixed set of Windows features on-device — Recall, Cocreator image generation in Paint, Auto Super Resolution, Live Captions with real-time translation in 40+ languages, and Windows Studio Effects. These are small, heavily optimized models tuned for the NPU’s quantized integer math and tight power budget.

A coding agent is a completely different workload. Cursor’s agent mode, GitHub Copilot, and Claude Code all run their reasoning on large cloud models (Claude, GPT-5.5, Gemini) over the network. None of them dispatch work to your NPU. There is no toggle that points Cursor at the 48 TOPS in your Core Ultra chip, and there won’t be — the NPU isn’t a general-purpose LLM accelerator, and 40 TOPS is roughly two orders of magnitude short of what a 30B coding model needs for usable throughput.

So a Copilot+ PC changes nothing about your coding setup. You still install Cursor or VS Code, you still pay for a cloud subscription or wire up a remote model, and the work still leaves your machine. The NPU sits there doing Recall. If someone tells you a Copilot+ laptop “runs AI coding locally,” they’re confusing the OS-feature NPU with the GPU-class compute a coding model actually requires.

What RTX Spark changes — the genuinely new part

RTX Spark is the device that makes the local-coding pitch real. Announced at Computex 2026 as a NVIDIA-and-Microsoft Windows-on-Arm platform, the RTX Spark Superchip pairs up to 20 Arm CPU cores with a Blackwell GPU carrying 6,144 CUDA cores, up to 128GB of LPDDR5X unified memory, and up to 300 GB/s of memory bandwidth on a single 3nm package. NVIDIA rates it at 1 petaflop of AI performance (FP4) and says it runs 120-billion-parameter LLMs with up to 1 million tokens of context locally — alongside the partnership’s OpenShell agent framework and a new set of Windows security primitives that gate which tools and data a local agent can touch.

The 128GB unified pool is the whole story. A coding model’s size, plus its KV cache for long context, has to fit in memory or throughput collapses. A normal gaming laptop with 8–16GB of VRAM cannot hold a 120B model; it spills to system RAM and crawls. RTX Spark treats CPU and GPU memory as one 128GB pool, so a 120B model in 4-bit and a large context window fit at once. That is the architectural reason RTX Spark can do what a Copilot+ laptop — or even an RTX 4090 desktop — cannot.

Two cautions before you get excited. First, RTX Spark ships fall 2026 (over 30 laptops and roughly 10 desktops from OEMs like ASUS), and consumer pricing is unannounced. The enterprise DGX Spark — same GB10 architecture, available now — runs $3,999 to $4,699, so don’t expect a bargain. Second, “1 petaflop” is an FP4 peak, not coding throughput. The number that matters is tokens per second on a real coding model, and we can already measure that.

The token-per-second reality (measured, not marketed)

Because RTX Spark and the shipping DGX Spark share the GB10 architecture and the 128GB unified-memory design, DGX Spark benchmarks are the best available proxy for what RTX Spark will deliver on a coding model. From the public llama.cpp benchmark thread on DGX Spark hardware:

Model (quant)Prompt processingToken generationAt 32K context
gpt-oss-120b (MXFP4)~2,000 tok/s~35 tok/s~28 tok/s
Qwen3-Coder 30B (Q8_0)~1,600 tok/s~44 tok/s~27 tok/s
gpt-oss-20b (MXFP4)~2,000 tok/s~60 tok/s

Switching inference engines moves the numbers a little — vLLM with MXFP4 reportedly pushes gpt-oss-120b to about 59 tok/s on a single node — but the shape holds: 27 to 44 tokens per second for a usable coding model at real context. That’s genuinely workable for agentic editing where you’re reading a few files and applying diffs. It is not Claude Sonnet over the API, which streams several times faster and reasons with a far larger model. Long context bites hardest: every model above loses a third of its generation speed climbing from short prompts to 32K tokens, exactly when a coding agent is loading your repo.

So the honest local-coding promise for RTX Spark is: a private, offline 120B agent at roughly half the perceived speed of a cloud subscription, on hardware that costs as much as four years of Cursor Pro. For privacy-bound work or air-gapped environments, that trade is worth it. For raw throughput per dollar, the cloud still wins comfortably.

Which coding tools run where on Windows

Even once you have the hardware, there’s a software-compatibility layer to clear, because RTX Spark is Windows on Arm. Arm-native apps run natively; x86/x64 apps run under the Prism emulator introduced in Windows 11 24H2.

  • VS Code and Visual Studio are Arm-native — these are the smoothest path.
  • Cursor is a VS Code fork; it runs, but check for an Arm-native build versus emulated x64, which costs battery and some responsiveness.
  • Claude Code offers an ARM64 installer, but developers have reported it still running x64 binaries under emulation, with crash bugs on Windows ARM64 (an ACCESS_VIOLATION issue tracked on GitHub). Treat Arm support as improving but not yet bulletproof.
  • The local model server (Ollama, llama.cpp, LM Studio, vLLM) is what taps the Blackwell GPU. This is the piece that matters on RTX Spark, and NVIDIA’s stack targets it directly.

The practical pattern: run an Arm-native editor, point it at a local OpenAI-compatible endpoint served by Ollama or LM Studio on the RTX Spark GPU, and keep an eye on which of your tools are emulated. If you’ve wired a local backend into Cursor or Cline before, the mechanics are identical — see our Cursor + Ollama and LM Studio setup and Cline + LM Studio guides; only the hardware underneath changes.

Where this breaks

  • Price-to-value is brutal. RTX Spark pricing is unannounced, but the DGX Spark proxy is $3,999–$4,699. At $20/month, Cursor Pro or Claude Code costs that much over 15 to 20 years. You buy this hardware for privacy, offline capability, or unmetered token volume — not to save money.
  • “AI PC” marketing oversells the mainstream tier. A sub-$1,500 Copilot+ laptop will not run a coding agent locally, full stop. Only the Spark-class unified-memory machines do.
  • Throughput is half the cloud, and context makes it worse. 27 tok/s at 32K context is fine for targeted edits and painful for a long agentic session.
  • Windows-on-Arm friction is real today. Some of your toolchain runs emulated. Verify your exact stack before committing.

If your interest is squeezing local models out of hardware you already own, an RTX desktop with WSL is the cheaper experiment — see WSL 3 for AI coding on Windows and Cursor + local Llama hardware tiers. For the deeper hardware analysis of Spark-class machines and unified-memory token economics, our sister site runaihome.com covers the silicon side.

Buy, wait, or skip

Skip the “AI PC for coding” idea entirely if you mean a Copilot+ laptop. Buy that laptop for its battery, screen, and the OS AI features — not to run agents. Your coding setup is a cloud subscription regardless.

Wait if you want a true local coding box. RTX Spark hardware lands in fall 2026 with real independent benchmarks to follow. Judge it then on tokens per second and emulation status, not the petaflop headline.

Buy a DGX Spark now only if you have a concrete air-gapped or privacy mandate, the budget is approved, and you’ve accepted ~30–44 tok/s as your ceiling on coding models. Otherwise the cloud is faster, cheaper, and ready today.

FAQ

Can a Copilot+ PC run Cursor or Claude Code offline? No. The 40-TOPS NPU runs Windows features like Recall and Live Captions, not LLM coding agents. Those tools use cloud models over the network or a separate local GPU server.

Does RTX Spark actually run a 120B coding model locally? Yes — the 128GB unified memory is what makes it possible. Expect roughly 27–44 tokens/second on a usable coding model at real context, based on shipping DGX Spark benchmarks of the same architecture.

Is RTX Spark cheaper than paying for Cursor? No. With the DGX Spark proxy at $3,999–$4,699 and a $20/month cloud subscription, the hardware pays back only after 15+ years. You buy it for privacy and offline use, not savings.

Will my coding tools run on Windows on Arm? VS Code and Visual Studio are Arm-native. Cursor (a VS Code fork) runs, possibly emulated. Claude Code has an ARM64 installer but has reported crash bugs under emulation. Verify your stack before buying.

When does RTX Spark ship? Fall 2026, per NVIDIA’s Computex 2026 announcement — over 30 laptops and about 10 desktops from OEMs. Consumer pricing was not announced.

Sources

Last updated June 18, 2026. Pricing, hardware specs, and ship dates change frequently; RTX Spark consumer pricing was unannounced at publication. Verify current state before purchasing.

Was this article helpful?