Jun 10, 2026

Google Colab CLI review 2026: free and cheap GPUs for Claude Code, Codex, and Cursor agents — from your terminal

By AICoderScope Team · 11 min read

Tags: claude-code, codex, local-llm, setup-guide, review, workflow, pricing

TL;DR: Google shipped the Colab CLI on June 5, 2026 — an Apache-2.0 tool that hands T4-through-H100 GPUs and TPU v5e/v6e to any terminal coding agent (Claude Code, Codex, Antigravity) without a browser or Jupyter kernel. It’s the cheapest way to give your agent a GPU if you already pay for Colab, but the auth setup is a real wall and idle sessions silently burn money. Use it for one-shot training and inference jobs, not as a persistent dev box.

	Colab CLI	RunPod	Local GPU
Best for	Bursty GPU jobs from an agent, existing Colab users	Long-running self-hosted models, full root	Privacy, zero per-hour cost
Cost	Free T4 (constrained) → $9.99/mo Pro → $49.99/mo Pro+	~$0.34–$2.99/hr by GPU	Hardware upfront ($500–$3,000+)
Setup friction	High (ADC OAuth scopes)	Medium (SSH key + pod)	High (drivers, VRAM limits)
The catch	Idle sessions leak compute units; Linux/macOS only	You manage the box and the bill	VRAM ceiling caps model size

Honest take: If you already pay $9.99/mo for Colab and want your terminal agent to occasionally grab an A100 for a fine-tune or a batch inference run, the Colab CLI is the lowest-friction option on the market once you get past the one-time auth setup. If you need a GPU running for hours at a stretch, RunPod is typically cheaper per hour and gives you root. Don’t treat the Colab CLI as a replacement for either.

What Google actually shipped

Until June 2026, getting a GPU out of Colab meant opening a browser, spinning up a notebook, and babysitting a kernel. That model never fit terminal-first agents — Claude Code and Codex live in a shell, and they can’t click “Connect” on a web UI.

The Colab CLI fixes that. It’s a command-line client that connects your local terminal to a remote Colab runtime. You provision a GPU or TPU, run a local Python script on it, pull the artifacts back, and tear it down — no Jupyter, no kernel management, no browser tab. Google explicitly designed it for AI agents: the package ships with a COLAB_SKILL.md skill file that tells any agent with shell access how to drive the tool.

This is the first time Colab has shipped a first-party interface built for agent use rather than human notebook editing. The CLI is open source under Apache-2.0, so there’s no CLI-specific cost — you pay only for the accelerator time, billed exactly the way Colab has always billed it.

Install and first run

The package is google-colab-cli. Google recommends uv, but pip works:

# Recommended
uv tool install google-colab-cli

# Or
pip install google-colab-cli

One thing to know before you start: Linux and macOS only. Windows is not supported as of the June 2026 release. If you’re on Windows, run it inside WSL2.
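If you script the install for an agent, a small guard keeps it from attempting an unsupported platform. This is a minimal sketch: the function name colab_os_supported is hypothetical, and it relies only on uname -s, which reports Linux inside WSL2 and Darwin on macOS.

```shell
# Hypothetical helper: succeed only on the platforms the June 2026
# release supports. Pass a uname string to test it; with no argument
# it checks the current machine. WSL2 reports "Linux", so it passes.
colab_os_supported() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Linux|Darwin) return 0 ;;
    *)
      echo "Colab CLI supports Linux and macOS only (got '$os'); on Windows use WSL2" >&2
      return 1
      ;;
  esac
}
```

Usage: colab_os_supported && uv tool install google-colab-cli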

The accelerator menu, as listed in the CLI’s own help, covers more than the headlines suggest:

GPU: T4, L4, G4, H100, A100
TPU: v5e1, v6e1
CPU: the default if you don’t pass a flag

A one-shot job looks like this:

$ colab run --gpu A100 train.py --epochs 3
# provisions a fresh A100 VM, runs train.py, streams output,
# propagates the exit code, then tears the VM down

That colab run pattern — provision, execute, destroy — is the safest way to use the tool, because there’s no VM left running to forget about. For incremental work you’d use a persistent session instead:

$ colab new -s ftune --gpu A100      # allocate a named, billable VM
$ colab exec -s ftune -f prep.py     # kernel state persists between calls
$ colab exec -s ftune -f train.py
$ colab download -s ftune model.safetensors ./out/
$ colab stop -s ftune                # YOU must do this — see below

The full command surface is broad for a v1: colab sessions lists active VMs, colab status shows hardware and metadata, colab restart-kernel clears state, colab url --open hands you a browser connection if you want one, and colab upload/download/rm/ls/edit handle files. There’s also colab repl and colab console for interactive use — but those are traps for agents (more below).

The authentication wall (where most people stall)

This is the part the launch-day blog posts gloss over and the part that actually stops you. The CLI authenticates with ADC (Application Default Credentials) by default; OAuth2 is the alternative via the global --auth {oauth2,adc} flag.

The catch is buried in the COLAB_SKILL.md warnings: ADC needs four specific OAuth scopes, including the colaboratory scope for keep-alive. Miss that scope and your VM gets silently unassigned mid-run — no error, just a dead session. Before you provision anything, verify your identity:

$ colab whoami
# confirms you're authenticated with the right scopes

If whoami looks wrong, fix auth before burning compute. An agent that skips this step will provision a VM, lose it silently, and report a confusing failure. This single issue is the most common reason a first run fails.
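An agent can enforce that ordering with a small guard. This is a sketch under one assumption we have not confirmed against the CLI: that colab whoami exits non-zero when it cannot authenticate. It will not catch a missing colaboratory scope unless whoami fails on it, so read the scopes it prints at least once yourself. The name colab_preflight is hypothetical.

```shell
# Hypothetical guard: run before any `colab new` or `colab run`.
# Assumption: `colab whoami` returns a non-zero exit status on auth failure.
colab_preflight() {
  if ! colab whoami; then
    echo "colab whoami failed; fix auth before provisioning a VM" >&2
    return 1
  fi
}
```

Usage: colab_preflight && colab new -s ftune --gpu A100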

Session metadata lives in ~/.config/colab-cli/sessions.json, and you can isolate parallel agent runs with --config <path> so two agents don’t stomp on each other’s session names.
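A thin wrapper can make that isolation automatic by giving each agent its own session file. Treat it as a sketch: colab_for_agent and the ./.colab/<agent>.json layout are hypothetical, and it assumes --config is a global option that goes before the subcommand, the way --auth does.

```shell
# Hypothetical wrapper: route each agent's colab calls to its own
# session file so two agents can both use a session named "ftune".
# Assumption: --config is a global option placed before the subcommand.
colab_for_agent() {
  local agent="$1"
  shift
  mkdir -p ./.colab
  colab --config "./.colab/${agent}.json" "$@"
}
```

With this in place, colab_for_agent claude new -s ftune --gpu T4 and colab_for_agent codex new -s ftune --gpu T4 should be recorded in separate session files.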

Wiring it into Claude Code, Codex, and Cursor

Because the CLI is just a binary on your PATH, any agent that can run shell commands can use it — Claude Code, Codex, Antigravity, or Cursor’s agent mode in a terminal. The COLAB_SKILL.md file is what makes the difference between an agent that fumbles and one that drives it cleanly. Drop it where your agent reads skills (for Claude Code, reference it from your CLAUDE.md; for Codex, from your agent instructions) and the agent inherits the right command patterns and guardrails.

The skill file defines three workflows worth internalizing:

One-shot jobs — colab run script.py provisions, runs, and tears down with proper exit-code propagation. Best default for agents because nothing leaks.
Persistent sessions — colab new -s <name> keeps kernel state alive across multiple colab exec calls. Use this for iterative work where re-loading a model each time would waste minutes.
Script execution into a live kernel — colab exec -s <name> -f script.py sends code to a running session and can export notebook results to *_output.ipynb.

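A realistic Claude Code prompt that uses it: “Run embeddings.py on a Colab T4 and download the resulting index.faiss to ./data/.” Because colab run tears the VM down as soon as the script exits, an agent that needs the file back uses a short named session instead: colab new -s emb --gpu T4, colab exec -s emb -f embeddings.py, colab download -s emb index.faiss ./data/, then colab stop -s emb. It checks each exit code along the way, and you never touch a browser.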

Real costs: what you’ll actually pay

There’s no charge for the CLI itself. You pay for accelerator time through Colab’s existing compute-unit (CU) model, verified against Colab’s pricing on June 10, 2026:

Plan	Price	What you get
Free	$0	Constrained T4 access, same as the browser free tier
Pay As You Go	$9.99 / 100 CU	No subscription; CU pool you draw down
Pro	$9.99 / mo	Higher burst quotas, priority access
Pro+	$49.99 / mo	Highest quotas, background execution, longer runs

The number that matters for budgeting is the burn rate. A T4 burns ~1.76 CU/hr, so 100 CU buys roughly 57 hours on a T4. An A100 burns ~15 CU/hr — that same 100 CU is gone in about 7 hours. H100 time goes faster still.
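The arithmetic is simple enough to script, which helps if you want an agent to check a budget before provisioning. A minimal awk sketch; the burn rates are the approximate figures above, not constants exposed by the CLI, and both function names are hypothetical.

```shell
# Hours of runtime a compute-unit budget buys at a given burn rate (CU/hr).
cu_hours() {
  awk -v cu="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", cu / rate }'
}

# Compute units consumed by a VM left running for a number of minutes,
# whether or not it is doing any work.
cu_for_minutes() {
  awk -v min="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", min / 60 * rate }'
}
```

cu_hours 100 1.76 prints 56.8 (T4) and cu_hours 100 15 prints 6.7 (A100). cu_for_minutes 40 15 prints 10.0, which matches the idle A100 session described below.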

That math reframes the whole tool. For light embedding or inference work on a T4, $9.99 stretches across a week of casual use. For A100/H100 fine-tuning, you’ll chew through a Pro+ allotment quickly, and at that point a dedicated RunPod instance or your own card starts looking cheaper per hour. The Colab CLI’s sweet spot is bursty, short jobs an agent kicks off and forgets — not a GPU you keep warm all day.
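To compare with RunPod on the same footing, convert the burn rate to dollars per hour. The sketch below assumes the Pay As You Go price of $9.99 per 100 CU; subscription allotments may work out differently, and payg_usd_per_hour is a hypothetical name.

```shell
# Effective hourly price at Pay As You Go rates:
# ($9.99 / 100 CU) * burn rate in CU/hr.
payg_usd_per_hour() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", 9.99 / 100 * rate }'
}
```

payg_usd_per_hour 1.76 prints 0.18 for a T4 and payg_usd_per_hour 15 prints 1.50 for an A100, which you can set against the ~$0.34–$2.99/hr RunPod range in the table at the top.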

The problem I hit: a leaked A100 session

The single most expensive mistake with this tool is leaving a session running. The skill file says it bluntly: “Always colab stop -s <name> when done — idle VMs burn compute units.” An idle A100 doesn’t pause when your agent finishes thinking; it keeps drawing ~15 CU/hr until you stop it.

I reproduced this on purpose. I ran colab new -s test --gpu A100, did one exec, then walked away simulating an agent that crashed before its cleanup step. Forty minutes later the VM was still live and had eaten roughly 10 CU doing nothing.

The fix is to make teardown non-optional:

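#!/usr/bin/env bash
set -euo pipefail
# Don't trust the agent to clean up. Trap it.
colab new -s job --gpu A100
trap 'colab stop -s job' EXIT
colab exec -s job -f train.py
# stop fires on a normal exit and when set -e aborts after a failed
# step; only a hard kill (SIGKILL) of this shell skips it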

Better still, prefer colab run for anything that doesn’t need a persistent kernel — it tears the VM down automatically. Reserve named sessions for genuinely iterative work, and run colab sessions at the end of every working day to catch orphans.

A second gotcha from the skill file: an unrecognized --gpu value silently defaults to A100, which often fails to allocate on lower tiers. If your agent passes a typo’d accelerator name, you’ll get a confusing allocation error rather than a clean “no such GPU” message, and on a plan where the A100 does allocate, the typo quietly bills you at A100 rates. Validate the flag, and test new pipelines on a T4 or CPU before reaching for the expensive hardware.
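Validating the flag can be as simple as an allow-list check before the call. A minimal sketch: colab_check_gpu is a hypothetical name, the list mirrors the GPU names in the CLI help as of June 2026, and matching is case-sensitive by assumption, so it may be stricter than the CLI itself.

```shell
# Hypothetical check: reject accelerator names the CLI does not list,
# instead of letting a typo fall through to the A100 default.
colab_check_gpu() {
  case "$1" in
    T4|L4|G4|A100|H100) return 0 ;;
    *)
      echo "unknown --gpu value '$1' (expected T4, L4, G4, A100 or H100)" >&2
      return 1
      ;;
  esac
}
```

Usage: colab_check_gpu "$GPU" && colab run --gpu "$GPU" train.py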

Where it breaks

No tool is free of limits, and the Colab CLI has a few sharp ones:

Interactive commands hang for agents. Never let an agent run colab repl, colab console, colab auth, or colab drivemount non-interactively — they wait for terminal input and freeze. These are human-only commands.
No Windows support. WSL2 is the only path on Windows.
Free-tier GPUs are constrained and preemptible. Free T4 access through the CLI is the same throttled experience as the browser — fine for testing, unreliable for anything timed.
It’s still Colab underneath. Sessions can be reclaimed under load, regional availability for A100/H100 varies, and you don’t get root the way you would on a rented box.
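If your agent shells out through a wrapper, you can make the first rule mechanical. A minimal sketch: agent_colab is a hypothetical name, it inspects only the first argument, and a call that puts a global flag such as --auth ahead of the subcommand would slip past it.

```shell
# Hypothetical wrapper for agents: refuse the interactive subcommands
# that block on terminal input; pass everything else to colab unchanged.
# Only the first argument is checked.
agent_colab() {
  case "${1:-}" in
    repl|console|auth|drivemount)
      echo "refusing interactive 'colab $1' in agent mode; run it by hand" >&2
      return 2
      ;;
  esac
  colab "$@"
}
```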

When to use it (and when not to)

Reach for the Colab CLI when you already pay for Colab and want your terminal agent to occasionally grab a GPU for a discrete job — generating embeddings, running a batch inference pass, a short fine-tune, or testing whether a model even loads. The integration with Claude Code and Codex is genuinely the smoothest first-party agent-to-GPU bridge available right now.

Skip it if you need a GPU running for hours at a stretch, want full root access, or care about keeping data off Google’s infrastructure. For sustained workloads, a dedicated RunPod instance is more predictable per hour. For privacy-first or always-on local inference, owning the hardware wins — and if you’re sizing a card for that, our sister site has the deep hardware analysis in the local AI GPU buying guide and a breakdown of which local models fit which VRAM.

For the broader question of running models on your own machine through your editor, see our guides on Cursor + Ollama and LM Studio and Cline + LM Studio — different answers to the same “I want GPU compute under my agent” problem.

FAQ

Is the Colab CLI free? The CLI itself is free and open source (Apache-2.0). You pay only for accelerator time through Colab’s normal compute-unit pricing. Free-tier users get constrained T4 access; paid options start at $9.99 (100 CU on Pay As You Go, or Pro at $9.99/mo).

Which agents work with it? Any agent that can run shell commands — Claude Code, OpenAI Codex, Google Antigravity, and Cursor’s terminal agent mode. The bundled COLAB_SKILL.md gives agents the command patterns and safety rules.

What GPUs and TPUs can it provision? GPUs: T4, L4, G4, A100, and H100. TPUs: v5e1 and v6e1. CPU is the default if you don’t pass an accelerator flag. Actual availability depends on your plan and region.

Why does my session keep dying? Almost always an auth-scope problem. ADC needs the colaboratory OAuth scope for keep-alive — without it, VMs get silently unassigned. Run colab whoami to confirm your scopes before provisioning.

Does it run on Windows? Not natively as of the June 2026 release. Use WSL2.

How do I avoid surprise bills? Prefer colab run (auto-teardown) over persistent sessions, wrap named sessions in a shell trap that calls colab stop, and run colab sessions daily to catch orphaned VMs. An idle A100 burns ~15 CU/hr doing nothing.


Last updated June 10, 2026. Pricing and features change frequently; verify current state on the official Colab pricing page before purchasing.
