Jun 2, 2026

OpenCode + Ollama in 2026: the setup that works (and the file-write bug you need to know about)

By AICoderScope Team · 14 min read

opencodeollamalocal-llmsetup-guideterminalprivacycoding-agentself-hosted

TL;DR: OpenCode v1.15.13 connects to Ollama via an OpenAI-compatible custom provider in five config lines. Code exploration and read-only analysis in Plan mode work reliably with a 14B+ model. Build mode file writes are broken for every local model tested to date — open bug #29940 means the write tool schema causes token truncation before the file path is ever generated. Use Ollama for exploration; keep a cloud API key for the moment you need actual files written.

	OpenCode + Ollama (local)	OpenCode + Claude Sonnet	OpenCode + BYOK Anthropic
Best for	Codebase exploration, read-only analysis	Daily coding tasks, file writes	Cost-controlled full agent work
Monthly cost	$0 (hardware only)	~$5–$15 API depending on usage	Same — pay-per-token
File writes	Broken (open bug #29940)	Works reliably	Works reliably
Privacy	Fully local, nothing leaves machine	Prompts to Anthropic	Prompts to Anthropic

Honest take: Install Ollama for Plan mode exploration — it’s fast, free, and genuinely useful for understanding unfamiliar codebases. Don’t expect to replace a cloud agent for actual coding tasks until the schema bug ships a fix.

What you’re actually building

OpenCode is a terminal-first AI coding agent — a Go binary, a polished TUI, and an agent loop that can read files, write files, run shell commands, and answer questions about your code. It’s open-source under MIT, version v1.15.13 as of May 30, 2026.

Ollama runs large language models locally. Version 0.30.0 ships an OpenAI-compatible REST API at http://localhost:11434/v1, which means any tool that knows how to speak to OpenAI can speak to Ollama instead.

OpenCode uses the Vercel AI SDK under the hood. It supports a @ai-sdk/openai-compatible provider type for exactly this use case — pointing at any OpenAI-compatible endpoint. Ollama is not a first-class bundled provider in OpenCode’s source, but it works through this compatibility layer.

The combination: a free, fully local AI coding agent. That’s the pitch. The reality has one sharp edge, documented below.

Hardware floor for local inference

Before configuring anything, map your hardware to realistic model choices.

GPU VRAM	Practical Ollama model	OpenCode use case
6–8 GB (RTX 4060)	qwen2.5-coder:7b Q4	Plan mode only — file writes fail
10–12 GB (RTX 3060 12GB)	qwen2.5-coder:14b Q4	Plan mode reliable, Build mode unreliable
16 GB (RTX 4060 Ti 16GB)	qwen2.5-coder:14b Q5	Same as above, better quality
24 GB (RTX 3090 / RTX 4090)	qwen2.5-coder:32b Q4	Best local tier; Build mode may work for small files
32+ GB RAM CPU-only	qwen2.5-coder:14b Q4 (slow)	Plan mode at 2–4 tok/s, acceptable for exploration

CPU-only inference with Ollama is viable for Plan mode since you’re not blocked on low latency — a 2-second response time for a code question is acceptable. It is not viable for Build mode even if the file write bug were fixed; the slowness makes an agent loop unusable.

For deeper guidance on hardware tiers for local LLM inference, see runaihome.com’s local AI hardware guide.

Step 1: Install Ollama 0.30.0 and pull a coding model

# macOS / Linux one-liner
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version
# ollama version is 0.30.0

On Windows, download the installer from ollama.com. The 0.30.0 release adds improved compatibility with NVIDIA hardware and faster model loading via an updated llama.cpp backend.

Pull the model you’ll use. Pick based on your VRAM:

# 8 GB VRAM — fast, hits the file-write bug reliably
ollama pull qwen2.5-coder:7b

# 12–16 GB VRAM — best balance for Plan mode
ollama pull qwen2.5-coder:14b

# 24 GB VRAM — highest local quality
ollama pull qwen2.5-coder:32b

Confirm Ollama’s OpenAI-compatible endpoint is running:

curl http://localhost:11434/v1/models
# Expected: {"object":"list","data":[{"id":"qwen2.5-coder:14b",...}]}

If you get Connection refused, run ollama serve in a separate terminal first. On macOS the menu bar app starts it automatically at login; on Linux you may need systemctl enable ollama.

Step 2: Install OpenCode v1.15.13

# Quick install
curl -fsSL https://opencode.ai/install | bash

# Or via npm
npm install -g opencode-ai@latest

# macOS / Linux via Homebrew
brew install anomalyco/tap/opencode

# Verify
opencode --version
# opencode 1.15.13

OpenCode installs to $HOME/.opencode/bin by default. Add that to your PATH if the shell doesn’t pick it up automatically:

echo 'export PATH="$HOME/.opencode/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Step 3: Configure OpenCode to use Ollama

The config file lives at the project root as opencode.json (preferred for per-project settings) or in a global location that follows your OS conventions. Create opencode.json in your project directory:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen2.5-coder:14b",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "qwen2.5-coder:14b"
        }
      }
    }
  }
}

The model field uses the format "provider/model" — the provider key (ollama) must match the key in the provider object. The model name inside the models map must exactly match the tag you pulled with ollama pull.

To add multiple models (useful for switching between a fast 7B for quick lookups and a 14B for heavier analysis):

{
  "$schema": "https://opencode.ai/config.json",
  "model": "ollama/qwen2.5-coder:14b",
  "small_model": "ollama/qwen2.5-coder:7b",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "qwen2.5-coder:14b"
        },
        "qwen2.5-coder:7b": {
          "name": "qwen2.5-coder:7b"
        }
      }
    }
  }
}

small_model is used by OpenCode for lightweight auxiliary tasks like session title generation — routing those to the 7B keeps token burn low without sacrificing quality on the main task.

Step 4: Launch and verify

cd /path/to/your/project
opencode

OpenCode opens its TUI. If the config is valid, the model selector in the bottom status bar shows qwen2.5-coder:14b (ollama). If it shows a different model or prompts for an API key, the opencode.json isn’t being picked up — check that the file is in the directory where you launched OpenCode.

Press Tab to switch between build (full agent access) and plan (read-only) modes. For local models, start in plan mode.

Ask it something about your codebase:

> explain the authentication flow in this codebase

Expected: the model reads your files, traces the auth code, and returns a clear explanation. With qwen2.5-coder:14b on a 12 GB VRAM GPU, response time is typically 3–8 seconds for a file-reading task. Acceptable for exploration work.

What works well: Plan mode code exploration

Plan mode is where OpenCode + Ollama genuinely earns its place. The agent reads files, follows call chains, answers architecture questions, and drafts step-by-step plans for you to review before any code changes. None of this writes to disk.

Real use cases that work reliably:

Onboarding to an unfamiliar repo. “Walk me through how this Express app handles JWT refresh tokens.” The agent traces the imports, reads the middleware, and explains the flow in context.
Pre-implementation planning. “I need to add rate limiting to the API. What’s the minimal set of changes, and what could break?” The agent reads the routing code, identifies touch points, and produces a numbered plan.
Reviewing your own code before a PR. “Does this function have any edge cases I haven’t handled?” Much faster than reading it yourself for the fifteenth time.
Dependency analysis. “Which files import the UserService and what do they use from it?” OpenCode finds every import and summarizes usage patterns.

For all of these, the model never needs to write a file. The plan mode constraint is the feature, not a limitation.

What breaks: Build mode file writes

Switch to Build mode and ask the agent to create or modify a file. This is where the setup stops being reliable.

Two documented bugs affect every combination of local model + Ollama tested to date:

Bug #29757 — JSON output instead of file creation (affects 7B and smaller models)

When using qwen2.5-coder:7b with OpenCode in Build mode, requesting a file write returns the raw JSON of the write tool call rather than executing it. You see something like:

{
  "tool": "write",
  "input": {
    "content": "# my_script.py\n...",
    "filePath": "my_script.py"
  }
}

…in the chat window. The file is never created. The root cause: 7B coder models lack reliable structured tool-call execution. They can describe what they would do but don’t consistently follow through on the tool invocation protocol.

Bug #29940 — Schema truncation on large files (affects all local models)

OpenCode’s write tool schema currently places the content field before filePath in the JSON structure. When a local model generates a large file, it exhausts its output token budget on content before ever reaching filePath. OpenCode parses the response, finds a missing required field, and throws:

SchemaError: Missing key at ["filePath"]

The fix is known — reorder the fields so filePath comes first — but as of v1.15.13, issue #29940 remains open and unfixed. This affects every local model regardless of size. A confirmed reproduction: gemma4:26b (a 26 billion parameter model) fails on Build mode file writes via Ollama (issue #29996). This is not a model quality problem; it’s a schema design bug in OpenCode itself.

Practical consequence: Build mode with local Ollama models is not reliable for production use. Plan mode works. Build mode does not.

The hybrid setup that works today

The practical solution: use Ollama for Plan mode exploration (free, private), add a cloud API key for Build mode when you need files written.

{
  "$schema": "https://opencode.ai/config.json",
  "model": "anthropic/claude-sonnet-4-5",
  "small_model": "ollama/qwen2.5-coder:7b",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:7b": {
          "name": "qwen2.5-coder:7b"
        }
      }
    }
  }
}

Set your Anthropic key:

export ANTHROPIC_API_KEY="sk-ant-..."

Workflow: open OpenCode, start in Plan mode with the local model for exploration and planning. When the plan is solid and you’re ready to execute, switch the active model to anthropic/claude-sonnet-4-5 using OpenCode’s model selector (; key), flip to Build mode, and run the actual implementation. The local model handled the free exploration; the cloud model handles the writes.

This isn’t “fully local” — but it is “exploration is free and private, writes cost tokens only when you actually need them.” For a codebase you’re regularly working in, the Plan mode exploration is a meaningful portion of the total token spend.

For completely local file writes, track issue #29940. When it closes, re-test with a 32B model on adequate hardware. Based on the schema fix being straightforward (field reordering), a patch is likely in a future minor release.

Model selection: minimizing failures

If you need Build mode with local models right now and are willing to accept unreliable results, the guidance from reported issues:

Model	Build mode file writes	Notes
qwen2.5-coder:7b	Broken — JSON output bug	Issue #29757
deepseek-r1:8b	Broken — write failures	Issue #29996 config
gemma4:26b	Broken — displays in chat	Issue #29996 confirmed
qwen2.5-coder:14b	Partially broken — small files only	Schema truncation on large files
qwen2.5-coder:32b	Untested in issues	May work for small files; schema bug still applies

The open question is whether 32B models produce clean enough tool calls to work around the schema truncation bug. The bug is in OpenCode’s schema, not the model — any model can trigger it when the output file is large enough. For short scripts (<50 lines), the truncation is less likely to occur because content is short enough not to exhaust the token budget.

For the cross-site context on what hardware fits which model: Local LLM for Code: Which Model on RTX 5060 Ti Wins in 2026? has fresh numbers on qwen2.5-coder inference throughput.

Troubleshooting

“Model not found” on startup

The model name in your config must exactly match the Ollama tag. Run ollama list to see what’s pulled:

ollama list
# NAME                         ID            SIZE    MODIFIED
# qwen2.5-coder:14b            ...           9.0 GB  ...

If the list shows qwen2.5-coder:14b but your config says qwen2.5-coder-14b (hyphen instead of colon), it won’t find it.

“Connection refused” at http://localhost:11434/v1

Ollama isn’t running. On Linux, start it manually:

ollama serve &

On macOS with the menu bar app, click the icon → Start. On Windows, check that the Ollama tray icon is present.

Build mode shows JSON instead of creating files

This is bug #29757 — the model is returning tool call JSON rather than executing it. If you’re on a 7B model, this is expected behavior. Switch to Plan mode or use a cloud model.

SchemaError: Missing key at [“filePath”]

This is bug #29940. The file you’re generating is large enough to hit the token budget before the model outputs filePath. Workarounds: ask for smaller files (one function at a time), or switch to a cloud model for Build mode.

Where OpenCode + Ollama sits relative to the alternatives

OpenCode isn’t the only option for terminal-based local AI coding. Two direct comparisons:

Aider + Ollama is more mature for local model use. Aider’s tool protocol is simpler (git-diff-based edits rather than JSON tool calls), which means it handles local models more robustly at smaller sizes. If your primary need is autonomous coding with local models — actually writing code, not just exploring it — Aider + Ollama is the better-tested path today.

Cline + Ollama (VS Code) works for coding tasks in VS Code with local models if you size the model correctly (14B minimum for reliable agent work). The Cline local LLM guide covers the exact setup and .clinerules configuration.

OpenCode’s advantage over both: the Plan mode / Build mode separation is a cleaner interface for codebase exploration, and the TUI is more ergonomic than Aider’s output if you’re doing a lot of architectural Q&A alongside the coding work.

FAQ

Does OpenCode have native Ollama support or is this a workaround?

It’s configured as a custom OpenAI-compatible provider — Ollama is not in OpenCode’s bundled provider list. The configuration is supported and used by real users, but “native” would mean first-class integration with auto-detection of running Ollama instances and model listing. That doesn’t exist yet.

Will the file write bug be fixed?

Issue #29940 is open and the fix is well-defined (reorder schema fields). OpenCode ships releases frequently (814 releases as of v1.15.13). Watch the issue for a fix; subscribe to the repo’s releases if you want a notification.

Can I use Ollama running on a different machine?

Yes. Replace http://localhost:11434/v1 with http://<your-server-ip>:11434/v1 in the baseURL field. Make sure Ollama is bound to 0.0.0.0 on the server:

OLLAMA_HOST=0.0.0.0 ollama serve

What context window does Ollama set for qwen2.5-coder:14b?

Ollama defaults to 2048 tokens context for most models regardless of the model’s capability. To increase it, set the num_ctx option when running the model:

ollama run qwen2.5-coder:14b --num_ctx 32768

Or create a custom Modelfile:

FROM qwen2.5-coder:14b
PARAMETER num_ctx 32768

For codebase exploration tasks, 32K context makes a meaningful difference — you can include more files in a single prompt.

Is this private? Does Ollama send anything off my machine?

With local Ollama (no cloud Ollama subscription), inference runs entirely on your machine. OpenCode itself collects no telemetry by default. Your prompts, file contents, and model outputs stay local.

Sources

OpenCode GitHub repository — sst/opencode — version v1.15.13, May 30, 2026
Ollama GitHub repository — ollama/ollama — version 0.30.0, May 13, 2026
OpenCode issue #29996 — Unable to generate files when using local Ollama — confirmed gemma4:26b + deepseek-r1:8b failures
OpenCode issue #29940 — Write tool schema field ordering causes truncation failures with local models — schema bug root cause
OpenCode issue #29757 — OpenCode + Qwen gives JSON output instead of writing to disk — qwen2.5-coder:7b JSON output bug
Ollama REST API documentation — OpenAI-compatible endpoint at /v1/chat/completions
OpenCode review 2026 — AICoderScope — full product overview

Last updated June 2, 2026. OpenCode releases frequently; check the GitHub releases for patch notes after v1.15.13.

Recommended Gear

RTX 4060 (8 GB VRAM) — entry-level local inference for Plan mode
RTX 3060 12GB — 12 GB hits qwen2.5-coder:14b Q4 comfortably
RTX 4060 Ti 16GB — 16 GB lets you push Q5 quantizations on 14B models
RTX 3090 (24 GB) — runs qwen2.5-coder:32b Q4 fully in VRAM
RTX 4090 (24 GB) — fastest single-GPU option for local coding inference

Was this article helpful?