Jun 8, 2026

OpenHands 1.7.0 + Ollama in 2026: complete local setup, the Docker networking trap, and which models actually complete agentic tasks

By AICoderScope Team · 12 min read

openhandsollamalocal-llmsetup-guidedockerdevstralqwenself-hostedprivacy

TL;DR: OpenHands 1.7.0 + Ollama gives you a fully offline autonomous coding agent — no API key, no cloud, no data leaving your machine. The setup takes about 30 minutes if you know the two traps: Docker’s host.docker.internal doesn’t resolve on Linux by default, and the sandbox runtime image tag must exactly match the OpenHands image version. Miss either and the container fails silently or connects to nothing.

What you’ll be able to do after this guide:

Run OpenHands autonomously against your local codebase with zero API cost
Give it a GitHub issue URL and watch it write the fix, run tests, and commit the result — all on your machine
Know which local models actually complete multi-file tasks vs. which ones stall out

	OpenHands + Ollama (local)	OpenHands Cloud Free	OpenHands Cloud Pro
Best for	Privacy-first teams, air-gapped infra, zero ongoing cost	Evaluation, 10 tasks/day	Daily driver with managed sandbox
Price / Cost	$0 after hardware	$0, BYOK at-cost	$20/mo + LLM at-cost
Model quality	Devstral Small 2: ~46.8% SWE-bench on OpenHands	Your key, your rate	Same
The catch	Requires GPU with ≥15 GB VRAM; Linux needs —add-host flag	10 conv/day hard limit	LLM bill on top

Honest take: If you have an RTX 4090 or equivalent sitting idle, the local path makes more sense than paying Claude API rates every time OpenHands spins up a multi-step task. On smaller hardware, Cloud Free is the smarter evaluation path before committing to the Docker setup.

Why local OpenHands

The OpenHands review covers the full architecture. The short version relevant to this guide: OpenHands is an autonomous coding agent backed by whichever LLM you configure. Swap the LLM and the agent behavior changes substantially.

Running locally means:

Zero data exposure — no code, no prompts, no environment variables reach an external server
No per-task API bill — model inference is just GPU utilization on your own machine
Freedom to use models that aren’t available via commercial APIs (Apache 2.0-licensed Devstral Small 2 for example)

The trade-off is raw task completion capability. A frontier cloud model scores roughly 72% on SWE-bench when used as the OpenHands backend. The best local option as of June 2026, Devstral Small 2 (24B), scores approximately 46.8% on the same benchmark when driven by OpenHands. That 25-point gap is real — local inference misses roughly 70 more bug fixes per 277 test cases. For most real-world tasks (single-file fixes, feature additions, test generation), the local path still delivers. For the hardest 10% of issues — cross-cutting refactors, subtle invariant bugs, multi-service changes — the gap shows.

Hardware floor

Model size determines what you can run. OpenHands needs a model with solid tool-calling support — it issues file read/write calls, terminal commands, and browser interactions as structured tool invocations. Models without reliable JSON tool calling stall immediately.

Model	Ollama tag	VRAM (Q4_K_M)	Tool calling	Task quality
Devstral Small 2	`devstral-small-2`	~15 GB	Native, MoE-trained	Best local option; ~46.8% SWE-bench via OpenHands
qwen2.5-coder:32b	`qwen2.5-coder:32b`	~20 GB	Native	Strong; slightly behind Devstral on agentic multi-step
qwen2.5-coder:14b	`qwen2.5-coder:14b`	~9 GB	Native	Budget floor; completes single-file tasks reliably
Older llama3 / codellama	various	varies	None or unreliable	Not recommended — stalls on first tool call

Devstral Small 2 runs at Q4_K_M on 15 GB of VRAM, fitting inside any 24 GB card (RTX 4090, RTX 3090). For context: its 256k token window means you can feed an entire Python package without truncating. If your card tops out at 12–16 GB, use qwen2.5-coder:14b as your starting point — it completes simple one-file tasks and is a realistic daily evaluation model before upgrading hardware.

For a full breakdown of which models fit which GPU tier, the runaihome.com guide at Best Local AI Models by VRAM is worth checking before buying hardware for this setup.

Step 1: Install Ollama and pull a coding model

Current version: Ollama 0.30.6.

Linux:

curl -fsSL https://ollama.com/install.sh | sh

macOS / Windows: Download from ollama.com/download.

Before pulling a model, set the context window. OpenHands agentic loops consume 20k–50k tokens per task. Ollama’s default 2k context silently truncates input — same trap as in Aider + Ollama setups and Continue.dev. Fix it before you start:

# Add to ~/.bashrc or ~/.zshrc
export OLLAMA_NUM_CTX=65536

Restart the Ollama service after setting the variable:

# Linux
sudo systemctl restart ollama

# macOS
# Restart via the menu bar icon, or kill and reopen Ollama.app

Now pull the model. For Devstral Small 2:

ollama pull devstral-small-2

This downloads the Q4_K_M quantized version at ~15 GB. Wait for the download to complete, then verify:

ollama list
# Expected output includes:
# devstral-small-2   ...   15.5 GB   ...

For the budget tier (qwen2.5-coder:14b):

ollama pull qwen2.5-coder:14b

Quick sanity check that tool calling is responding:

ollama run devstral-small-2 "What is 2 + 2? Use a tool call to return the answer."
# A model with proper tool-call support produces a structured response.
# A model without it just says "4" in plain text — not what OpenHands needs.

Step 2: Run OpenHands via Docker

OpenHands requires Docker because every task runs inside an isolated sandbox container. The isolation prevents runaway agent processes from affecting your host filesystem.

Install Docker if you haven’t: docker.com/get-docker. On Linux, confirm your user is in the docker group:

sudo usermod -aG docker $USER
newgrp docker

The version-matching requirement

OpenHands’ main container and the sandbox runtime container must run the same version. Mismatched tags produce a cryptic container start error that has no obvious diagnostic message. Always pin both explicitly.

For OpenHands 1.7.0:

export OH_VERSION=1.7.0
export RUNTIME_TAG=1.7.0-nikolaik

The Linux networking trap

On macOS and Windows (Docker Desktop), host.docker.internal resolves to your host machine automatically. On Linux, it does not. Ollama runs on your host at port 11434; the OpenHands container needs to reach it via host.docker.internal. Without the --add-host flag, OpenHands starts, you configure Ollama as the LLM provider, and every single task attempt returns a connection error without a clear reason why.

The fix is one flag in the Docker run command: --add-host host.docker.internal:host-gateway.

The full Docker run command

docker run -it --rm \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:${RUNTIME_TAG} \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands:/.openhands \
  -v ~/workspace:/opt/workspace_base \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  docker.all-hands.dev/all-hands-ai/openhands:${OH_VERSION}

Volume mounts explained:

/var/run/docker.sock — lets OpenHands spawn sandbox containers (required)
~/.openhands:/.openhands — persists your LLM config across container restarts
~/workspace:/opt/workspace_base — the local directory OpenHands works in; set this to your actual project path

macOS note: On macOS with Docker Desktop, you can omit --add-host and replace host.docker.internal in the Ollama URL with the same token — Docker Desktop handles the resolution automatically.

OpenHands should now be accessible at http://localhost:3000.

Step 3: Configure Ollama as the LLM provider

OpenHands stores LLM configuration in ~/.openhands/config.toml (which persists via the mounted volume). You can set this through the web UI or edit the file directly.

Via the web UI (easier):

Open http://localhost:3000
Click the settings icon (gear) in the top-right corner
Navigate to LLM Provider
Set Provider to Ollama
Set Model to devstral-small-2 (or qwen2.5-coder:32b)
Set Base URL to http://host.docker.internal:11434
Set API Key to ollama (any non-empty string — Ollama doesn’t validate it, but the field can’t be blank)
Save and reload the page

Via config.toml (repeatable for scripted setups):

Edit ~/.openhands/config.toml:

[llm]
model = "ollama/devstral-small-2"
api_base = "http://host.docker.internal:11434"
api_key = "ollama"
temperature = 0.0

The ollama/ prefix is the litellm provider format OpenHands uses internally. Without the prefix, OpenHands won’t route requests to the Ollama endpoint. Temperature 0.0 is recommended for agentic coding tasks — it reduces hallucinated tool calls.

Restart the OpenHands container after editing config.toml to pick up changes.

Running your first task

Create a workspace directory and drop a project in it:

mkdir -p ~/workspace/my-project
cp -r /path/to/your/project ~/workspace/my-project/

Back in the web UI, open a new conversation. Try a well-scoped, single-file task first:

“In my-project/src/parser.py, add error handling for ValueError in the parse_config function. Write a corresponding unit test in tests/test_parser.py.”

Watch the action stream. OpenHands should:

Issue a read_file tool call to read parser.py
Issue a write_to_file call with the modified version
Read or create test_parser.py
Run the tests via a run_terminal_cmd call
Iterate if the tests fail

A task that completes cleanly with Devstral Small 2 should take 2–5 minutes. If the conversation reaches 5 messages with no file writes, the model is likely stalling on tool calls — the most common symptom of a model that doesn’t handle structured JSON output reliably.

What breaks with local models

Planning Mode reduces wasted context

Enable OpenHands’ Planning Mode in Settings. When it’s on, the agent writes a task plan and pauses for your confirmation before touching any files. For complex tasks with local models — which have less headroom for self-correction than frontier models — confirming the plan before execution meaningfully reduces the rate of tasks that go in circles. Planning Mode has been in beta since v1.6.0 and is stable enough for daily use.

Single-file tasks work reliably; cross-repo refactors fail

Devstral Small 2 at 46.8% SWE-bench means roughly half of GitHub-issue-style bugs are solved end-to-end without your intervention. The failures cluster around:

Tasks requiring changes across more than 4–5 files
Tasks that need the model to reason about implicit invariants (“this function should only be called after initialization”)
Browser tool tasks where JavaScript-heavy pages block Playwright scraping

For these, either break the task into smaller steps manually, or switch to a cloud model for the session. The BYOK path (Settings → LLM Provider, swap to your Anthropic or OpenAI key) takes under a minute.

The Docker socket permission error

If OpenHands shows a container start error immediately after launch, the most common cause is that the Docker socket isn’t accessible to the process inside the container. Verify:

ls -la /var/run/docker.sock
# Should show: srw-rw---- ... root docker ...
groups $USER
# Should include: docker

If your user isn’t in the docker group, re-run sudo usermod -aG docker $USER and open a new shell session (group membership doesn’t refresh in existing sessions).

Local vs. cloud performance summary

Scenario	Devstral Small 2 (local)	Claude Sonnet 4.5 (cloud)
Single-function bug fix	Usually completes	Usually completes
Add test coverage to a module	Completes ~80% of the time	Completes >90%
Multi-file refactor (5+ files)	Completes ~50%	Completes ~70%
GitHub issue end-to-end	~46.8% SWE-bench	~72.8% SWE-bench
API cost per task	$0 (GPU utilization only)	~$0.50–$3.00 depending on task
Latency per step	15–30s at Q4_K_M, RTX 4090	3–8s

The latency gap is noticeable on longer tasks. A 15-step task that takes 3 minutes with Claude Sonnet can take 7–10 minutes with Devstral on local hardware. If you’re watching the action stream, that’s fine. If you’re running batch tasks unattended, budget accordingly.

FAQ

Can I use qwen2.5-coder instead of Devstral? Yes. Set model = "ollama/qwen2.5-coder:32b" in config.toml. It performs similarly to Devstral Small 2 on most coding tasks, with slightly weaker performance on longer multi-file contexts. If you’re on 24 GB VRAM, either works — try both and keep the one that completes your typical tasks more reliably.

Does this work without a GPU (CPU inference)? Technically yes, but practically unusable for multi-step agentic tasks. A 14B model on CPU produces ~1–2 tokens per second. A single OpenHands task requiring 200 tool-call exchanges would take 20–40 minutes. CPU inference is useful for one-off completions; it’s not a realistic path for agentic loops.

What’s the minimum VRAM for this to be useful? 12 GB VRAM with qwen2.5-coder:14b is the realistic floor for single-file agentic tasks. Below that, you’re in CPU inference territory or running a 7B model that stalls on tool calls.

Why does OpenHands try to push to main directly? Known issue documented in the review. Add branch protection rules to your repository before connecting OpenHands to any GitHub integration. For local tasks without Git integration, this doesn’t apply.

The web UI loads but LLM calls time out. Almost always the Linux networking trap: Ollama isn’t reachable from inside the Docker container. Confirm you ran --add-host host.docker.internal:host-gateway and that Ollama is actually running on the host (curl http://localhost:11434 should return a version response). Then verify the API Base in OpenHands Settings shows http://host.docker.internal:11434, not http://localhost:11434 — localhost inside the container refers to the container, not the host.

Sources

Last updated Jun 08 2026. OpenHands, Ollama, and Devstral version numbers change frequently; verify current releases before following this guide.

Was this article helpful?