OpenHands 1.7.0 + Ollama in 2026: complete local setup, the Docker networking trap, and which models actually complete agentic tasks
TL;DR: OpenHands 1.7.0 + Ollama gives you a fully offline autonomous coding agent — no API key, no cloud, no data leaving your machine. The setup takes about 30 minutes if you know the two traps: Docker’s host.docker.internal doesn’t resolve on Linux by default, and the sandbox runtime image tag must exactly match the OpenHands image version. Miss either and the container fails silently or connects to nothing.
What you’ll be able to do after this guide:
- Run OpenHands autonomously against your local codebase with zero API cost
- Give it a GitHub issue URL and watch it write the fix, run tests, and commit the result — all on your machine
- Know which local models actually complete multi-file tasks vs. which ones stall out
| OpenHands + Ollama (local) | OpenHands Cloud Free | OpenHands Cloud Pro | |
|---|---|---|---|
| Best for | Privacy-first teams, air-gapped infra, zero ongoing cost | Evaluation, 10 tasks/day | Daily driver with managed sandbox |
| Price / Cost | $0 after hardware | $0, BYOK at-cost | $20/mo + LLM at-cost |
| Model quality | Devstral Small 2: ~46.8% SWE-bench on OpenHands | Your key, your rate | Same |
| The catch | Requires GPU with ≥15 GB VRAM; Linux needs —add-host flag | 10 conv/day hard limit | LLM bill on top |
Honest take: If you have an RTX 4090 or equivalent sitting idle, the local path makes more sense than paying Claude API rates every time OpenHands spins up a multi-step task. On smaller hardware, Cloud Free is the smarter evaluation path before committing to the Docker setup.
Why local OpenHands
The OpenHands review covers the full architecture. The short version relevant to this guide: OpenHands is an autonomous coding agent backed by whichever LLM you configure. Swap the LLM and the agent behavior changes substantially.
Running locally means:
- Zero data exposure — no code, no prompts, no environment variables reach an external server
- No per-task API bill — model inference is just GPU utilization on your own machine
- Freedom to use models that aren’t available via commercial APIs (Apache 2.0-licensed Devstral Small 2 for example)
The trade-off is raw task completion capability. A frontier cloud model scores roughly 72% on SWE-bench when used as the OpenHands backend. The best local option as of June 2026, Devstral Small 2 (24B), scores approximately 46.8% on the same benchmark when driven by OpenHands. That 25-point gap is real — local inference misses roughly 70 more bug fixes per 277 test cases. For most real-world tasks (single-file fixes, feature additions, test generation), the local path still delivers. For the hardest 10% of issues — cross-cutting refactors, subtle invariant bugs, multi-service changes — the gap shows.
Hardware floor
Model size determines what you can run. OpenHands needs a model with solid tool-calling support — it issues file read/write calls, terminal commands, and browser interactions as structured tool invocations. Models without reliable JSON tool calling stall immediately.
| Model | Ollama tag | VRAM (Q4_K_M) | Tool calling | Task quality |
|---|---|---|---|---|
| Devstral Small 2 | devstral-small-2 | ~15 GB | Native, MoE-trained | Best local option; ~46.8% SWE-bench via OpenHands |
| qwen2.5-coder:32b | qwen2.5-coder:32b | ~20 GB | Native | Strong; slightly behind Devstral on agentic multi-step |
| qwen2.5-coder:14b | qwen2.5-coder:14b | ~9 GB | Native | Budget floor; completes single-file tasks reliably |
| Older llama3 / codellama | various | varies | None or unreliable | Not recommended — stalls on first tool call |
Devstral Small 2 runs at Q4_K_M on 15 GB of VRAM, fitting inside any 24 GB card (RTX 4090, RTX 3090). For context: its 256k token window means you can feed an entire Python package without truncating. If your card tops out at 12–16 GB, use qwen2.5-coder:14b as your starting point — it completes simple one-file tasks and is a realistic daily evaluation model before upgrading hardware.
For a full breakdown of which models fit which GPU tier, the runaihome.com guide at Best Local AI Models by VRAM is worth checking before buying hardware for this setup.
Step 1: Install Ollama and pull a coding model
Current version: Ollama 0.30.6.
Linux:
curl -fsSL https://ollama.com/install.sh | sh
macOS / Windows: Download from ollama.com/download.
Before pulling a model, set the context window. OpenHands agentic loops consume 20k–50k tokens per task. Ollama’s default 2k context silently truncates input — same trap as in Aider + Ollama setups and Continue.dev. Fix it before you start:
# Add to ~/.bashrc or ~/.zshrc
export OLLAMA_NUM_CTX=65536
Restart the Ollama service after setting the variable:
# Linux
sudo systemctl restart ollama
# macOS
# Restart via the menu bar icon, or kill and reopen Ollama.app
Now pull the model. For Devstral Small 2:
ollama pull devstral-small-2
This downloads the Q4_K_M quantized version at ~15 GB. Wait for the download to complete, then verify:
ollama list
# Expected output includes:
# devstral-small-2 ... 15.5 GB ...
For the budget tier (qwen2.5-coder:14b):
ollama pull qwen2.5-coder:14b
Quick sanity check that tool calling is responding:
ollama run devstral-small-2 "What is 2 + 2? Use a tool call to return the answer."
# A model with proper tool-call support produces a structured response.
# A model without it just says "4" in plain text — not what OpenHands needs.
Step 2: Run OpenHands via Docker
OpenHands requires Docker because every task runs inside an isolated sandbox container. The isolation prevents runaway agent processes from affecting your host filesystem.
Install Docker if you haven’t: docker.com/get-docker. On Linux, confirm your user is in the docker group:
sudo usermod -aG docker $USER
newgrp docker
The version-matching requirement
OpenHands’ main container and the sandbox runtime container must run the same version. Mismatched tags produce a cryptic container start error that has no obvious diagnostic message. Always pin both explicitly.
For OpenHands 1.7.0:
export OH_VERSION=1.7.0
export RUNTIME_TAG=1.7.0-nikolaik
The Linux networking trap
On macOS and Windows (Docker Desktop), host.docker.internal resolves to your host machine automatically. On Linux, it does not. Ollama runs on your host at port 11434; the OpenHands container needs to reach it via host.docker.internal. Without the --add-host flag, OpenHands starts, you configure Ollama as the LLM provider, and every single task attempt returns a connection error without a clear reason why.
The fix is one flag in the Docker run command: --add-host host.docker.internal:host-gateway.
The full Docker run command
docker run -it --rm \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:${RUNTIME_TAG} \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-v ~/workspace:/opt/workspace_base \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
docker.all-hands.dev/all-hands-ai/openhands:${OH_VERSION}
Volume mounts explained:
/var/run/docker.sock— lets OpenHands spawn sandbox containers (required)~/.openhands:/.openhands— persists your LLM config across container restarts~/workspace:/opt/workspace_base— the local directory OpenHands works in; set this to your actual project path
macOS note: On macOS with Docker Desktop, you can omit --add-host and replace host.docker.internal in the Ollama URL with the same token — Docker Desktop handles the resolution automatically.
OpenHands should now be accessible at http://localhost:3000.
Step 3: Configure Ollama as the LLM provider
OpenHands stores LLM configuration in ~/.openhands/config.toml (which persists via the mounted volume). You can set this through the web UI or edit the file directly.
Via the web UI (easier):
- Open
http://localhost:3000 - Click the settings icon (gear) in the top-right corner
- Navigate to LLM Provider
- Set Provider to
Ollama - Set Model to
devstral-small-2(orqwen2.5-coder:32b) - Set Base URL to
http://host.docker.internal:11434 - Set API Key to
ollama(any non-empty string — Ollama doesn’t validate it, but the field can’t be blank) - Save and reload the page
Via config.toml (repeatable for scripted setups):
Edit ~/.openhands/config.toml:
[llm]
model = "ollama/devstral-small-2"
api_base = "http://host.docker.internal:11434"
api_key = "ollama"
temperature = 0.0
The ollama/ prefix is the litellm provider format OpenHands uses internally. Without the prefix, OpenHands won’t route requests to the Ollama endpoint. Temperature 0.0 is recommended for agentic coding tasks — it reduces hallucinated tool calls.
Restart the OpenHands container after editing config.toml to pick up changes.
Running your first task
Create a workspace directory and drop a project in it:
mkdir -p ~/workspace/my-project
cp -r /path/to/your/project ~/workspace/my-project/
Back in the web UI, open a new conversation. Try a well-scoped, single-file task first:
“In
my-project/src/parser.py, add error handling forValueErrorin theparse_configfunction. Write a corresponding unit test intests/test_parser.py.”
Watch the action stream. OpenHands should:
- Issue a
read_filetool call to readparser.py - Issue a
write_to_filecall with the modified version - Read or create
test_parser.py - Run the tests via a
run_terminal_cmdcall - Iterate if the tests fail
A task that completes cleanly with Devstral Small 2 should take 2–5 minutes. If the conversation reaches 5 messages with no file writes, the model is likely stalling on tool calls — the most common symptom of a model that doesn’t handle structured JSON output reliably.
What breaks with local models
Planning Mode reduces wasted context
Enable OpenHands’ Planning Mode in Settings. When it’s on, the agent writes a task plan and pauses for your confirmation before touching any files. For complex tasks with local models — which have less headroom for self-correction than frontier models — confirming the plan before execution meaningfully reduces the rate of tasks that go in circles. Planning Mode has been in beta since v1.6.0 and is stable enough for daily use.
Single-file tasks work reliably; cross-repo refactors fail
Devstral Small 2 at 46.8% SWE-bench means roughly half of GitHub-issue-style bugs are solved end-to-end without your intervention. The failures cluster around:
- Tasks requiring changes across more than 4–5 files
- Tasks that need the model to reason about implicit invariants (“this function should only be called after initialization”)
- Browser tool tasks where JavaScript-heavy pages block Playwright scraping
For these, either break the task into smaller steps manually, or switch to a cloud model for the session. The BYOK path (Settings → LLM Provider, swap to your Anthropic or OpenAI key) takes under a minute.
The Docker socket permission error
If OpenHands shows a container start error immediately after launch, the most common cause is that the Docker socket isn’t accessible to the process inside the container. Verify:
ls -la /var/run/docker.sock
# Should show: srw-rw---- ... root docker ...
groups $USER
# Should include: docker
If your user isn’t in the docker group, re-run sudo usermod -aG docker $USER and open a new shell session (group membership doesn’t refresh in existing sessions).
Local vs. cloud performance summary
| Scenario | Devstral Small 2 (local) | Claude Sonnet 4.5 (cloud) |
|---|---|---|
| Single-function bug fix | Usually completes | Usually completes |
| Add test coverage to a module | Completes ~80% of the time | Completes >90% |
| Multi-file refactor (5+ files) | Completes ~50% | Completes ~70% |
| GitHub issue end-to-end | ~46.8% SWE-bench | ~72.8% SWE-bench |
| API cost per task | $0 (GPU utilization only) | ~$0.50–$3.00 depending on task |
| Latency per step | 15–30s at Q4_K_M, RTX 4090 | 3–8s |
The latency gap is noticeable on longer tasks. A 15-step task that takes 3 minutes with Claude Sonnet can take 7–10 minutes with Devstral on local hardware. If you’re watching the action stream, that’s fine. If you’re running batch tasks unattended, budget accordingly.
FAQ
Can I use qwen2.5-coder instead of Devstral?
Yes. Set model = "ollama/qwen2.5-coder:32b" in config.toml. It performs similarly to Devstral Small 2 on most coding tasks, with slightly weaker performance on longer multi-file contexts. If you’re on 24 GB VRAM, either works — try both and keep the one that completes your typical tasks more reliably.
Does this work without a GPU (CPU inference)? Technically yes, but practically unusable for multi-step agentic tasks. A 14B model on CPU produces ~1–2 tokens per second. A single OpenHands task requiring 200 tool-call exchanges would take 20–40 minutes. CPU inference is useful for one-off completions; it’s not a realistic path for agentic loops.
What’s the minimum VRAM for this to be useful? 12 GB VRAM with qwen2.5-coder:14b is the realistic floor for single-file agentic tasks. Below that, you’re in CPU inference territory or running a 7B model that stalls on tool calls.
Why does OpenHands try to push to main directly?
Known issue documented in the review. Add branch protection rules to your repository before connecting OpenHands to any GitHub integration. For local tasks without Git integration, this doesn’t apply.
The web UI loads but LLM calls time out.
Almost always the Linux networking trap: Ollama isn’t reachable from inside the Docker container. Confirm you ran --add-host host.docker.internal:host-gateway and that Ollama is actually running on the host (curl http://localhost:11434 should return a version response). Then verify the API Base in OpenHands Settings shows http://host.docker.internal:11434, not http://localhost:11434 — localhost inside the container refers to the container, not the host.
Sources
- OpenHands GitHub repository — v1.7.0, MIT license
- OpenHands v1.7.0 release notes (May 1, 2026) — GitHub
- OpenHands Docker Compose configuration — GitHub
- litellm v1.88.0 — Ollama provider support
- Devstral Small 2: 24B agentic coding model — Mistral AI
- OpenHands + SWE-bench: inference-time scaling results — OpenHands blog
- Ollama 0.30.6 release — Ollama GitHub
- Cline + Ollama tool-call bug (June 2026) — aicoderscope.com
- Local LLM model selection by VRAM — runaihome.com
Last updated Jun 08 2026. OpenHands, Ollama, and Devstral version numbers change frequently; verify current releases before following this guide.
Was this article helpful?
Thanks for the feedback — it helps improve future articles.