OpenHands Review 2026: Open-Source AI Coding Agent, 72% SWE-Bench, and the Self-Hosting Catch
OpenHands has 74,700 GitHub stars, beats Devin 2.0 on SWE-bench by nearly 27 points, and the cloud tier starts at zero. Those three facts should have every developer evaluating Devin at least trying it. The reason most don’t — and why this review matters — is that OpenHands’ strengths and failure modes are both more extreme than a 30-second summary captures.
This is a post-v1.7.0 (May 1, 2026) look at whether OpenHands earns a permanent slot in a professional workflow.
What OpenHands actually is
OpenHands (formerly OpenDevin) is an autonomous software engineering agent from All Hands AI, a startup that raised an $18.8M Series A. Give it a task in plain English — fix this GitHub issue, implement this feature — and it spins up a sandboxed environment, writes code, runs tests, and iterates until the task resolves or the context runs out. You watch a live action stream rather than write the code yourself.
The project ships in four configurations:
- Open Source (self-hosted): MIT-licensed core stack. Install via Docker, point at any LLM via API key, run the web GUI at localhost:3000. No seat fees.
- Cloud Free (BYOK tier): Hosted on OpenHands infrastructure. Bring your own Anthropic, OpenAI, or Mistral key. Hard cap at 10 conversations per day. LLM billed at-cost through your own key, with no markup added by OpenHands.
- Cloud Pro ($20/month): Covers runtime compute so sandbox VMs don’t appear as a separate line item. Unlocks GitHub, GitLab, Bitbucket, and Slack integrations. Includes $20 in one-time cloud credits. LLM still billed at-cost.
- Enterprise (custom pricing): Self-hosted Kubernetes deployment with PostgreSQL-backed multi-tenancy, RBAC, and the Agent Control Plane — a centralized system for managing agent fleets at scale, launched May 6, 2026.
One architectural note before going further: the V0→V1 SDK split happened in November 2025. A substantial portion of community tutorials and third-party guides describe the V0 architecture, which is a different codebase. If a setup article doesn’t reference V1 or the Software Agent SDK, treat it as outdated.
Pricing in practice
The cost model is genuinely different from every other tool in this space:
| Plan | Monthly fee | LLM cost | Runtime |
|---|---|---|---|
| Self-hosted (OSS) | $0 | Your API key, market rates | Your server |
| Cloud Free | $0 | Your API key, market rates | Covered (10 conv/day limit) |
| Cloud Pro | $20 | At-cost via OpenHands LLM provider | Covered |
| Enterprise | Custom | Custom | Included |
When you use the OpenHands LLM provider on Cloud Pro, you pay the same per-token rates Anthropic and OpenAI charge, with zero markup. Claude Sonnet 4.5 runs $3 per 1M input tokens and $15 per 1M output tokens, the same rate as calling the API yourself.
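To make the at-cost pricing concrete, here is a minimal sketch that turns token counts into dollars at the Claude Sonnet 4.5 rates quoted above. The function name and the example task size are illustrative assumptions, not figures from OpenHands.

```python
# Rates quoted in this review for Claude Sonnet 4.5 (USD per 1M tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated LLM cost in USD for one task at the quoted rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return round(input_cost + output_cost, 4)


if __name__ == "__main__":
    # Hypothetical task size, chosen only for illustration.
    print(estimate_llm_cost(input_tokens=2_000_000, output_tokens=100_000))
```

Agent loops tend to be input-heavy because the conversation context is resent on each step, so the input rate usually dominates the bill. Measure your own token counts before budgeting.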
Compare that to Devin 2.0: also $20/month Pro, but ACU overages (the compute units Devin burns per task) can add $100–$400/month for active users. OpenHands’ cost structure is substantially more predictable for teams running high task volume.
If you want to run the self-hosted version without idling a local workstation, cloud GPU instances (e.g., RunPod) are a reasonable option for on-demand compute. For hardware sizing context, the runaihome.com guide to local LLM servers covers memory requirements by model size.
What it gets right
GitHub issue → pull request
This is the workflow OpenHands was purpose-built for, and the one where it performs most reliably. Point it at a GitHub issue URL: it reads the issue, navigates the repository, identifies files to change, creates a branch, writes the fix, runs tests, and opens a PR. The loop is coherent end-to-end in a way that simpler agents aren’t.
For maintainers carrying a backlog of labeled good-first-issue bugs — the kind that require a real fix but don’t need senior judgment — this is not a toy. The code review step remains mandatory; OpenHands closes issues but doesn’t guarantee quality.
Planning Mode (beta since v1.6.0)
Before March 30, 2026, OpenHands jumped straight into execution on every task. Sometimes that produced a correct solution in two minutes. Sometimes it dug into a hole and burned tokens going in circles on a wrong assumption.
Planning Mode changes the loop: the agent writes an implementation plan and pauses for your approval before touching the codebase. For anything beyond a trivial single-file fix, this one feature meaningfully improves completion rates. It’s still labeled beta as of v1.7.0, but stable enough to use daily.
Model flexibility
This is OpenHands’ structural advantage over every commercial autonomous agent. Claude Sonnet 4.5, GPT-5, Gemini 3.1, Devstral 24B, Qwen3-235B: you swap models by changing a config field. When Anthropic ships a better model, you configure it once. Devin and Kiro use locked or constrained model stacks; you’re dependent on the vendor’s release cycle.
SWE-bench scores vary significantly by model choice. Devstral 24B scores approximately 46.8% on SWE-bench Verified when used as the OpenHands backend. Smaller open-weight models perform lower. Claude Sonnet 4.5 reaches ~72% on the V1 SDK harness. The model you select determines what “OpenHands” actually delivers.
Software Agent SDK
For teams building on top of agentic infrastructure, OpenHands ships a composable Python SDK under Apache 2.0 (separate repository). You can define custom agent workflows, delegate specialized tasks to sub-agents via TaskToolSet, and integrate OpenHands programmatically into CI/CD pipelines without running the GUI. This is the layer most enterprise self-hosted deployments are built on.
The Agent Control Plane (Enterprise, launched May 6, 2026) extends this into fleet management: least-privilege access controls scoped at the workflow level, spend tracking per workflow for cost attribution, and full action logging for debugging and compliance. It directly addresses the concern that autonomous agents running in production don’t have granular permission boundaries.
Where it breaks
The Docker dependency
Every self-hosted OpenHands instance requires Docker. The agent spawns sandbox containers for each task — this isolates filesystem access and prevents runaway processes from affecting the host. On a developer workstation, manageable. In a CI/CD pipeline, the required socket mount (/var/run/docker.sock) creates port-mapping, permission, and resource allocation issues that take real engineering time to resolve.
The v1.7.0 release added a SANDBOX_KVM_ENABLED environment variable to pass KVM acceleration through to sandbox containers, which improves performance on supported hardware, but doesn’t remove the Docker requirement.
Critical version mismatch: The SANDBOX_RUNTIME_CONTAINER_IMAGE tag must exactly match the openhands image version. Running openhands:1.7.0 against runtime:1.6.0-nikolaik fails immediately with a cryptic container start error. Always pull matching tags.
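A pre-flight check can catch this before the container ever starts. The sketch below assumes the tag shapes shown in the example above (a plain version for the app image, and a version plus a suffix such as -nikolaik for the runtime image); the helper names are hypothetical and not part of OpenHands.

```python
def image_version(image_ref: str) -> str:
    """Extract the leading version from an image reference's tag.

    "openhands:1.7.0" gives "1.7.0"; "runtime:1.6.0-nikolaik" gives "1.6.0".
    """
    # Treat everything after the last colon as the tag.
    _, sep, tag = image_ref.rpartition(":")
    # A "/" in the candidate tag means the colon belonged to a registry port.
    if not sep or not tag or "/" in tag:
        raise ValueError(f"no tag found in image reference: {image_ref!r}")
    # Drop any suffix that follows the version, such as "-nikolaik".
    return tag.split("-", 1)[0]


def versions_match(app_image: str, runtime_image: str) -> bool:
    """Return True when both image references carry the same version."""
    return image_version(app_image) == image_version(runtime_image)


if __name__ == "__main__":
    print(versions_match("openhands:1.7.0", "runtime:1.6.0-nikolaik"))
```

Running a check like this in a deploy script turns the cryptic container start error into an explicit failure with both tags visible.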
Git credential handling
Multiple independent users have documented the same failure patterns: OpenHands sometimes attempts to push to the default branch directly, handles credentials incorrectly, and can’t reliably retrieve PR comments or status checks via CLI tools. Automated PR workflows built on OpenHands need additional safeguards — branch protection rules that prevent direct pushes to main, and explicit credential injection into the container environment.
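Server-side branch protection is the real safeguard, but a wrapper around the agent's push step can add a second check. This is a minimal sketch, assuming your automation can see the target ref before pushing; the function name and the default protected set are assumptions for illustration.

```python
# Assumed default-branch names; adjust to match your repository.
PROTECTED_BRANCHES = frozenset({"main", "master"})


def is_push_allowed(target_ref: str, protected: frozenset = PROTECTED_BRANCHES) -> bool:
    """Return False when the push targets a protected branch."""
    # Accept both short names ("main") and full refs ("refs/heads/main").
    branch = target_ref.removeprefix("refs/heads/")
    return branch not in protected


if __name__ == "__main__":
    for ref in ("refs/heads/main", "openhands/fix-issue"):
        print(ref, "allowed" if is_push_allowed(ref) else "blocked")
```

A guard like this only helps if the agent's pushes are routed through it; it does not replace branch protection rules on the hosting platform.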
The platform also lacks native secrets management. Passing an API key or database credential to an agent task requires environment variable injection, with no built-in secret store or masking.
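Until masking is built in, one stopgap is to scrub known secret values from any agent output you store or display. The sketch below is a generic approach, not an OpenHands feature; the function name and placeholder format are hypothetical.

```python
import os
from typing import Mapping, Optional, Sequence


def mask_secrets(
    text: str,
    secret_names: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Replace the values of the named environment variables with placeholders."""
    env = os.environ if env is None else env
    # Collect the secrets that are actually set, skipping empty values.
    pairs = [(env[name], name) for name in secret_names if env.get(name)]
    # Mask longer values first so a short secret cannot split a longer one.
    for value, name in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, f"<{name}:masked>")
    return text


if __name__ == "__main__":
    demo_env = {"API_KEY": "sk-example-123"}  # placeholder value for illustration
    print(mask_secrets("calling with sk-example-123", ["API_KEY"], demo_env))
```

This only catches exact matches of the raw value. Encoded or truncated copies of a secret will slip through, so treat it as damage limitation rather than a secret store.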
Browser tool reliability
When OpenHands tasks require checking a live URL — reading API docs, verifying deployment state, scraping a configuration value — the browsing tool is the flakiest part of the stack. JavaScript-heavy pages, bot detection, and site structure changes cause silent failures that derail otherwise correct task flows. The documented workaround: reach for curl or library-level HTTP calls whenever the agent needs to interact with a web resource. Don’t build production workflows that depend on browser tool success.
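A library-level fetch is easy to make fail loudly, which is the property the browser tool lacks. Here is a minimal sketch using only the Python standard library; the function name and the empty-body rule are assumptions for illustration.

```python
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return its body as text, raising instead of failing silently."""
    # urlopen raises URLError or HTTPError on connection failures and 4xx/5xx responses.
    with urlopen(url, timeout=timeout) as response:
        body = response.read()
        charset = response.headers.get_content_charset() or "utf-8"
    # Treat an empty body as an error so the task stops instead of continuing on bad data.
    if not body:
        raise ValueError(f"empty response body from {url}")
    return body.decode(charset)
```

This does not execute JavaScript, so it suits API docs, raw config files, and health endpoints rather than client-rendered pages.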
The 10-conversation free ceiling
The Cloud Free tier provides genuine evaluation access, but 10 conversations per day is a hard ceiling that real development work exceeds quickly. Solo developers running focused testing may stay under it. Any team scenario hits the limit on day one. The upgrade to Cloud Pro ($20/month) is straightforward, but the free tier isn’t a sustainable daily workflow for professionals.
SWE-bench in context
OpenHands publishes two commonly cited scores:
- 77.6% on SWE-bench Verified — V0 harness, using inference-time scaling with Claude 3.5 Sonnet Thinking. This is the badge visible on the GitHub repository.
- ~72.8% on SWE-bench Verified — V1 SDK harness, with Claude Sonnet 4.5. Newer infrastructure, lower score, more representative of typical production behavior.
Both numbers are legitimate for what they measure. Neither is directly comparable to Claude Code’s 87.6% or Devin’s 45.8% without noting that all three use different evaluation harnesses and setups.
An important benchmark context: OpenAI’s independent audit found that some frontier models have likely encountered SWE-bench Verified tasks during pretraining. Scores in the high 70s–80s should be read as “capable on representative real-world issues,” not “will resolve this fraction of your actual bug backlog.” The SWE-bench Pro leaderboard — which uses a harder, less-contaminated task set — shows frontier models scoring 30–50 percentage points lower than their Verified numbers.
The comparison to Devin is still worth noting: a 72–77% vs 45.8% gap is large enough to be real even accounting for harness differences. OpenHands with a frontier model resolves real GitHub issues at a substantially higher rate than Devin 2.0.
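The size of that gap depends on which OpenHands score you start from. The arithmetic, using only the figures cited in this section, can be sketched as follows; the dictionary keys are labels made up for this example.

```python
# SWE-bench Verified scores as cited in this review (percent resolved).
SCORES = {
    "openhands_v0": 77.6,  # V0 harness with inference-time scaling
    "openhands_v1": 72.8,  # V1 SDK harness with Claude Sonnet 4.5
    "devin_2": 45.8,
}


def gap(a: str, b: str) -> float:
    """Return the difference in percentage points between two cited scores."""
    return round(SCORES[a] - SCORES[b], 1)


if __name__ == "__main__":
    print("V1 vs Devin 2.0:", gap("openhands_v1", "devin_2"))
    print("V0 vs Devin 2.0:", gap("openhands_v0", "devin_2"))
```

These are differences between numbers produced on different harnesses, so read them as rough magnitudes rather than a controlled comparison.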
Comparison table
| Tool | SWE-bench Verified | Price | Model lock | Self-host | Best for |
|---|---|---|---|---|---|
| OpenHands | ~72–77% (V1/V0) | $0–$20/mo + LLM | None | Yes (Docker) | GitHub issue automation, privacy-first BYOK workflows |
| Devin 2.0 | 45.8% | $20/mo + ACU overages | Yes | No | Defined async tasks, enterprise Cognizant workflows |
| Claude Code | 87.6% | $20–$200/mo | Anthropic only | Terminal | Best agentic quality, Anthropic-native |
| Cline | Not benchmarked | $0 + API cost | None | Via VS Code | IDE-integrated Plan/Act, real-time coding |
| Aider | Not benchmarked | $0 + API cost | None | Terminal | Lightweight, version-controlled, minimal setup |
Honest take
Use OpenHands if you’re automating GitHub issue resolution at scale and need a cost-predictable, model-agnostic tool that you can fully self-host. The Pro tier at $20/month plus at-cost LLM is cheaper than Devin for high-volume task queues, and the 72–77% SWE-bench score is a real advantage.
Use the self-hosted path if your team handles pre-launch IP, NDA-bound code, or regulated-industry data that can’t leave your network. Pair it with a local Ollama model (Qwen3-235B or Devstral 24B) and you get an autonomous coding agent with zero external data exposure — no other tool in this roundup matches that combination.
Don’t use OpenHands as your daily IDE companion. It has no real-time autocomplete, no editor integration, and the conversation-based interaction is slower than Cursor or Cline for iterative sessions. OpenHands handles batched, well-defined tasks; it’s not a replacement for an IDE-integrated assistant.
Budget setup time. The Docker socket permissions, version pinning, and Git credential injection each have documented failure modes. Expect at least a day of configuration before the self-hosted setup runs reliably in production. The Cloud Pro tier eliminates most of this at the cost of $20/month.
OpenHands is the best open-source autonomous coding agent available as of May 2026. The model flexibility and benchmark performance are genuine advantages over Devin. The trade-off is infrastructure friction and rougher real-world Git behavior. Neither is fundamental; both are the expected cost of running open-source infrastructure at the frontier.
Sources
- OpenHands GitHub repository — 74.7k stars, v1.7.0, MIT license
- OpenHands v1.7.0 release notes (May 1, 2026) — GitHub
- OpenHands Cloud pricing — free BYOK, Pro $20/mo, Enterprise
- OpenHands Pro Subscription — BYOK, runtime compute, integrations
- OpenHands Launches Agent Control Plane (May 6, 2026) — Yahoo Finance
- SOTA on SWE-bench Verified with inference-time scaling — OpenHands blog
- Real-world OpenHands experience: Git issues and Docker friction — Medium
- SWE-bench Pro: why Verified scores are contaminated — MorphLLM
- Best AI agents for software development ranked (May 15, 2026) — MarkTechPost
- Bring your own LLM to OpenHands Cloud (Nov 2025) — OpenHands blog
- Claude Code vs OpenHands comparison — lowcode.agency
Last updated May 24, 2026. Pricing and features change frequently; verify current state before purchasing.