Jun 7, 2026

Cline + Ollama stuck in a loop in 2026: the qwen2.5-coder JSON tool-call bug and the 90-second workaround

By AICoderScope Team · 11 min read

Tags: cline, ollama, local-llm, fix, qwen, tool-use, debug, setup-guide

TL;DR: Cline + Ollama breaks in two distinct ways — models without native tool-calling get HTTP 400/500 errors immediately, and models with native tool-calling (qwen2.5-coder) get trapped in a JSON infinite loop because Cline’s parser expects Anthropic-style XML but the model outputs JSON. Both bugs are unresolved in Cline v3.88.1 as of June 7, 2026. The JSON-loop fix takes 90 seconds.

What you’ll be able to do after this guide:

Identify which of the two failure modes is breaking your Cline + Ollama setup
Apply the .clinerules XML injection that fixes the qwen2.5-coder infinite loop
Pick a model configuration that avoids the issue for your hardware tier

Honest take: If you’re already on qwen2.5-coder, the .clinerules fix is the fastest path today. If you’re on a non-tool-calling model hitting HTTP 400 errors, switch to qwen2.5-coder:14b as your floor and apply the same fix — it’s the only working combination until PRs #11272 and #11301 land upstream.

What you actually see when it breaks

You followed the Cline + Ollama setup guide, pulled a model such as qwen2.5-coder:32b or codellama:34b, set the context window, and wired it to Cline v3.88.1. You give Cline a task: “Add a config loader that reads .env and returns typed settings.” The spinner appears, and nothing useful follows.

Depending on the model, one of two things happens:

Scenario A — The request fails immediately:

Error: Request failed with status code 400

Or status 500. Cline shows the error, pauses, and gives up. No tool calls executed.

Scenario B — Cline shows the model thinking for a long time. The context usage counter climbs. Then you see this pattern accumulate in the output pane:

{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}
{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}
{"name": "write_to_file", "arguments": {"path": "src/config.ts", "content": "..."}}

Each round ends in a MODEL_NO_TOOLS_USED error, and the loop only stops when the context fills or the session times out. No files were created. Nothing got done.

These are not the same bug. They have different causes and different fixes.
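If you saved a copy of the Cline output pane, a rough heuristic is enough to tell the two apart. The Python sketch below is hypothetical (diagnose and its repeat threshold are our own invention, not anything Cline ships): it reports Mode A when it sees a 400/500 status line and Mode B when the same JSON tool name repeats across several lines.

```python
import json
import re
from collections import Counter

# Matches the HTTP failure line shown in Scenario A.
HTTP_ERROR = re.compile(r"status code (400|500)\b")


def diagnose(output: str, repeat_threshold: int = 3) -> str:
    """Classify pasted Cline output as "A", "B", or "unknown" (heuristic only)."""
    if HTTP_ERROR.search(output):
        return "A"
    tool_names = Counter()
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        # An object with "name" and "arguments" looks like a native JSON tool call.
        if isinstance(obj, dict) and "name" in obj and "arguments" in obj:
            tool_names[obj["name"]] += 1
    # The same tool name emitted over and over is the Scenario B signature.
    if tool_names and max(tool_names.values()) >= repeat_threshold:
        return "B"
    return "unknown"
```

Paste the output into a string and call diagnose(text); anything it labels unknown needs a manual look.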

Two failure modes with one shared root

Cline’s Ollama integration was updated to use the structured tools API parameter — the standard mechanism for OpenAI-compatible APIs to pass tool definitions to models with native function-calling support. The change broke compatibility in two directions simultaneously.

Mode A: HTTP 400/500 — models without native tool-calling

Models in Ollama that don’t advertise a tool-calling capability (the original llama3, codellama, deepseek-coder, and many other general-purpose models) can’t accept tool definitions. When Cline sends a request with a populated tools parameter to one of these models, Ollama rejects it at the HTTP level before the model sees the prompt. You get 400 or 500. The task never starts.

Cline currently sends the tools parameter unconditionally to all Ollama models regardless of capability. PR #11301 and PR #11319 fix this by querying /api/show for capability flags before deciding whether to include tools. Neither is merged as of June 7, 2026.
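A minimal sketch of the capability gate those PRs describe, written in Python rather than Cline’s TypeScript and not taken from either PR. It assumes Ollama’s /api/chat request body (model, messages, stream, tools) and the capabilities list returned by /api/show; build_chat_request is a hypothetical helper.

```python
from typing import Any


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    capabilities: list[str],
) -> dict[str, Any]:
    """Build an Ollama /api/chat body, attaching tools only when the model supports them.

    `capabilities` is the list from the model's /api/show response,
    for example ["completion", "tools"].
    """
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": True}
    if "tools" in capabilities:
        # Tool-capable model: pass native tool definitions.
        body["tools"] = tools
    # Otherwise the key is left out, so the request is an ordinary chat call.
    return body
```

The key point is that the tools key is omitted entirely for models that don’t list the tools capability, instead of being sent unconditionally.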

Mode B: Infinite JSON loop — models with native tool-calling

This is the qwen2.5-coder failure, documented in GitHub issue #10843. Qwen2.5-Coder models do support native function-calling — they were trained to output standard JSON tool-invocation payloads:

{"name": "write_to_file", "arguments": {"path": "src/app.ts", "content": "import dotenv..."}}

Cline’s streaming parser was built around Anthropic’s XML tool tag format:

<write_to_file>
<path>src/app.ts</path>
<content>import dotenv...</content>
</write_to_file>

When the model outputs JSON, Cline’s parser treats it as plain conversational text — not a tool invocation. The agent loop fires MODEL_NO_TOOLS_USED, informs the model that no tool was called, and asks it to try again. The model retries with the same JSON. The loop continues until context fills or the session times out.
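A simplified model of the mismatch, assuming a regex-based parser (Cline’s real parser works incrementally on a stream and knows many more tools). The XML block yields a tool call; the JSON payload yields None, which is the “no tool used” condition that sends the agent back around the loop.

```python
import re
from typing import Optional

# Simplified subset of tool names; a real agent recognizes many more.
TOOL_NAMES = ("write_to_file", "read_file", "execute_command")


def parse_xml_tool_call(text: str) -> Optional[dict]:
    """Return {"name": ..., "params": {...}} for a recognized XML tool block, else None.

    Tool names are checked in TOOL_NAMES order.
    """
    for name in TOOL_NAMES:
        block = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        if block is None:
            continue
        # Each child tag becomes a parameter, e.g. <path>src/app.ts</path>.
        params = {
            tag: value.strip("\n")
            for tag, value in re.findall(r"<(\w+)>(.*?)</\1>", block.group(1), re.DOTALL)
        }
        return {"name": name, "params": params}
    # Nothing matched: to the agent loop this looks like plain conversational text.
    return None
```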

PR #11272 implements a JSON fallback parser that intercepts these JSON payloads and converts them to the same internal structures the XML path produces. It is passing code review but has not been merged yet.
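PR #11272 is TypeScript inside Cline; the Python sketch below only illustrates the idea as described above and is not the PR’s code. It scans free text for JSON objects that have a string name and an object arguments, and returns them in a hypothetical {name, params} shape like the one an XML parse would produce.

```python
import json


def parse_json_tool_calls(text: str) -> list[dict]:
    """Find {"name": ..., "arguments": {...}} objects anywhere in free text."""
    decoder = json.JSONDecoder()
    calls = []
    i = 0
    while True:
        start = text.find("{", i)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it plus the index just past its end.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            i = start + 1  # not valid JSON at this brace; keep scanning
            continue
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)
        ):
            # Stringify non-string values, since XML-derived params are always text.
            params = {
                k: v if isinstance(v, str) else json.dumps(v)
                for k, v in obj["arguments"].items()
            }
            calls.append({"name": obj["name"], "params": params})
        i = end
    return calls
```

Values are stringified because XML parameters arrive as text, so downstream code sees one shape regardless of which path produced the call.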

Which models hit which mode

Check a model’s capability before you start:

curl http://localhost:11434/api/show \
  -d '{"model": "qwen2.5-coder:32b"}' | python3 -m json.tool | grep -A 10 capabilities

If "tools" appears in the capabilities list, the model has native tool-calling; expect Mode B without the fix below. If the capabilities field is missing or doesn’t include "tools", expect Mode A.

Model (Ollama tag)	Tool support	Failure mode
`qwen2.5-coder:7b`	Native JSON	Mode B — JSON loop
`qwen2.5-coder:14b`	Native JSON	Mode B — JSON loop
`qwen2.5-coder:32b`	Native JSON	Mode B — documented in GitHub #10843
`codellama:34b`	None	Mode A — HTTP 400/500
`llama3:8b` / `llama3.1:8b`	Varies by model version, not quantization (llama3 has none; llama3.1 tags advertise tools)	Mode A where “tools” is absent; check /api/show
`deepseek-coder:6.7b`	None	Mode A — HTTP 400/500
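The curl check above can also be scripted. This sketch uses only Python’s standard library and assumes Ollama on its default port, the model request field from Ollama’s API docs, and a release recent enough to include capabilities in the /api/show response. The helper names are ours, and the mapping simply encodes this guide’s rule: tools listed means Mode B, otherwise Mode A.

```python
import json
import urllib.request


def show_model(model: str, host: str = "http://localhost:11434") -> dict:
    """POST to Ollama's /api/show and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode(),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def expected_failure_mode(show_response: dict) -> str:
    """Map an /api/show response to the failure mode this guide predicts."""
    capabilities = show_response.get("capabilities") or []
    if "tools" in capabilities:
        return "Mode B (JSON loop)"
    return "Mode A (HTTP 400/500)"
```

With Ollama running, print(expected_failure_mode(show_model("qwen2.5-coder:32b"))) prints the predicted mode for that tag.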

Fix for Mode B: `.clinerules` XML injection

This workaround is confirmed in GitHub issue #10843. Forcing the model to output Anthropic-style XML makes Cline’s existing parser execute tool calls correctly. One file, 90 seconds.

In your project root, create the .clinerules directory if it doesn’t exist:

mkdir -p .clinerules

Create .clinerules/tool-format.md:

# Tool invocation format

CRITICAL: Never output tool calls as JSON objects. Do not output patterns like:
{"name": "write_to_file", "arguments": {...}}
{"name": "read_file", "arguments": {...}}
{"name": "execute_command", "arguments": {...}}

You MUST use only Anthropic-style XML tags for all tool invocations:

<write_to_file>
<path>path/to/file</path>
<content>
file content here
</content>
</write_to_file>

<read_file>
<path>path/to/file</path>
</read_file>

<execute_command>
<command>ls -la</command>
</execute_command>

JSON output is silently ignored. XML tags are the only format that executes.

Save the file, then reload VS Code:

Ctrl+Shift+P (Cmd+Shift+P on macOS) → Developer: Reload Window

Cline reads .clinerules when a conversation starts, so open a new Cline conversation after the reload; existing ones don’t pick up rule changes mid-session.

What success looks like: The model’s output shifts from a wall of repeating JSON blocks to a structured Cline tool-call card showing a file path, a content preview, and an Approve/Reject button. That button means the XML parse succeeded and Cline is waiting on your confirmation before writing. If the JSON accumulation continues with no button appearing, the .clinerules content hasn’t loaded — check the directory location and reload again.

Fix for Mode A: switch model, then apply the Mode B fix

There’s no clean user-side workaround for Mode A today. The HTTP 400/500 happens at the API layer before the model sees anything. Until PR #11301 or #11319 merges, models without native tool-calling cannot be used with Cline’s Ollama provider as currently shipped.

The practical path: switch to qwen2.5-coder and apply the .clinerules fix above. Hardware requirements:

Model	VRAM needed	Recommended GPU
`qwen2.5-coder:7b`	~5 GB	RTX 4060 8 GB (demo tier only; 7B struggles with multi-file agent tasks)
`qwen2.5-coder:14b`	~9 GB	RTX 3060 12 GB or RTX 4060 Ti 16 GB — minimum for real agentic work
`qwen2.5-coder:32b`	~20 GB	RTX 4090 or RTX 3090 — best practical local tier
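The VRAM column lines up with a weights-only estimate at 4-bit quantization. The sketch below assumes approximate parameter counts from the Qwen2.5-Coder model cards and roughly 4.85 bits per weight for Ollama’s default quantized tags; both are assumptions, and the estimate ignores the KV cache (which grows with the context window you set below) and runtime overhead, so treat the output as a floor.

```python
# Assumption: approximate parameter counts (billions) from the Qwen2.5-Coder model cards.
PARAMS_B = {"qwen2.5-coder:7b": 7.61, "qwen2.5-coder:14b": 14.7, "qwen2.5-coder:32b": 32.5}

# Assumption: Ollama's default tags are ~4-bit K-quants at roughly 4.85 bits per weight.
BITS_PER_WEIGHT = 4.85


def weight_gb(params_billion: float, bits_per_weight: float = BITS_PER_WEIGHT) -> float:
    """Weights-only footprint in GB (1e9 bytes); excludes KV cache and runtime overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


for tag, params in PARAMS_B.items():
    print(f"{tag}: ~{weight_gb(params):.1f} GB of weights")
```

Running it prints about 4.6, 8.9, and 19.7 GB, close to the ~5, ~9, and ~20 GB in the table; the gap between those figures and the card sizes is the headroom left for context.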

Pull the model:

ollama pull qwen2.5-coder:14b   # 9 GB download, runs on 12–16 GB VRAM
# or
ollama pull qwen2.5-coder:32b   # 20 GB download, RTX 3090/4090 tier

Set a usable context window (the default 2,048 tokens is fatal for Cline):

OLLAMA_CONTEXT_LENGTH=32768 ollama serve

For persistence via systemd, add to /etc/systemd/system/ollama.service.d/override.conf:

[Service]
Environment="OLLAMA_CONTEXT_LENGTH=32768"

Then reload:

sudo systemctl daemon-reload && sudo systemctl restart ollama

The Cline + Ollama setup guide walks through the full context window configuration and initial Cline settings. Once those are in place, add the .clinerules/tool-format.md file from the Mode B fix above.

For a broader guide to GPU selection for local LLM coding work, the runaihome.com local AI models by VRAM tier guide covers the full landscape.

Why the fix might not stick

A few common reasons the .clinerules workaround doesn’t take on the first try:

Wrong directory. .clinerules/ must sit at the VS Code workspace root — the same folder you opened with File → Open Folder. If you opened a subdirectory as your workspace, the .clinerules/ folder needs to go there, not at the repo root.

Stale conversation. .clinerules content is loaded at conversation start. If you created the file mid-conversation, the running session won’t pick it up. Open a fresh Cline conversation after the reload.

Context window too small. If OLLAMA_CONTEXT_LENGTH is still at the 2,048-token default, the .clinerules content gets pushed out of the model’s context window by turn 2 or 3 of a multi-file task. The loop looks fixed on turn 1, then comes back a turn or two later. Set at least 16,384; 32,768 is better.

File encoding. Create .clinerules/tool-format.md from VS Code’s built-in editor or a Unix terminal. Files created with Windows Notepad may include a BOM or CRLF endings that cause parse issues.
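The directory and encoding checks above are easy to script. This is a hypothetical helper (the function names are ours, standard library only): point check_rules_file at the folder you opened in VS Code, and use normalize_rules_file to strip a BOM and convert CRLF to LF if it reports either.

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark


def check_rules_file(workspace: Path) -> list[str]:
    """List problems with <workspace>/.clinerules/tool-format.md (empty list means OK)."""
    rules = workspace / ".clinerules" / "tool-format.md"
    if not rules.is_file():
        return [f"missing: {rules}"]
    data = rules.read_bytes()
    problems = []
    if data.startswith(BOM):
        problems.append("file starts with a UTF-8 BOM")
    if b"\r\n" in data:
        problems.append("file uses CRLF line endings")
    return problems


def normalize_rules_file(path: Path) -> None:
    """Rewrite the file in place as BOM-less UTF-8 with LF line endings."""
    data = path.read_bytes()
    if data.startswith(BOM):
        data = data[len(BOM):]
    path.write_bytes(data.replace(b"\r\n", b"\n"))
```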

Enable compact prompts — required for local models

Before doing anything else with a local model in Cline, turn on Settings → Advanced → Compact Prompts. This trims Cline’s system prompt overhead by roughly 90%. Without it, the system prompt plus .clinerules content plus the task description can consume most of a 32k context window before your actual code loads.

Compact mode removes some of Cline’s longer edge-case tool behaviors. For local model use, the tradeoff is not close — enable it.

Tracking the upstream fix

Three PRs are in flight as of this writing (June 7, 2026):

PR #11272 — JSON fallback parser (Mode B fix). Scans model output for JSON tool payloads when the XML parser finds nothing, converts them to Cline’s internal tool structures. Passing code review. Not merged.

PR #11301 — Capability-check before sending tools (Mode A fix). Queries /api/show before including the tools parameter in Ollama requests. Passing unit tests. Not merged.

PR #11319 — Per-model tool capability gating (also Mode A). Stores capability data in provider model metadata; gates AI SDK tool injection to only tool-capable models. Awaiting reviewer approval.

When #11272 (the Mode B fix) or either #11301 or #11319 (the Mode A fix) lands, the update will reach you through VS Code’s extension auto-update if you have it enabled. Check your installed version against the Cline releases page. Once you’re on a build that includes the JSON fallback parser, you can delete .clinerules/tool-format.md; qwen2.5-coder will then handle tool invocations correctly without the XML override.

FAQ

Does this affect Cline with LM Studio instead of Ollama? Different issue. LM Studio’s server tends to return HTTP errors or silently strip the tools field rather than enter a JSON loop. The main LM Studio trap is a 32.8k context ceiling that Cline’s LM Studio integration imposes regardless of your model’s actual context size — covered in the Cline + LM Studio guide.

Does Aider or Continue.dev have this problem with Ollama? No. Aider doesn’t rely on tool calls to edit files; the model returns edits in Aider’s own text edit formats (such as whole-file or diff), so nothing goes through Cline’s XML parsing. Continue.dev talks to Ollama through its own provider code. This bug is specific to how Cline’s Ollama provider passes tool definitions.

Why did this start working and then break? Cline updated its Ollama provider to use the structured API tools parameter — matching how OpenAI’s API passes tool definitions — to unlock better model-native tool handling. The update didn’t account for the two edge cases: models that reject tools entirely (Mode A) and models that emit JSON when Cline expects XML (Mode B). The fix is coming; the PRs are close.

Can I pin to an older Cline version while waiting? Yes. In VS Code, right-click the Cline extension → “Install Another Version...”. Versions before approximately v3.80 used a different tool-call mechanism that predates this regression. You’ll lose recent model additions and the debug settings panel added in v3.88.1, but the agent loop will work with Ollama models. This is a reasonable temporary option if you need uninterrupted local coding today.

Will the fix work with DeepSeek-R1 or other reasoning models via Ollama? Reasoning models introduce a different variable: they output extended <think> blocks before tool calls, which can conflict with Cline’s streaming parser independent of the JSON/XML issue. For reasoning models on Ollama, see the Aider + LM Studio guide for context on the output-token ceiling problem that reasoning models compound.


Last updated June 7, 2026. Cline releases frequently; check the linked PRs for merge status before following the workarounds — they may already be fixed in your installed version.

Recommended Gear

NVIDIA RTX 4060 8 GB — entry tier for 7B local models
NVIDIA RTX 3060 12 GB — minimum for 14B agentic use
NVIDIA RTX 4060 Ti 16 GB — solid 14B daily-driver tier
NVIDIA RTX 4090 24 GB — best practical local tier for 32B models
NVIDIA RTX 3090 24 GB — value alternative for the 32B tier
