May 11, 2026

Aider + Ollama Local LLM Setup Guide 2026: Official Config, Model Selection, Context Fix

By AICoderScope Team · 10 min read

aiderollamalocal-llmsetup-guideself-hostedpair-programmingqwendeepseek-coder

Aider plus Ollama is the cleanest path to fully local, zero-API-cost AI pair programming in 2026. It’s also a setup where 80% of the public tutorials silently produce broken output, because of a default in Ollama that truncates your context without telling you. This guide gets you a working install, the right model for your hardware, and explicitly walks through the failure modes that most other tutorials skip.

Aider is a command-line AI pair programmer that edits files, runs git commits automatically, and works against any LLM endpoint—including a local Ollama server. The combination gives you a fully offline coding assistant with no API bills and no data leaving your machine. If you have a GPU with 8 GB of VRAM or more, you have enough to start.

What You’ll Have at the End

A working setup that lets you:

cd ~/projects/my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

…and then have a conversation with a local LLM that reads your code, suggests changes, edits files directly, and commits them to git—all without an internet connection. Total install time: about 20 minutes if you already have an NVIDIA GPU set up. Total cost after install: $0.

Step 0: Hardware Reality Check

The model size you can run is bounded by your VRAM. Approximate fits for the recommended coding models:

Model	VRAM needed (Q4)	Coding quality	Practical tier
qwen2.5-coder:7b	~5 GB	Decent for completion, weak on refactors	RTX 3060 12GB, RTX 4060 8GB
qwen2.5-coder:14b	~9 GB	Good — solid for everyday Aider work	RTX 4060 Ti 16GB, RTX 3090
qwen2.5-coder:32b	~20 GB	Best local option	RTX 4090, RTX 5090, dual 3090s
deepseek-coder-v2:16b (MoE)	~10 GB	Excellent for completion, weaker on agent tasks	RTX 4060 Ti 16GB+

If you’re running on an Apple Silicon Mac, unified memory takes the place of VRAM—a 32 GB Mac Studio fits the 14B comfortably, a 64 GB+ machine fits the 32B. For a deeper breakdown of which model actually fits where, our sister site has a Best Local AI Models by VRAM tier guide that covers this in detail.

The honest baseline: don’t bother with a 7B coder model on Aider for anything past one-line edits. It’s the equivalent of pairing with someone who has read your code once and forgotten most of it. The 14B is the practical floor for getting work done; the 32B is what makes Aider feel close to a paid cloud service.

Step 1: Install Ollama

On Linux:

curl -fsSL https://ollama.com/install.sh | sh

On macOS: download the installer from ollama.com/download and run it.

On Windows: same, download the installer.

Verify the install:

ollama --version

You should see a version string. Ollama installs itself as a service that runs on http://127.0.0.1:11434 by default.

Pull the model

Pick the largest coder model your VRAM tier supports from the table above:

ollama pull qwen2.5-coder:14b

This downloads ~9 GB. Wait for it to finish. Verify:

ollama list

You should see the model listed.

Step 2: Fix the Context Window Trap

This is the step that breaks 80% of public Aider+Ollama tutorials, including some that have been quoted in posts on Hacker News. Ollama defaults to a 2,048-token context window and silently discards anything beyond that.

For Aider, which loads your code into context, this is catastrophic. With a 2k window, Aider sees the first ~1,500 tokens of your repo (maybe 2-3 small files), then everything else is silently dropped. The model has no idea your project has 50 more files. Aider has no way to know its context is being truncated. The output looks plausible but is operating on a tiny fragment of the actual codebase.

The fix is to set OLLAMA_CONTEXT_LENGTH before starting Ollama. The official Aider docs are explicit about this. Stop Ollama if it’s running:

# Linux/macOS
pkill ollama
# Or on systemd-based Linux:
sudo systemctl stop ollama

Restart with a larger context:

OLLAMA_CONTEXT_LENGTH=16384 ollama serve

16k is a reasonable floor for serious Aider work. If your codebase is large, push it higher (32k or 64k if your model supports it—qwen2.5-coder supports up to 128k context, but every additional token costs VRAM). Aider itself adjusts the context window per request to fit your prompt plus 8k for the reply, so setting OLLAMA_CONTEXT_LENGTH higher gives Aider room to work.

For a permanent fix on Linux with systemd, edit /etc/systemd/system/ollama.service.d/override.conf:

[Service]
Environment="OLLAMA_CONTEXT_LENGTH=16384"

Then sudo systemctl daemon-reload && sudo systemctl restart ollama. The setting now persists across reboots.

Step 3: Install Aider

The cleanest install path is via aider-install, which handles its own Python environment so it doesn’t conflict with project dependencies:

python -m pip install aider-install
aider-install

Python 3.8 through 3.13 is supported with this installer. The traditional pip install works too if you want manual control:

pip install -U --upgrade-strategy only-if-needed aider-chat

For pip install, Python 3.9–3.12 is the supported range.

Verify:

aider --help

If you get aider: command not found, your Python user-bin directory isn’t on PATH. The workaround is python -m aider instead of aider.

Step 4: Connect Aider to Ollama

Two ways: environment variable or config file.

Quick path: environment variable

Linux/macOS:

export OLLAMA_API_BASE=http://127.0.0.1:11434
cd ~/projects/my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

Windows (PowerShell):

$env:OLLAMA_API_BASE = "http://127.0.0.1:11434"
cd $HOME\projects\my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

Important: use ollama_chat/<model>, not ollama/<model>. The _chat suffix routes through the Ollama chat endpoint, which gives proper instruction-following. Without it, Aider talks to the base completion endpoint and the model behaves like it’s autocompleting tokens rather than responding to a prompt.

Persistent path: config file

Drop a .aider.conf.yml in your home directory or repo root:

model: ollama_chat/qwen2.5-coder:14b
openai-api-base: http://127.0.0.1:11434
weak-model: ollama_chat/qwen2.5-coder:7b
auto-commits: true
dirty-commits: true

The weak-model field is what Aider uses for cheap operations like commit message generation. Pointing it at a smaller, faster model saves seconds per interaction.

Step 5: The Workflow That Actually Works

A first session typically goes:

cd ~/projects/my-codebase
aider --model ollama_chat/qwen2.5-coder:14b

Aider scans your repo’s git history, lists files, and gives you a prompt. From here:

/add path/to/file.py — explicitly add a file to context. Aider only sees what’s added; this is intentional to control token usage.
/drop path/to/file.py — remove a file from context once you’re done with it.
/diff — show pending changes before Aider commits.
Plain prompt — describe what you want. Example: “Refactor the auth middleware to use bcrypt instead of MD5.”
/undo — roll back the last Aider commit if the change was wrong.
Ctrl+C — stop a running generation.

Aider auto-commits each accepted change with a sensible commit message. You can disable this with --no-auto-commits if you want manual control. The auto-commit pattern is what makes Aider feel different from Cursor or Copilot—every change is a real git commit with a real message, not an ephemeral edit in a chat window.

Practical performance notes

Local LLM inference on a single RTX 4090 with a 14B coder model: roughly 30-50 tokens/sec, depending on context fill. A typical Aider response (200-400 tokens) returns in 5-15 seconds. That’s slower than Cursor on a cloud frontier model, but fast enough that it doesn’t break flow once you’re used to the cadence.

If responses feel sluggish: check nvidia-smi while Aider is generating. If GPU utilization is below 90%, the model isn’t fitting cleanly and is offloading to CPU. Either pick a smaller quantization (qwen2.5-coder:14b-instruct-q4_K_M instead of the default) or move to a model size that fits comfortably.

Honest Take

Aider + Ollama is the best zero-cost setup in 2026 for a specific kind of work: small to medium refactors, file-scoped edits, learning a codebase by asking questions about it. The 14B model is genuinely useful for these.

It is not a replacement for Cursor with Claude Sonnet 4.6 or Pro+ on a complex multi-file feature. The model gap is real. Local 14B-32B coder models in 2026 are roughly comparable to GPT-4 from 2023 in capability—great for circumscribed tasks, mediocre at the kind of multi-file reasoning that frontier cloud models handle in one shot.

The right framing is: Aider+Ollama is your default daily driver for everyday work, and Cursor or Copilot is your escalation path for tasks where the local model visibly struggles. The total-cost math we covered in Cloud AI Coding vs Local LLM in 2026 plays out exactly the same here: hybrid is the practical answer for most working devs.

For a fuller review of Aider as a tool (not specifically the local-LLM angle), see our Aider Review 2026. If you want a VS Code-native alternative with multi-language project support, Continue.dev configuration guide for multi-language projects covers the equivalent setup. If you’re choosing between Aider and other free alternatives, the 30-Day Test of Free Cursor Alternatives puts it in context with Cline, Continue, and the others. For getting the most out of Cursor at $20/month if you decide local isn’t enough yet, the Cursor IDE Review 2026 covers the trade-offs.

If you’re not sure your hardware fits, our sister site’s Cursor + Local Llama hardware tier guide covers what each VRAM tier actually delivers in terms of latency and usability. The same analysis applies directly to Aider.

1V1 PLAYBOOK · LOCAL LLM

Cut your local AI bill from $400/month cloud GPU to $47/month at home.

4-path hardware decision table, Ollama cold-start fix, Cursor/Claude Code routing configs, full 24-month TCO calculator.

Get it for $19 (early bird) →

STARTER KIT · CLAUDE CODE & CURSOR

Stop configuring from scratch. Get 6 production-ready stacks.

6 CLAUDE.md/.cursorrules templates (Next.js, Python, Go, Rust, Monorepo, Generic), 4 subagents, 4 slash commands, 3 hook recipes, MCP setups. Drop in and start coding.

Get the kit — $9 launch price →

Sources

Last updated May 11, 2026. Ollama defaults and model lineup change frequently—verify against the upstream docs before troubleshooting unexpected behavior.

Recommended Gear

The hardware mentioned in this guide, with current prices on Amazon (affiliate links — at no extra cost to you, purchases help support this site):

Was this article helpful?