Continue.dev Configuration Guide for Multi-Language Projects (2026)


Continue.dev sits at an interesting crossroads in 2026. With over 33,000 GitHub stars and active releases into v1.5+, the open-source AI coding assistant is still the most configurable IDE extension for VS Code and JetBrains — and it has expanded, not retreated. The main extension now runs chat, agent mode, inline edits, and tab autocomplete. In parallel, the team has launched a separate PR-enforcement product. This guide is about the IDE extension side, which is exactly what developers building multi-language projects need to configure correctly.

The core problem this guide addresses: most developers either run Continue with a single model and its default settings, or give up once the config file starts to look intimidating. Both outcomes leave significant capability on the table. A properly configured Continue.dev setup (a fast local autocomplete model, a smarter cloud model for chat, language-specific rules, and the right context providers) is a meaningfully different product from the out-of-the-box install.

The 2026 Config Format: config.yaml

Continue.dev has standardized on config.yaml as the primary configuration format. The older config.json still works but is deprecated. The YAML file lives at:

  • macOS/Linux: ~/.continue/config.yaml
  • Windows: %USERPROFILE%\.continue\config.yaml

Project-level overrides go in .continue/config.yaml (or .continuerc.json) at the repo root. These are layered on top of the global config — useful when you want different model assignments or stricter rules in a specific codebase.

Every valid config.yaml starts with three required fields:

name: My Dev Config
version: 1.0.0
schema: v1

Everything else is optional, but you won’t get much out of the extension without filling in the models section.

Multi-Model Setup: The Core Architecture

The single most impactful configuration decision is assigning different models to different roles. Continue supports five roles:

Role           What it does
chat           The Chat panel: questions, explanations, planning
edit           Inline edits on selected code (Cmd/Ctrl+I)
apply          Applies suggested diffs to your file
autocomplete   Tab autocomplete while you type
embed          Generates embeddings for codebase indexing

The practical move for most setups: use a fast, small local model for autocomplete (latency matters — you don’t want 2-second suggestions), and a more capable cloud model for chat and edit where you can tolerate a 1–3 second response. Here is a full working example:

name: Multi-Language Dev Config
version: 1.0.0
schema: v1

models:
  # Fast local model for tab autocomplete
  - name: Qwen Coder 1.5B (Autocomplete)
    provider: ollama
    model: qwen2.5-coder:1.5b
    apiBase: http://localhost:11434
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 250
      maxPromptTokens: 1024
      onlyMyCode: true

  # Local heavyweight for offline chat
  - name: DeepSeek R1 32B (Local Chat)
    provider: ollama
    model: deepseek-r1:32b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit

  # Cloud model for complex tasks
  - name: Claude Sonnet (Cloud Chat)
    provider: anthropic
    model: claude-sonnet-4-5
    roles:
      - chat
      - edit
      - apply

You can have multiple models with the same role — Continue lets you switch between them in the chat panel via a model selector dropdown. The first model listed for a role is the default.
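To make the "first model listed wins" rule concrete, here is a minimal Python sketch that models it against the example config above. It illustrates the behavior as described in this guide and is not Continue's own code; the function name default_model_per_role is made up for the example.

```python
# Simplified model of the selection rule described above: for each role,
# the first model listed in config.yaml is the default. Illustration only.
models = [
    {"name": "Qwen Coder 1.5B (Autocomplete)", "roles": ["autocomplete"]},
    {"name": "DeepSeek R1 32B (Local Chat)", "roles": ["chat", "edit"]},
    {"name": "Claude Sonnet (Cloud Chat)", "roles": ["chat", "edit", "apply"]},
]


def default_model_per_role(models: list[dict]) -> dict[str, str]:
    """Map each role to the name of the first model that declares it."""
    defaults: dict[str, str] = {}
    for model in models:
        for role in model["roles"]:
            # setdefault keeps the earliest entry and ignores later ones.
            defaults.setdefault(role, model["name"])
    return defaults


defaults = default_model_per_role(models)
for role, name in defaults.items():
    print(f"{role}: {name}")
```

With this ordering the local DeepSeek model is the default for chat and edit; moving the Claude block above it would make the cloud model the default. The sketch also shows that no model in the example carries the embed role.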

Connecting to Ollama vs Anthropic vs OpenAI

Ollama (local): Ensure ollama serve is running before launching VS Code. The apiBase defaults to http://localhost:11434, so you only need to set it explicitly if your Ollama instance is on a different machine or port. Pull models with ollama pull qwen2.5-coder:1.5b before referencing them in config; Continue won’t pull them automatically. If you get a “model not found” error, a tag mismatch is almost always the cause. Ollama resolves a bare name such as qwen2.5-coder to the :latest tag, so use the exact tag you pulled (qwen2.5-coder:1.5b, not qwen2.5-coder).

- name: Llama 3.1 8B
  provider: ollama
  model: llama3.1:8b
  apiBase: http://localhost:11434
  roles:
    - chat
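One way to catch a tag mismatch before opening the editor is to compare the tag in your config against what Ollama reports as pulled. The sketch below assumes Ollama's /api/tags endpoint, which lists local models under a "models" key with a "name" per entry; the helper names and the sample payload are hypothetical and exist only for illustration.

```python
import json
import urllib.request


def has_exact_tag(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` appears verbatim among the pulled model names."""
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return wanted in names


def fetch_tags(api_base: str = "http://localhost:11434") -> dict:
    """Query Ollama's /api/tags endpoint, which lists locally pulled models."""
    with urllib.request.urlopen(f"{api_base}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Hypothetical payload in the shape /api/tags returns, so the check can be
# demonstrated without a running server.
sample = {"models": [{"name": "qwen2.5-coder:1.5b"}, {"name": "llama3.1:8b"}]}
print(has_exact_tag(sample, "qwen2.5-coder:1.5b"))
print(has_exact_tag(sample, "qwen2.5-coder"))
```

Against a live server you would call has_exact_tag(fetch_tags(), "qwen2.5-coder:1.5b"). A connection error from fetch_tags usually means ollama serve is not running.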

Anthropic: Set your API key as an environment variable (ANTHROPIC_API_KEY) or enter it in the Continue extension settings UI. Do not hardcode it in config.yaml if the file is version-controlled:

- name: Claude Sonnet
  provider: anthropic
  model: claude-sonnet-4-5
  roles:
    - chat
    - edit
    - apply

OpenAI: Same pattern with OPENAI_API_KEY. OpenAI-compatible providers (like a self-hosted llama.cpp server or LM Studio) use provider: openai with a custom apiBase:

- name: LM Studio Local
  provider: openai
  model: lmstudio-community/qwen2.5-coder-14b
  apiBase: http://localhost:1234/v1
  roles:
    - chat
    - edit

For local model selection by VRAM tier, the sister site’s Best Local AI Models by VRAM guide covers which models actually fit on what hardware — relevant when you’re deciding between the 1.5B, 7B, and 14B Qwen Coder variants.

Language-Specific Configuration with Rules

Continue’s rules system is where the extension becomes genuinely useful for multi-language projects. Rules are Markdown files stored in .continue/rules/ at your project root, loaded automatically when the extension starts. They are appended to the system prompt for chat, agent, and edit operations.

The key feature for polyglot codebases is glob-based activation in the rule frontmatter. A Python rule that fires when you’re editing a TypeScript file is noise. Glob patterns fix that:

.continue/rules/01-python.md:

---
globs: "**/*.py"
---

# Python Project Rules

- Python 3.12+. Type hints on all function signatures — no bare `def foo(x)`.
- Formatter: Black with line length 88. Do not suggest changes that exceed this.
- Linter: mypy in strict mode. Generated code must pass `mypy --strict`.
- Prefer `pathlib.Path` over `os.path`. Use `asyncio` for I/O-bound operations.
- Pydantic v2 for data validation. No raw dicts for structured data.

.continue/rules/02-typescript.md:

---
globs: "**/*.{ts,tsx}"
---

# TypeScript Project Rules

- Strict mode enabled. No `any` types — use `unknown` and narrow explicitly.
- No `!` non-null assertions without a comment explaining why it's safe.
- React components are function components only. No class components.
- Tailwind CSS for styling. No inline style objects unless absolutely necessary.
- All async functions return explicit Promise types in signatures.

.continue/rules/03-go.md:

---
globs: "**/*.go"
---

# Go Project Rules

- gofmt compliance required. No suggestions that break gofmt formatting.
- Error handling: always check errors, never use `_` to discard them silently.
- Use context.Context as the first parameter of any function that does I/O.
- Prefer table-driven tests in `*_test.go` files.
- Imports: stdlib first, then third-party, then internal — with blank line separation.

.continue/rules/00-general.md (always-on):

# General Rules

- Do not suggest code that introduces new external dependencies without noting it.
- Prefer modifying existing functions over creating new ones when the scope is small.
- Comment only what the code cannot say itself. No comment-for-comment's-sake.

The lexicographic 00-, 01-, 02- prefix ordering is conventional — it makes the loading order predictable and makes the general rules load first.
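The effect of glob activation can be sketched in a few lines of Python. This is a simplified model that handles only the glob features used in the rule files above (`**/`, `*`, and `{a,b}`); Continue's actual matcher may differ in edge cases, and the names RULES, glob_to_regex, and active_rules are invented for this example.

```python
import re

# Rule files and globs from the examples above; None marks the always-on rule.
RULES = {
    "00-general.md": None,
    "01-python.md": "**/*.py",
    "02-typescript.md": "**/*.{ts,tsx}",
    "03-go.md": "**/*.go",
}


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a small glob subset (**/, *, {a,b}) into a regex."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append("(?:.*/)?")  # any number of directories, including none
            i += 3
        elif glob[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        elif glob[i] == "{":
            end = glob.index("}", i)
            options = glob[i + 1:end].split(",")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def active_rules(path: str) -> list[str]:
    """Return the rule files that apply to `path`, in lexicographic order."""
    return sorted(
        name
        for name, glob in RULES.items()
        if glob is None or glob_to_regex(glob).fullmatch(path)
    )


print(active_rules("services/api/main.py"))
print(active_rules("web/src/App.tsx"))
```

A Python file picks up only the general and Python rules, and a .tsx file only the general and TypeScript rules, which is the noise reduction the glob frontmatter is meant to provide.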

Context Providers: @codebase, @docs, @diff

Context providers are the @ commands you type in the chat panel. They pull additional information into the context window of your request. Configure them in the context section of config.yaml:

context:
  - provider: code
  - provider: diff
  - provider: terminal
  - provider: folder
  - provider: codebase
  - provider: docs
  - provider: problems

@codebase: Continue indexes your project using embeddings (transformers.js locally by default, stored in ~/.continue/index). When you type @codebase what does the authentication module do?, it does a semantic search across your indexed files and pulls the most relevant snippets into context. This is how you ask questions about files you didn’t explicitly open. The indexing happens in the background on first load; large repos (100k+ lines) take a few minutes. .continueignore (same format as .gitignore) controls what gets indexed — add node_modules/, dist/, *.lock to keep the index clean and lean.

@docs: Reference external documentation. Add URLs to the docs section of config.yaml:

docs:
  - name: FastAPI Docs
    startUrl: https://fastapi.tiangolo.com/
  - name: React Docs
    startUrl: https://react.dev/reference/react

Once the docs are crawled, you can type @docs FastAPI how do I add middleware? and Continue pulls from the indexed docs rather than the model’s training data. This is most useful for internal wikis, private APIs, or frameworks that the model knows poorly.

@diff: Pulls the current git diff into context. The canonical use: after a long refactor session, type @diff explain the changes I've made and identify anything risky. It sends only the changed lines, not the whole codebase, which keeps the context window focused.

@terminal: Injects the current terminal output into context. Useful immediately after a test failure: paste nothing, just type @terminal why did this fail?

@problems: Pulls the current VS Code Problems panel into context — all TypeScript errors, linter warnings, etc. Type @problems fix all of these and Continue has the full error list to work from.

Tab Autocomplete: Speed vs Quality Tradeoff

The autocomplete model choice is where people make the wrong call most often. General-purpose large models (Llama 3.1 70B, Claude Sonnet) are bad autocomplete models — not because they’re less capable, but because autocomplete requires sub-300ms response time. A 70B model on a local machine at 10 tokens/second is useless for completion. The winning autocomplete models are small, fast, and trained specifically on fill-in-the-middle (FIM) tasks:

Model                VRAM (Q4)   Tokens/sec (RTX 4070)   Best for
qwen2.5-coder:1.5b   ~1 GB       80–100                  Speed-first, lower accuracy
qwen2.5-coder:7b     ~5 GB       20–40                   Balanced (recommended default)
starcoder2:7b        ~5 GB       20–35                   Good for multilingual repos
Codestral (cloud)    N/A         N/A                     Best accuracy, adds latency

The Continue docs explicitly note: “a huge model is not required for great autocomplete. Most state-of-the-art autocomplete models are no more than 10B parameters, and increasing beyond this does not significantly improve performance.” The 7B variant (qwen2.5-coder:7b) is the practical sweet spot for most hardware: fast enough to feel responsive, accurate enough to be useful beyond boilerplate.
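The speed argument is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a 20-token completion (an illustrative figure, not a measurement) and counts generation time only, ignoring prompt processing and any network round trip, so real latency will be somewhat higher.

```python
def generation_seconds(completion_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a completion, ignoring prompt processing and network."""
    return completion_tokens / tokens_per_second


COMPLETION_TOKENS = 20  # assumed length of a short single-line suggestion

# Speeds taken from the text and table above.
scenarios = [
    ("70B model at 10 tok/s", 10),
    ("qwen2.5-coder:1.5b at 80 tok/s", 80),
    ("qwen2.5-coder:1.5b at 100 tok/s", 100),
]

for label, speed in scenarios:
    ms = generation_seconds(COMPLETION_TOKENS, speed) * 1000
    print(f"{label}: {ms:.0f} ms")
```

Under these assumptions the 70B model needs about two seconds per suggestion, while the 1.5B model lands in the 200 to 250 ms range.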

Tuning options that matter:

autocompleteOptions:
  debounceDelay: 300       # ms to wait after keystroke before firing. Default 250.
  maxPromptTokens: 1024    # Context fed to autocomplete model. Higher = smarter, slower.
  onlyMyCode: true         # Don't suggest from node_modules or dist
  maxSuffixPercentage: 0.2 # How much post-cursor code to include in context

If suggestions feel slow, reduce maxPromptTokens to 512. If they feel dumb (the model can’t see enough context), raise it to 2048. Latency scales roughly linearly with the prompt size.

Multi-Provider Setup: Local for Completions, Cloud for Chat

The production setup most developers land on after a few weeks:

models:
  # Always-on local autocomplete — no API cost
  - name: Qwen Coder 7B
    provider: ollama
    model: qwen2.5-coder:7b
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 300
      maxPromptTokens: 1024

  # Local model for offline work / privacy-sensitive code
  - name: DeepSeek R1 14B (Local)
    provider: ollama
    model: deepseek-r1:14b
    roles:
      - chat
      - edit

  # Cloud escalation for hard problems
  - name: Claude Sonnet (Cloud)
    provider: anthropic
    model: claude-sonnet-4-5
    roles:
      - chat
      - edit
      - apply

The split rationale: autocomplete fires hundreds of times per hour. Running that through a cloud API would generate a non-trivial bill and add round-trip latency, while a small local model handles it at zero marginal API cost. Chat fires maybe 20–50 times per hour, and that is where you want the best model. Switch between the local and cloud chat models in the dropdown based on whether you need maximum quality or offline operation.
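To get a feel for the volume involved, here is a rough upper-bound calculation. The request rate of 300 per hour is an assumption standing in for "hundreds", the 8-hour day is likewise assumed, and the result is a ceiling because maxPromptTokens is a cap, not a typical prompt size.

```python
def prompt_tokens_per_hour(requests_per_hour: int, max_prompt_tokens: int) -> int:
    """Upper bound on prompt tokens sent per hour if every request fills the cap."""
    return requests_per_hour * max_prompt_tokens


REQUESTS_PER_HOUR = 300   # assumed stand-in for "hundreds of times per hour"
MAX_PROMPT_TOKENS = 1024  # maxPromptTokens from the config above
HOURS_PER_DAY = 8         # assumed working day

hourly = prompt_tokens_per_hour(REQUESTS_PER_HOUR, MAX_PROMPT_TOKENS)
daily = hourly * HOURS_PER_DAY
print(f"autocomplete: up to {hourly:,} prompt tokens/hour, {daily:,} per day")
```

Multiplying the daily figure by your provider's per-token input price gives a ceiling on what cloud autocomplete could cost per developer.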

For teams with NDA-sensitive code, see the detailed privacy-first approach in the Cline local LLM setup guide — the same Ollama-based architecture applies.

Custom Slash Commands (Prompts)

Continue’s prompt system replaces the old slashCommands array in config.json. Custom prompts live in .continue/prompts/ as Markdown files with YAML frontmatter. Setting invokable: true makes them appear when you type / in the chat panel:

.continue/prompts/write-tests.md:

---
name: Write Tests
description: Generate unit tests for the selected code
invokable: true
---

Write comprehensive unit tests for this code. Follow the testing conventions in this project:
- Match the test framework already used (pytest, jest, go test — detect from imports)
- Use table-driven tests for Go, parametrize for Python
- Cover happy path, edge cases, and one error case
- Do not add test helper libraries that aren't already in the project

.continue/prompts/explain-diff.md:

---
name: Explain Diff
description: Summarize and risk-assess the current git diff
invokable: true
---

Look at the current changes (@diff) and:
1. Summarize what was changed in 2-3 sentences
2. List any lines that could introduce bugs or regressions
3. Call out any missing error handling or edge cases

You can also add shared prompts from the Continue Hub directly in config.yaml:

prompts:
  - uses: org/prompt-name
  - name: Local Prompt
    description: Local custom workflow
    prompt: Your inline prompt text here

Honest Take: Where Continue.dev Beats Cursor, Where It Loses

Continue.dev wins on three specific dimensions:

Model flexibility and cost. Cursor is a VS Code fork with a credit-based billing system ($20/month base, plus variable credits per operation). Continue is free, open-source (Apache 2.0), and every API call goes directly from your machine to whatever provider you configure. No per-seat pricing, no subscription. For solo developers or small teams already managing cloud API keys, the total cost of ownership is dramatically lower. The Qwen 7B autocomplete model costs you electricity.

Privacy and data residency. Cursor routes your code through Cursor’s servers. Continue.dev does not — your prompts go directly to the endpoint you configure. With a local Ollama setup, zero bytes leave your machine. This matters for client code under NDA, proprietary algorithms, and pre-launch product code.

Configurability for team standards. The rules system (glob-activated per file type, version-controlled in .continue/rules/) is more powerful than Cursor’s .cursorrules file for polyglot projects. You can activate different instruction sets per language without the model seeing irrelevant rules.

Cursor wins on:

Out-of-box quality. The default Cursor experience requires no configuration. Open the app, start coding, get suggestions. Continue.dev requires meaningful setup time before you’re at the same baseline. If you want something that works in five minutes, Cursor (or GitHub Copilot — see our Cursor review) wins.

Agent mode maturity. Cursor’s Composer/Agent mode for multi-file, multi-step tasks is more polished than Continue’s agent mode as of mid-2026. For complex refactors that span 10+ files, Cursor completes them more reliably with fewer course-corrections required.

Cross-file intelligence. Cursor’s codebase understanding (its proprietary indexing) tends to surface more relevant context automatically, without the user having to know when to type @codebase.

The honest verdict for a multi-language project where privacy or cost matters: configure Continue.dev properly with the setup above and it is a serious tool, not a Cursor consolation prize. The upfront config time is 30–60 minutes. Once that is done, the daily experience is close to Cursor’s. For teams that want zero configuration and are fine with the billing model, Cursor or Copilot remain the easier choice. For Aider users who want a GUI alongside terminal-based workflows, see the Aider + Ollama guide; many developers run both.

Quick-Start Checklist

  1. Install the Continue extension from the VS Code marketplace (search “Continue”)
  2. Open Command Palette → “Continue: Open Config File”
  3. Replace the default content with your config.yaml structure
  4. Run ollama pull qwen2.5-coder:7b for autocomplete (and qwen2.5-coder:1.5b if you want the ultra-fast variant)
  5. Add your ANTHROPIC_API_KEY or OPENAI_API_KEY to your shell profile if using cloud models
  6. Create .continue/rules/ with per-language .md files
  7. Open a project file and test autocomplete — if nothing appears within 3 seconds, check that ollama serve is running
  8. Index the codebase: open the Continue panel, click the database icon, start indexing

The extension installs in 30 seconds. The configuration that makes it genuinely useful takes closer to an hour the first time, and 10 minutes when copying an existing config to a new project.



Last verified: May 13, 2026
