Cline VS Code Extension With Local LLM, Free Agentic Coding
Cline pointed at Ollama-hosted Qwen 2.5, the setup that gave me a free coding agent on my Linux box

Cline started life as a fork of Continue and grew teeth. It is a VS Code extension that turns any chat-completion endpoint, including a local Ollama, into a planning agent that reads files, writes diffs, and runs commands. I wired Cline to my local Qwen 2.5 7B running through Ollama on the ThinkCentre and ran a normal coding week against it. No API bill, no rate limits, and surprisingly few moments where the model was the bottleneck.
What you'll build
VS Code with Cline installed, Ollama serving Qwen 2.5 7B locally, and a working agent that can plan, edit, and verify code without any cloud call. Roughly 30 minutes if Ollama is fresh, 10 minutes if you already run it.
Caption: Cline panel running a refactor through local Qwen 2.5 7B.
Prerequisites
- VS Code 1.95+ on Linux, Mac, or Windows
- 16GB RAM (Qwen 2.5 7B at 4-bit quant fits in ~6GB; the rest is for VS Code, Chrome, and your work)
- Ollama installed (5-minute install)
- A small project to test against. Cline is happiest in real codebases, not blank scratch dirs.
If you only have 8GB RAM, drop to Qwen 2.5 3B or Phi-3 mini. The agent quality drops a step but the workflow still works.
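If you go the smaller-model route, the fallback is the same workflow with a different tag. A minimal sketch, assuming the standard Ollama registry tags for these models:

```shell
# Smaller models for an 8GB machine; each fits in roughly 2-3GB at default quant.
ollama pull qwen2.5:3b
ollama pull phi3:mini
# Then set the model name in Cline's settings to match, e.g. qwen2.5:3b.
```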
Step 1, install Ollama
```shell
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
```

The install script writes to /usr/local/bin/ollama and starts a systemd unit listening on localhost:11434. Verify with systemctl status ollama on Linux.
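A quick way to confirm the daemon is reachable on any platform, independent of systemd, is to hit its HTTP API directly; `/api/tags` lists the models you have pulled so far:

```shell
# The Ollama daemon serves a small JSON API on localhost:11434.
# An empty "models" array is fine before your first pull.
curl -s http://localhost:11434/api/tags
# If this fails, the daemon is not running; start it in the foreground with:
# ollama serve
```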
Step 2, pull Qwen 2.5 7B
```shell
ollama pull qwen2.5:7b
ollama list
```

The 7B model at default quantization is a 4.4GB download. On my 50Mbps Jio fibre line it pulled in about 14 minutes. The Ollama registry is hosted in the US, so your real-world speed depends on local routing; if you see less than 1MB/s sustained, the CDN is throttling you, not the model registry.
Step 3, install Cline
In VS Code, open the Extensions panel (Ctrl+Shift+X), search "Cline", install the one with the most recent updates and the highest install count. As of writing, the right one is saoudrizwan.claude-dev (Cline kept the old internal name post-rename).

Open the Cline panel from the sidebar. The first prompt asks for a model provider.
Step 4, point Cline at local Ollama
In the Cline settings panel, choose "Ollama" as the API provider, set the endpoint to http://localhost:11434, and the model to qwen2.5:7b. Save.

The first message you send to Cline will warm up the model and may take 8-12 seconds before tokens start streaming. Subsequent messages are faster because the model stays loaded in RAM.
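You can trigger that warm-up yourself, and verify the endpoint Cline will use, with one direct request. A sketch using Ollama's `/api/generate` endpoint with streaming disabled:

```shell
# One non-streaming request to the same endpoint Cline talks to.
# This loads the model into RAM, so Cline's first message streams sooner.
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Reply with the word OK.",
  "stream": false
}'
```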
Step 5, give it a real task
The way to test if your local-LLM coding agent is real or theatre is to give it a small but non-trivial task in your real codebase. Mine was:
> Read src/lib/news-loader.ts. The function loadNews returns a promise of NewsArticle[] but does not handle the case where the source MDX has invalid frontmatter. Add a try-catch around the parse step, log a warning, and return the article with the broken frontmatter stripped. Do not change the function signature.

Cline shows you the diff before applying. Approve, reject, or edit each chunk.
First run
A typical Cline session against local Qwen looks like:
```text
You: list the components in src/components and tell me which ones are over 200 lines
Cline: [reads src/components/, returns]
  SearchBox.tsx (87 lines)
  NewsCard.tsx (143 lines)
  HeroSection.tsx (228 lines)  <-- over 200
  Footer.tsx (94 lines)
You: split HeroSection into HeroTitle, HeroVisual, HeroCTA
Cline: Plan:
  1. Create src/components/hero/HeroTitle.tsx
  2. Create src/components/hero/HeroVisual.tsx
  3. Create src/components/hero/HeroCTA.tsx
  4. Update HeroSection.tsx to import and compose
  Proceed? [Approve / Reject]
```

Approve, and Cline writes the files. It does not run the typecheck unless you ask, so add that to your initial prompt.
What broke for me
Two specifics. First, Qwen 2.5 7B at default quantization wandered when planning across three or more files. It would lose track of which file was which by the third tool call. The fix that worked was switching to qwen2.5:7b-instruct-q5_K_M (a less aggressive quantization, ~5.4GB), which gave noticeably better long-plan coherence. The trade-off is RAM and a small speed hit, both worth it.
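The switch is one pull plus a config change, using the quantization tag named above:

```shell
# Pull the q5_K_M quant (~5.4GB) alongside the default quant.
ollama pull qwen2.5:7b-instruct-q5_K_M
ollama list
# Then change the model name in Cline's settings to qwen2.5:7b-instruct-q5_K_M.
```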
Second, Cline by default wraps your context in a system prompt that name-drops Claude. Local models with no Claude training data sometimes cargo-cult that and produce confused output. I edited the Cline system prompt in settings to remove the Claude references and the local model started behaving much more like itself. The setting is buried under Cline → Advanced → System Prompt; do not skip it for local-LLM use.
What it costs
| Item | Cost |
|---|---|
| VS Code | Free |
| Cline extension | Free |
| Ollama | Free |
| Qwen 2.5 7B | Free (Apache 2.0) |
| Electricity (heavy use) | ~Rs 200/mo |
| Network | First model pull only, ~5GB |
The total bill for a free local-LLM coding setup is your electricity and your time. The trade-off is model quality. Qwen 2.5 7B is closer to Claude Haiku than Sonnet; for routine refactors it is fine, for novel architecture work you will outgrow it.
When NOT to use this
Skip Cline-on-local if your typical task involves novel cross-file design or you need the model to keep deep context across 1000+ lines. The 7B local model loses coherence past ~6k tokens of code-context. Use Cursor with Sonnet 4.6 or Claude Code for that.
Skip if you have a hard 8GB RAM ceiling. Qwen 2.5 3B works but the planning quality is a real step down. Free API tiers (Claude Haiku, Gemini Flash) are a better path.
Indian operator angle
The local-LLM stack is the most India-friendly coding setup I have used. Zero forex, zero subscription, zero data leaving the country, no UPI-versus-card friction. For a freelancer or small studio in a Tier-2 city with patchy fibre, the offline tolerance is real: once the model is pulled, no internet is needed.
Krutrim and Sarvam-1 also run on Ollama via their open weights; if you want an Indian-trained base, point Cline at those instead of Qwen. The agent loop is identical; only the model name in the Cline config changes.
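If the model you want is not a registry tag, you can still register local GGUF weights with a Modelfile; Ollama's `FROM` directive accepts a local GGUF path. A sketch with placeholder file and model names, assuming you have already downloaded the weights:

```shell
# Placeholder filename: substitute the actual GGUF file you downloaded.
cat > Modelfile <<'EOF'
FROM ./sarvam-1.gguf
EOF
ollama create sarvam-1 -f Modelfile
# Smoke-test it, then set this model name in Cline's settings.
ollama run sarvam-1 "Namaste"
```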
Related
More AI Coding

Aider CLI With Claude Sonnet, A Real Pair-Programming Setup
Aider is the lightest agentic-coding CLI I have used. Pointed at Claude Sonnet 4.6, it is the right tool for legacy refactors and tightly-scoped edits. This is the install I run.

Claude Code on Linux, Full Install With Screenshots From My ThinkCentre
I installed Claude Code on a stock Ubuntu 24.04 box, set up Pro OAuth, and shipped a real refactor with it. This is the install that worked, including the bits the official docs skip.

Codex CLI Install And First Task, OpenAI's Terminal Coding Agent
Codex CLI is OpenAI's answer to Claude Code. I ran it for a week against GPT-5 Codex on a real Next.js refactor. This is the install and the workflow that landed for me.