AutoKaam Playbook
Gemma, the Open Family I Actually Reach For
Google's open-weight line. The 2B is my Pi default; the 9B is my desktop default; vision is the one I run daily.
Last reviewed:
The operator take
I run Gemma every day. The empire's gemma-vision MCP wraps Gemma-2:2b plus the vision projector, and that pipeline handles every screenshot OCR pass I do. Caption a screenshot, extract data from a dashboard, read a PDF page that other tools choked on, all of it goes through Gemma. After two months of daily use I trust it for grunt extraction the way I once trusted nothing local.
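If you want to reproduce that pass outside the MCP, the shape of it is a few lines against the local Ollama daemon. A minimal sketch using the ollama Python client; the model tag and prompt wording here are my illustration, not the MCP's actual internals.

```python
# Minimal sketch of a screenshot-to-text pass through a local vision model.
# Assumes the `ollama` Python client and a multimodal build pulled locally;
# the "gemma-vision" tag is a placeholder for whatever you actually run.
import ollama

def caption_screenshot(image_path: str, instruction: str = "Extract all visible text.") -> str:
    response = ollama.chat(
        model="gemma-vision",  # hypothetical local tag, not an official Ollama model name
        messages=[{
            "role": "user",
            "content": instruction,
            "images": [image_path],  # Ollama reads the file and hands it to the projector
        }],
    )
    return response["message"]["content"]

print(caption_screenshot("dashboard.png", "List every metric name and its value."))
```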
The 2B variant is my Pi 4B default. With 4K context it fits comfortably in 8GB RAM, runs at usable speed, and the quality is good enough for caption-level tasks. I would not ask it to write a tutorial, but for "what is in this image" or "extract these fields from this receipt" it is fine. The 9B variant is my desktop default for anything where I want a bit more reasoning depth. On my M75q with 32GB it loads at Q4 and runs at honest tokens per second, again not Sonnet-quality but more than usable.
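Pinning the context window is the part people skip and then wonder why the Pi swaps. A sketch of how I cap it via Ollama options; `num_ctx` and `num_predict` are standard Ollama options, the budgets are my defaults.

```python
# Keeping the 2B inside a Pi's 8GB: cap the context window explicitly rather
# than trusting defaults, since KV cache grows with context length.
import ollama

response = ollama.generate(
    model="gemma2:2b",
    prompt="Extract the vendor, date, and total from this receipt text: ...",
    options={
        "num_ctx": 4096,    # 4K context keeps the KV cache small enough for 8GB RAM
        "num_predict": 256,  # caption-level tasks rarely need more output than this
    },
)
print(response["response"])
```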
What I learned the hard way is that the 9B does not fit on the Pi. I tried it once during a power-out when I wanted to keep the Telegram bot running on the Pi alone. The model spilled to swap inside thirty seconds, the box went unresponsive, and I had to power-cycle. Lesson: do not be greedy with model size on edge hardware; the 2B variants exist for a reason.
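The guard I added afterwards is sketched below. The 1.6x headroom multiplier is my rule of thumb, not a measured constant, and the 5.4GB figure is roughly what a 9B at Q4 weighs before cache and overhead.

```python
# Guard I now run before loading anything on the Pi: refuse models that would
# spill to swap. Linux-only, which is fine for a Pi.
import sys

def ram_available_gb() -> float:
    # MemAvailable from /proc/meminfo is reported in kB
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemAvailable not found")

def safe_to_load(model_size_gb: float, headroom: float = 1.6) -> bool:
    # Weights plus KV cache plus runtime overhead; be pessimistic on edge boxes
    return ram_available_gb() > model_size_gb * headroom

if not safe_to_load(5.4):  # 9B at Q4 is roughly 5.4GB of weights alone
    sys.exit("Refusing to load: this model will swap-kill the box.")
```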
The other thing Google got right is the licensing. Gemma is genuinely open-weight with a permissive use policy that allows commercial deployment without the per-call royalty math you have to do with some other "open" models. For empire use cases that ship to AdSense-monetized properties or customer-facing tools, this matters. The license risk on Gemma is functionally zero in 2026.
Vision is where Gemma earns my real respect. The Gemma-2 vision variant is what makes the empire's screenshot-to-text pipeline work without paying Anthropic for every caption. I have measured it on my own dataset of 800 screenshots and the accuracy is in the 95% range for clear text, the 80% range for noisy interfaces. For my workload that beats the cost-quality ratio of any hosted vision API I have tested.
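For the curious, the scoring loop behind those numbers is nothing fancy. A sketch assuming hand-labelled ground truth in a JSON file and a normalized similarity ratio; the file layout and the 0.9 threshold are my conventions, not anything standard.

```python
# Sketch of the scoring loop behind numbers like "95% on clear text": run every
# screenshot through the model, compare against hand-labelled ground truth.
import json
from difflib import SequenceMatcher

def score(pred: str, truth: str) -> float:
    # Normalize whitespace and case before comparing
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(pred), norm(truth)).ratio()

with open("ground_truth.json") as f:   # {"shot_001.png": "expected text", ...}
    labels = json.load(f)

results = {}
for path, truth in labels.items():
    pred = caption_screenshot(path)     # the helper sketched earlier
    results[path] = score(pred, truth)

accuracy = sum(r >= 0.9 for r in results.values()) / len(results)
print(f"{accuracy:.0%} of screenshots scored 0.9+ similarity")
```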
Where Gemma falls behind is exactly where every open-weight model falls behind in 2026: frontier reasoning. For coding agents, deep math, and long-form writing where the prose has to read right, I still go to Sonnet. Gemma is a grunt LLM, not a craft LLM, and pretending otherwise leads to bad output. The skill is knowing which task is which.
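I ended up writing the split down so the agents stop guessing. A toy routing table, my own shorthand rather than any library; the point is that grunt-versus-craft is a config decision, not a per-request debate.

```python
# The "which task is which" call, made explicit. Categories and model names
# are my own shorthand for illustration.
GRUNT = {"caption", "ocr", "field_extraction", "classification"}
CRAFT = {"coding_agent", "long_form_writing", "deep_math"}

def route(task_type: str) -> str:
    if task_type in GRUNT:
        return "gemma2:9b"      # local, free, good enough
    if task_type in CRAFT:
        return "claude-sonnet"  # hosted, paid, worth it
    return "gemma2:9b"          # default cheap, escalate if output is bad

print(route("ocr"), route("long_form_writing"))
```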
For Indian operators starting with local AI, Gemma 2B on a Pi 5 plus Ollama is genuinely the lowest-cost real-AI setup I can recommend. Total bill: about Rs 9,000 in hardware, zero ongoing cost, runs forever. The empire-pulse Telegram bot for my mother runs exactly this stack, and she has used it daily for a month with no complaints I can detect.
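The bot itself is shorter than people expect. A sketch of the shape, assuming python-telegram-bot v20+ and the ollama client; the env var name and model tag are placeholders, not my production config.

```python
# The shape of the Pi bot: Telegram message in, local Gemma reply out.
import os
import ollama
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes

async def reply(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # Blocking call inside the handler; fine for a single-user family bot
    answer = ollama.generate(
        model="gemma2:2b",
        prompt=update.message.text,
        options={"num_ctx": 4096},
    )["response"]
    await update.message.reply_text(answer)

app = Application.builder().token(os.environ["TELEGRAM_BOT_TOKEN"]).build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, reply))
app.run_polling()
```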
Why it matters in 2026
Among open-weight families that stayed actively maintained through 2026, Gemma is the one with the cleanest license, the best edge story (2B variant), and a credible vision projector. For any operator who needs local-first work and trust on commercial deployment, this is the family.
Cost in INR
Free, open weights. Compute cost is local hardware electricity, effectively zero for personal use.
Use when
- Edge or Pi deployment with the 2B variant
- Vision and OCR tasks via Gemma-vision
- Local grunt extraction where Sonnet quality is overkill
- Commercial deployment where license clarity matters
Skip when
- Frontier reasoning or long-form writing, Sonnet is right
- Heavy coding agent work, Claude or local Qwen-Coder is better
- Multi-user serving at scale, vLLM with Llama or Mistral may pair better
Alternatives I would consider
Read next
Adjacent in the playbook
- Qwen, Where Cerebras Speed Plus Open Weights Actually Compose. Free open weights. Cerebras Cloud free tier 30 RPM; paid tier roughly Rs 50 per 1M input tokens, Rs 100 per 1M output for qwen-3-235B.
- DeepSeek Local, the Pricing Disruptor I Mostly Run Hosted. Free open weights. Compute cost on consumer hardware is unfavorable above 8B-class; hosted API is roughly Rs 12 per 1M input tokens, Rs 23 per 1M output for V3.2.
- Diffusers, the Image-Gen Path I Use Sparingly. Free, open source. CPU image-gen is real-time-uneconomic; RunPod L4 batch is roughly Rs 14 per 200 images including warm-up.
- Free, open source. Compute cost on consumer hardware is electricity, roughly Rs 4 to Rs 8 per active inference hour on a 65W desktop.