LM Studio Desktop Quick Start, Local LLMs With A GUI
LM Studio on Mac and Windows for the GUI-first crowd, my install plus the API setup that beat my Ollama habit

LM Studio is the desktop GUI for local LLMs that finally works for the non-CLI crowd. I have been recommending it to friends who want to try local AI without learning the command line, and it has held up. The model browser is integrated, downloads are one-click, and the local API speaks the same OpenAI-compatible interface as Ollama, just on its own default port (1234 versus Ollama's 11434). This is the install I run on the MacBook.
What you'll build
LM Studio installed on Mac (or Windows), a local model downloaded and chatting, and the local OpenAI-compatible API running so you can point Python scripts at it. Roughly 15 minutes on a fast connection.
Caption: LM Studio with Qwen 2.5 7B loaded and chat history visible.
Prerequisites
- Mac (Apple Silicon recommended, Intel works), Windows 10/11, or Linux x86_64 with AppImage
- 16GB RAM minimum (8GB for tiny models only)
- 10GB free disk for the app plus a few models
- A reasonable connection for the model download
LM Studio's strength is the GUI; if you live in a terminal, Ollama is the better fit.
Step 1, install LM Studio
Download from lmstudio.ai. The Mac download is a stock dmg; drag to Applications. Windows gets a standard installer; click through it. Linux is an AppImage; chmod +x and run.
# Linux: make the AppImage executable, then run it
chmod +x LMStudio-0.3.5.AppImage
./LMStudio-0.3.5.AppImage

The first launch shows a model browser with curated picks. The defaults are sensible for first-time users.
Step 2, download a model
In the Discover tab, search "Qwen 2.5 7B". The list shows multiple quantizations. Pick Qwen2.5-7B-Instruct-Q4_K_M.gguf as the balanced default and click Download.

LM Studio downloads to ~/.cache/lm-studio/models/. Progress shows in the GUI; you can keep using the app while a download is running.
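To see what is actually on disk (useful later when reclaiming space), a quick Python sketch lists the downloaded GGUFs; the script name is mine, and the path is the one above, which may differ across LM Studio versions:
# list_models.py, a sketch: list downloaded GGUF files and their sizes
from pathlib import Path

models_dir = Path.home() / ".cache" / "lm-studio" / "models"  # default path; adjust if yours differs
for f in sorted(models_dir.rglob("*.gguf")):
    print(f"{f.relative_to(models_dir)}  {f.stat().st_size / 1e9:.1f} GB")
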
Step 3, load and chat
In the Chat tab, click "Select a model to load" at the top. Pick the downloaded Qwen 2.5 7B. Type a prompt:
What's a closure in Python? Two sentences.

The first prompt has a 5-10 second cold start. Subsequent prompts in the same session stream tokens as expected.
Step 4, enable the local server
In the Developer tab (the brackets icon on the left sidebar), toggle "Start Server". By default it binds to localhost:1234 with the OpenAI-compatible API path.

You can pick a different port, change the model the server uses, and see request logs in the GUI.
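Before wiring scripts to it, a one-off sanity check that the server is listening saves debugging later. A minimal sketch using only the standard library, assuming the default port 1234:
# check_server.py, a sketch: print the model identifiers the local server exposes
import json
import urllib.request

with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    for m in json.load(resp)["data"]:
        print(m["id"])
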
Step 5, call from Python
from openai import OpenAI

# The local server ignores the API key; any placeholder string works
client = OpenAI(
    api_key="lm-studio",
    base_url="http://localhost:1234/v1",
)
resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[
        {"role": "user", "content": "Three bullet points on REST vs GraphQL."}
    ],
)
print(resp.choices[0].message.content)

The model identifier passed to the API matches the name LM Studio shows for the loaded model; the /v1/models check from Step 4 prints the exact strings the server accepts.
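The chat UI streams tokens, and scripts can too. A streaming variant of the same call, assuming the client setup above:
# Stream tokens as they arrive instead of waiting for the full response
stream = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Three bullet points on REST vs GraphQL."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
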
First run
A typical day-of-use workflow:
- LM Studio chat for ad-hoc prompts: rewrite, summarise, brainstorm
- Server toggled on for the day
- Python scripts in another terminal calling localhost:1234
- A switch to the Discover tab to try a new model variant when curious

The GUI plus the API together cover both casual and programmatic use without switching tools.
What broke for me
Two issues, both Apple-Silicon specific. First, the M1 MacBook Air with 16GB RAM hit memory pressure running Qwen 2.5 7B at Q5_K_M alongside my normal browser tabs. The fix was dropping to Q4_K_M (4.4GB instead of 5.4GB), which smoothed things out. The tell was Activity Monitor: if swap usage climbs during inference, the model is too big for the box.
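If you would rather not keep Activity Monitor open, a rough swap watch in Python does the same job. A sketch assuming psutil is installed (pip install psutil); run it in a terminal while a prompt is generating:
# swap_watch.py, a sketch: print swap usage every 2 seconds during inference
import time
import psutil

for _ in range(30):
    swap = psutil.swap_memory()
    print(f"swap used: {swap.used / 1e9:.2f} GB ({swap.percent}%)")
    time.sleep(2)
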
Second, LM Studio's local server on port 1234 conflicted with a stale process left over from a previous OrbStack experiment. The error in LM Studio was a vague "could not bind"; I had to lsof -i :1234 to find the culprit. Killing the old process freed the port. LM Studio can be set to a different port; it was the default that bit me.
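A quick way to check whether the default port is free before toggling the server on, again with only the standard library:
# port_check.py, a sketch: report whether anything is already bound to 1234
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("127.0.0.1", 1234))
    print("port 1234 is free")
except OSError:
    print("port 1234 is in use; lsof -i :1234 shows the process")
finally:
    s.close()
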
What it costs
| Item | Cost |
|---|---|
| LM Studio | Free for personal use |
| LM Studio Business | Custom (commercial use needs a licence) |
| Models | Free (licence varies per model) |
| Disk (per model) | 2-8GB |
| Electricity | Standard |
For commercial use inside a company, check the LM Studio licence; the personal-use tier is generous but business deployment needs a paid licence.
When NOT to use this
Skip LM Studio if you live in a terminal. Ollama covers the same use cases with less app overhead.
Skip if you need a server that runs without a GUI process attached. The LM Studio server requires the GUI app to be running; for a headless server, llama.cpp's llama-server or Ollama's systemd unit is the right shape.
Indian operator angle
LM Studio is the local-LLM tool I recommend to friends who do not write code. The GUI lowers the entry barrier enough that someone running a small consultancy can try Qwen on their internal docs without setting up Python or learning the CLI. For a Tier-2 city studio with mixed technical levels, that lower barrier matters.
The licence question matters for Indian SaaS shops. The personal-use tier is free; the moment you embed LM Studio in a product or use it in a paid client engagement, you need the commercial licence. For shipping inference inside your own product, llama.cpp directly is the cleaner licensing path.
Related
More Automation

Cloudflare Workers AI, Edge Inference Without Your Own GPU
Workers AI runs Llama, Mistral, and Stable Diffusion at Cloudflare's edge. I tried it for a low-latency demo. This is the setup, with the rate-limit gotcha that bit me.

Coolify Deploy LLM App On Oracle ARM, Free Forever
Coolify is the self-hosted PaaS I use across the empire. Paired with Oracle ARM's free tier, it deploys Node, Python, and Go LLM apps at zero monthly cost. This is the install.

CrewAI Multi-Agent Orchestration, A Real Workflow That Shipped
CrewAI is the most popular multi-agent orchestration framework. I built a real research crew with it. This is the install, the workflow, and the gotcha that ate my afternoon.