LM Studio Desktop Quick Start, Local LLMs With A GUI
LM Studio on Mac and Windows for the GUI-first crowd, my install plus the API setup that beat my Ollama habit

LM Studio is the desktop GUI for local LLMs that finally works for the non-CLI crowd. I have been recommending it to friends who want to try local AI without learning the command line, and it has held up. The model browser is integrated, downloads are one-click, and the local API speaks the same OpenAI-compatible interface as Ollama, just on its own default port (1234 versus Ollama's 11434). This is the install I run on the MacBook.
What you'll build
LM Studio installed on Mac (or Windows), a local model downloaded and chatting, and the local OpenAI-compatible API running so you can point Python scripts at it. Roughly 15 minutes on a fast connection.
Caption: LM Studio with Qwen 2.5 7B loaded and chat history visible.
Prerequisites
- Mac (Apple Silicon recommended, Intel works), Windows 10/11, or Linux x86_64 with AppImage
- 16GB RAM minimum (8GB for tiny models only)
- 10GB free disk for the app plus a few models
- A reasonable connection for the model download
LM Studio's strength is the GUI; if you live in a terminal, Ollama is the better fit.
Step 1, install LM Studio
Download from lmstudio.ai. The Mac download is a stock dmg; drag to Applications. Windows gets a standard installer; click through it. Linux is an AppImage; chmod +x and run.
# Linux: make the AppImage executable, then run it
chmod +x LMStudio-0.3.5.AppImage
./LMStudio-0.3.5.AppImage

The first launch shows a model browser with curated picks. The defaults are sensible for first-time users.
Step 2, download a model
In the Discover tab, search "Qwen 2.5 7B". The list shows multiple quantizations. Pick Qwen2.5-7B-Instruct-Q4_K_M.gguf as the balanced default and click Download.

LM Studio downloads to ~/.cache/lm-studio/models/. Progress shows in the GUI; you can keep using the app while a download is running.
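To see what is actually on disk (useful later when reclaiming space), a quick Python sketch lists the downloaded GGUFs; the script name is mine, and the path is the one above, which may differ across LM Studio versions:
# list_models.py, a sketch: list downloaded GGUF files and their sizes
from pathlib import Path

models_dir = Path.home() / ".cache" / "lm-studio" / "models"  # default path; adjust if yours differs
for f in sorted(models_dir.rglob("*.gguf")):
    print(f"{f.relative_to(models_dir)}  {f.stat().st_size / 1e9:.1f} GB")
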
Step 3, load and chat
In the Chat tab, click "Select a model to load" at the top. Pick the downloaded Qwen 2.5 7B. Type a prompt:
What's a closure in Python? Two sentences.

The first prompt has a 5-10 second cold start. Subsequent prompts in the same session stream tokens as expected.
Step 4, enable the local server
In the Developer tab (the brackets icon on the left sidebar), toggle "Start Server". By default it binds to localhost:1234 with the OpenAI-compatible API path.

You can pick a different port, change the model the server uses, and see request logs in the GUI.
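Before wiring scripts to it, a one-off sanity check that the server is listening saves debugging later. A minimal sketch using only the standard library, assuming the default port 1234:
# check_server.py, a sketch: print the model identifiers the local server exposes
import json
import urllib.request

with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    for m in json.load(resp)["data"]:
        print(m["id"])
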
Step 5, call from Python
from openai import OpenAI

# The local server ignores the API key; any placeholder string works
client = OpenAI(
    api_key="lm-studio",
    base_url="http://localhost:1234/v1",
)
resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[
        {"role": "user", "content": "Three bullet points on REST vs GraphQL."}
    ],
)
print(resp.choices[0].message.content)

The model identifier passed to the API matches the name LM Studio shows for the loaded model; the /v1/models check from Step 4 prints the exact strings the server accepts.
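The chat UI streams tokens, and scripts can too. A streaming variant of the same call, assuming the client setup above:
# Stream tokens as they arrive instead of waiting for the full response
stream = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Three bullet points on REST vs GraphQL."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
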
First run
A typical day-of-use workflow:
- LM Studio chat for ad-hoc prompts: rewrite, summarise, brainstorm
- Server toggled on for the day
- Python scripts in another terminal calling localhost:1234
- A switch to the Discover tab to try a new model variant when curious

The GUI plus the API together cover both casual and programmatic use without switching tools.
What broke for me
Two issues, both Apple-Silicon specific. First, the M1 MacBook Air with 16GB RAM hit memory pressure running Qwen 2.5 7B at Q5_K_M alongside my normal browser tabs. The fix was dropping to Q4_K_M (4.4GB instead of 5.4GB), which smoothed things out. The tell was Activity Monitor: if swap usage climbs during inference, the model is too big for the box.
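If you would rather not keep Activity Monitor open, a rough swap watch in Python does the same job. A sketch assuming psutil is installed (pip install psutil); run it in a terminal while a prompt is generating:
# swap_watch.py, a sketch: print swap usage every 2 seconds during inference
import time
import psutil

for _ in range(30):
    swap = psutil.swap_memory()
    print(f"swap used: {swap.used / 1e9:.2f} GB ({swap.percent}%)")
    time.sleep(2)
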
Second, LM Studio's local server on port 1234 conflicted with a stale process left over from a previous OrbStack experiment. The error in LM Studio was a vague "could not bind"; I had to lsof -i :1234 to find the culprit. Killing the old process freed the port. LM Studio can be set to a different port; it was the default that bit me.
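A quick way to check whether the default port is free before toggling the server on, again with only the standard library:
# port_check.py, a sketch: report whether anything is already bound to 1234
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("127.0.0.1", 1234))
    print("port 1234 is free")
except OSError:
    print("port 1234 is in use; lsof -i :1234 shows the process")
finally:
    s.close()
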
What it costs
| Item | Cost |
|---|---|
| LM Studio | Free for personal use |
| LM Studio Business | Custom (commercial use needs a licence) |
| Models | Free (licence varies per model) |
| Disk (per model) | 2-8GB |
| Electricity | Standard |
For commercial use inside a company, check the LM Studio licence; the personal-use tier is generous but business deployment needs a paid licence.
When NOT to use this
Skip LM Studio if you live in a terminal. Ollama covers the same use cases with less app overhead.
Skip if you need a server that runs without a GUI process attached. The LM Studio server requires the GUI app to be running; for a headless server, llama.cpp's llama-server or Ollama's systemd unit is the right shape.
Indian operator angle
LM Studio is the local-LLM tool I recommend to friends who do not write code. The GUI lowers the entry barrier enough that someone running a small consultancy can try Qwen on their internal docs without setting up Python or learning the CLI. For a Tier-2 city studio with mixed technical levels, that lower barrier matters.
The licence question matters for Indian SaaS shops. The personal-use tier is free; the moment you embed LM Studio in a product or use it in a paid client engagement, you need the commercial licence. For shipping inference inside your own product, llama.cpp directly is the cleaner licensing path.
Related
More Automation

Cloudflare Workers AI, Edge Inference Without Your Own GPU
Workers AI runs Llama, Mistral, and Stable Diffusion at Cloudflare's edge. I tried it for a low-latency demo. This is the setup, with the rate-limit gotcha that bit me.

Coolify Deploy LLM App On Oracle ARM, Free Forever
Coolify is the self-hosted PaaS I use across the empire. Paired with Oracle ARM's free tier, it deploys Node, Python, and Go LLM apps at zero monthly cost. This is the install.

CrewAI Multi-Agent Orchestration, A Real Workflow That Shipped
CrewAI is the most popular multi-agent orchestration framework. I built a real research crew with it. This is the install, the workflow, and the gotcha that ate my afternoon.