local llms · cli + vs code

local llms for daily coding. private, powerful, yours.

cloud coding assistants meter every token and ship your source to someone else's server. oi runs a real coding model on your own machine — through ollama or llama.cpp — from a cli and a vs code extension. no cloud, no per-token bill, no code leaving your laptop.

get oi · see how it works

runs offline
your model, your version
free cli + extension

oi — visual studio code

local

validators.ts · typescript

1  const SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

3  export function isSlug(s: string) {

4    return SLUG.test(s);

5  }

7  // highlighted by oi

oi · running locally · codellama:13b

explain this regex on line 1

it matches a lowercase url slug: one or more groups of a-z0-9 separated by single hyphens. anchored start to end, so my-post-2 passes and -x- fails.

ask about your code…
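to check the explanation above for yourself, here is a minimal sketch that runs the same pattern against a few inputs. the sample strings other than my-post-2 and -x- are hypothetical, picked to exercise each part of the regex.

```typescript
// same pattern as the validators.ts snippet on this page
const SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isSlug(s: string): boolean {
  return SLUG.test(s);
}

// hypothetical sample inputs: a valid slug, a single word, leading/trailing
// hyphens, a doubled hyphen, uppercase letters, and the empty string
const samples = ["my-post-2", "post", "-x-", "a--b", "My-Post", ""];

for (const s of samples) {
  console.log(JSON.stringify(s), isSlug(s));
}
```

only the first two samples pass: every hyphen has to sit between two runs of a-z0-9, and the anchors rule out anything extra at either end.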

why local

three reasons developers want their assistant off the cloud.

the same wish keeps surfacing — a hacker news thread asking “can i swap claude/gpt for a local model as my main coding tool?” drew over a thousand upvotes. oi is the answer to each part of it.

the per-token bill

cloud assistants meter every token. a busy week of refactors quietly turns into a real invoice.

oi runs the model on your own hardware. you pay for electricity, not tokens — $0 per request, forever.

your code leaves the building

every prompt ships your proprietary source to a third party. for regulated or sensitive work that's a non-starter.

nothing leaves your machine. prompts, code and completions all stay on localhost, and you can check that yourself by working with the network off.

no control, no offline

models change under you, rate limits bite, and a flaky connection means no assistant at all.

you pick the model and the version. it works on a plane, in an air-gapped network, anywhere — pinned and yours.

in your editor

ask your code anything — answered locally.

the vs code extension docks a chat panel beside your file. explain a regex, draft a function, refactor a block — every turn runs against the model on your own machine, with a status pill that tells you exactly which one.

selection-aware: it sees the file and the lines you highlight
switch models per project from the panel header
completions stream straight from localhost, with no round trip to a remote server
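for a sense of what streaming from localhost can look like, here is a rough sketch against ollama's local http api. it assumes ollama is listening on its default address (http://localhost:11434) and uses its /api/generate endpoint, which streams newline-delimited json. the function names streamCompletion and splitLines are made up for this example, and this is not a description of how oi itself is implemented.

```typescript
// one streamed line of /api/generate output (only the fields used here)
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

// split buffered text into complete ndjson lines, keeping any partial tail
function splitLines(buffer: string): { lines: string[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? "";
  return { lines: parts.filter((l) => l.trim() !== ""), rest };
}

// stream a completion from a local ollama server, calling onToken per fragment
async function streamCompletion(
  model: string,
  prompt: string,
  onToken: (text: string) => void,
  baseUrl = "http://localhost:11434", // assumption: ollama's default address
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: true }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`local model request failed: ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { lines, rest } = splitLines(buffer);
    buffer = rest;
    for (const line of lines) {
      const chunk = JSON.parse(line) as GenerateChunk;
      if (chunk.response) onToken(chunk.response);
    }
  }
  // a final line without a trailing newline is still a complete json object
  if (buffer.trim() !== "") {
    const chunk = JSON.parse(buffer) as GenerateChunk;
    if (chunk.response) onToken(chunk.response);
  }
}
```

calling streamCompletion("codellama:13b", "explain this regex", console.log) would log each fragment as it arrives. the buffering matters because a network chunk can end in the middle of a json line.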

oi — source control

source control · 3 staged

message

feat(auth): add slug validation + 422 on bad input

validate route params against the slug regex and reject malformed requests early instead of failing in the handler.

drafted by oi from your staged diff · running locally · qwen2.5-coder:7b

+ src/lib/validators.ts

~ src/routes/posts.ts

~ src/routes/users.ts

commit messages

a real commit message from your staged diff.

stage your changes and oi reads the diff to draft a conventional-commit message — subject and body — that you can tweak and commit. the diff never leaves your repo.

conventional-commit style, scoped to what changed
drafts from the diff, not a generic template
run it from the cli too: oi commit
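the flow described above (read the staged diff, draft a conventional-commit message) could be sketched like this. everything here is an assumption made for illustration: the prompt wording, the helper names, and the list of commit types, which follows the set commonly used with commitlint rather than anything oi documents. the diff is passed in as a string; in practice it would come from git diff --staged.

```typescript
// conventional-commit subject: type, optional (scope), optional "!", then ": text"
// assumption: the type list is the common commitlint set, not an oi-specific one
const SUBJECT =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9-]+\))?!?: \S.*$/;

function isConventionalSubject(subject: string): boolean {
  return SUBJECT.test(subject);
}

// list the paths touched by a unified git diff, read from its
// "diff --git a/<path> b/<path>" header lines (simplified: no quoted paths)
function changedFiles(diff: string): string[] {
  const files: string[] = [];
  for (const line of diff.split("\n")) {
    const path = /^diff --git a\/(.+) b\/(.+)$/.exec(line)?.[2];
    if (path) files.push(path);
  }
  return files;
}

// build the text a local model would be asked to complete (hypothetical wording)
function buildCommitPrompt(diff: string): string {
  return [
    "write a conventional-commit message (subject and body) for this staged diff.",
    `files changed: ${changedFiles(diff).join(", ")}`,
    "",
    diff,
  ].join("\n");
}
```

checking the model's first line with isConventionalSubject before showing it is one cheap way to catch a draft that drifted from the format.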

one click away

quick actions, a private badge, your chosen model.

the toolbar popup keeps the model selector and the actions you reach for most — chat, generate, explain — within a click. the “local · private” badge is always in view, so you never wonder where your code is going.

pick any pulled model from the dropdown
chat, generate, or explain in one click
works fully offline — the badge proves it

oi local · private

model

codellama:13b

works fully offline · $0 per token

1303

developers upvoted “can i swap claude/gpt for a local model?” on hacker news

$0

per token: the model runs on hardware you already own, so requests are free

100%

on-device — prompts, code and output never leave localhost

questions

the things people ask before switching.

which models can i run?

any model your local runtime can serve — codellama, qwen2.5-coder, deepseek-coder, llama 3, mistral and more. oi detects what you have and lets you switch per project.
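if your runtime is ollama, you can see what is available to switch between by asking it directly. this sketch uses ollama's /api/tags endpoint, which lists locally pulled models, and assumes the default address http://localhost:11434. the helper names are hypothetical, and this is one plausible way to detect models, not a statement of how oi does it.

```typescript
// the part of ollama's /api/tags response this sketch relies on
interface TagsResponse {
  models?: { name: string }[];
}

// pull the model names out of a tags response, sorted for a stable dropdown
function modelNames(tags: TagsResponse): string[] {
  return (tags.models ?? []).map((m) => m.name).sort();
}

// ask the local ollama server which models have been pulled
async function listLocalModels(
  baseUrl = "http://localhost:11434", // assumption: ollama's default address
): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`could not list models: ${res.status}`);
  return modelNames((await res.json()) as TagsResponse);
}
```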

does it need an internet connection?

no. once the model is pulled, oi runs fully offline — on a plane, in an air-gapped network, or just when your wifi drops. nothing phones home.

is it really free?

the cli and vs code extension are free, and there is no per-token bill because the model runs on your hardware. your only running cost is electricity, on a gpu you already own.

ollama or llama.cpp — which should i use?

either. ollama is the easiest start (one install, a model registry); llama.cpp gives you finer control over quantization and gpu offload. oi setup auto-detects whichever you have.
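auto-detection of a runtime can be as simple as probing the usual local ports. the sketch below is a guess at the idea, not how oi setup works: it assumes ollama on its default port 11434 (checked via /api/version) and a llama.cpp server on its default port 8080 (checked via /health), and prefers ollama when both respond. all names are hypothetical.

```typescript
type Runtime = "ollama" | "llama.cpp";

interface Probe {
  runtime: Runtime;
  url: string;
}

// assumption: both runtimes are on their default ports
const PROBES: Probe[] = [
  { runtime: "ollama", url: "http://localhost:11434/api/version" },
  { runtime: "llama.cpp", url: "http://localhost:8080/health" },
];

// true if the url answers with a 2xx status before the timeout
async function isUp(url: string, timeoutMs = 1000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // connection refused, timeout, etc.
  } finally {
    clearTimeout(timer);
  }
}

// choose the preferred runtime if it is up, else the other one, else null
function pickRuntime(
  up: Record<Runtime, boolean>,
  prefer: Runtime = "ollama",
): Runtime | null {
  if (up[prefer]) return prefer;
  const other: Runtime = prefer === "ollama" ? "llama.cpp" : "ollama";
  return up[other] ? other : null;
}

// probe both runtimes in parallel and pick one
async function detectRuntime(): Promise<Runtime | null> {
  const results = await Promise.all(PROBES.map((p) => isUp(p.url)));
  const up: Record<Runtime, boolean> = { ollama: false, "llama.cpp": false };
  PROBES.forEach((p, i) => {
    up[p.runtime] = results[i] === true;
  });
  return pickRuntime(up);
}
```

keeping pickRuntime separate from the network probes makes the preference rule easy to test without a server running.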

does my code ever leave my machine?

never. prompts, your files and the model's output all stay on localhost. there is no oi server in the loop — that's the whole point.

which editors are supported?

vs code via the extension, plus any terminal through the oi cli. the cli also powers commit-message generation and one-off prompts from your shell.

keep your code on your machine.

install the cli, run oi setup, and your local model is wired into vs code in about a minute. free, private, and offline-ready.

get oi · see pricing