how it works

a local model, wired into your editor in a minute.

oi is the thin layer between a coding llm running on your machine and the places you actually write code — your terminal and vs code. install the cli, point it at ollama or llama.cpp, and start. no account, no keys, no code leaving your laptop.

oi — terminal
~/work/api$ oi setup

scanning for a local llm runtime…

✓ found ollama at http://localhost:11434

✓ detected model · codellama:13b (7.4 gb)

✓ detected model · qwen2.5-coder:7b (4.7 gb)

default model set to codellama:13b

✓ oi is ready — nothing leaves this machine.

~/work/api$ oi chat "add input validation to this route"

oi generating a patch with codellama:13b

+ validate req.body.email against zod schema

+ return 422 on parse failure

apply this patch? [y/n]
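
for a sense of what a patch like the one in that session could contain, here is a minimal sketch. it is not oi's output: the session mentions a zod schema, and a hand-written check stands in for it here so the example has no dependencies. `validateBody`, `ValidationResult` and the email pattern are all hypothetical names and choices.

```typescript
// hypothetical result shape: 200 when the body is acceptable, 422 otherwise
interface ValidationResult {
  status: 200 | 422;
  error?: string;
}

// deliberately loose email check (something@something.tld), not a full RFC parser
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// validate a parsed JSON request body and reject bad input before the handler runs
function validateBody(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { status: 422, error: "body must be a JSON object" };
  }
  const email = (body as { email?: unknown }).email;
  if (typeof email !== "string" || !EMAIL_RE.test(email)) {
    return { status: 422, error: "email is missing or malformed" };
  }
  return { status: 200 };
}
```

in a real route you would call something like this first and return early on a 422, so the handler only ever sees well-formed input.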

the wedge

cloud assistants are powerful — and they make you give up three things.

you hand over your source, you pay by the token, and you trust someone else to keep the model stable and online. for a lot of work that trade is fine. for the rest of it (confidential code, tight budgets, offline machines) it isn't. oi gives you the same daily assistant without the trade.

$0

per token — local inference on your own hardware

100%

on-device — prompts and code stay on localhost

1303

upvotes on the hacker news thread asking for exactly this

who it's for

built for the people the cloud leaves out.

indie hackers

ship without watching a token meter. a local model handles the day-to-day refactors and boilerplate at a fixed cost — your gpu — so a heavy week doesn't mean a heavy bill.

privacy-sensitive teams

for regulated, client-confidential or proprietary codebases, nothing can leave the building. oi keeps every prompt and diff on localhost, so legal and security stay happy.

offline & air-gapped

on a plane, on a secure network, or just off the grid — oi works with no connection at all once the model is pulled. the assistant is always there.

cost control

predictable spend instead of usage-based surprises. run the model you already have on hardware you already own, and your per-request cost is exactly zero.

how it works

four steps from install to coding.

1

install the cli

brew install oi, or curl the one-line installer. the cli is the engine — it talks to your local runtime and powers the editor extension.

2

run oi setup

oi scans for ollama or llama.cpp, lists the models you've pulled, and links one as your default. nothing is uploaded to a cloud; it just wires up what's already on your machine.
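
the discovery step can be approximated in a few lines. this is a sketch under assumptions, not oi's actual code: it covers ollama only, relies on ollama's `/api/tags` endpoint for the list of pulled models, and every function and type name (`listLocalModels`, `parseTags`, `formatSize`, `FetchLike`) is made up for illustration.

```typescript
// one entry from ollama's /api/tags response (only the fields used here)
interface OllamaModel {
  name: string; // e.g. "codellama:13b"
  size: number; // size on disk, in bytes
}

// minimal fetch-shaped type, so the sketch does not depend on DOM or node typings
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  json(): Promise<unknown>;
}>;

// format a byte count in decimal gigabytes, e.g. "7.4 gb"
function formatSize(bytes: number): string {
  return `${(bytes / 1e9).toFixed(1)} gb`;
}

// pull the model list out of a /api/tags payload, dropping malformed entries
function parseTags(payload: unknown): OllamaModel[] {
  const models = (payload as { models?: unknown } | null)?.models;
  if (!Array.isArray(models)) return [];
  return models.filter(
    (m): m is OllamaModel =>
      typeof m === "object" &&
      m !== null &&
      typeof (m as OllamaModel).name === "string" &&
      typeof (m as OllamaModel).size === "number",
  );
}

// ask a local ollama server which models are pulled;
// resolves to null when nothing usable answers at that address
async function listLocalModels(
  fetchFn: FetchLike,
  baseUrl = "http://localhost:11434",
): Promise<OllamaModel[] | null> {
  try {
    const res = await fetchFn(`${baseUrl}/api/tags`);
    if (!res.ok) return null;
    return parseTags(await res.json());
  } catch {
    return null; // connection refused or invalid JSON: treat as "no runtime found"
  }
}
```

on a runtime with a global `fetch`, `await listLocalModels(fetch)` would return the pulled models, and mapping each one through `formatSize` gives the sizes in the style of the setup output.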

3

the vs code extension auto-detects it

open vs code and the extension finds your running oi setup automatically. the chat panel, quick actions and commit generation light up — no keys, no account.

4

code locally

every prompt runs against your model on localhost. ask questions, generate, refactor, draft commits — all private, all free, all offline-ready.
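
to make "runs against your model on localhost" concrete, here is a rough sketch of a single prompt round trip. it assumes ollama as the runtime and uses its `/api/generate` endpoint with `stream: false`; the loopback guard, `askLocalModel`, `isLoopback` and `PostLike` are hypothetical and only illustrate the idea, not how oi is built.

```typescript
// request body for ollama's /api/generate, non-streaming
interface GenerateRequest {
  model: string;
  prompt: string;
  stream: false;
}

// minimal fetch-shaped type, so the sketch does not depend on DOM or node typings
type PostLike = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// true only for loopback addresses, so a typo in config cannot send code off the machine
function isLoopback(baseUrl: string): boolean {
  return /^https?:\/\/(localhost|127\.0\.0\.1|\[::1\])(:\d+)?(\/|$)/i.test(baseUrl);
}

function buildGenerateRequest(model: string, prompt: string): GenerateRequest {
  return { model, prompt, stream: false };
}

// send one prompt to the local runtime and return the generated text
async function askLocalModel(
  postFn: PostLike,
  model: string,
  prompt: string,
  baseUrl = "http://localhost:11434",
): Promise<string> {
  if (!isLoopback(baseUrl)) {
    throw new Error(`refusing non-local endpoint: ${baseUrl}`);
  }
  const res = await postFn(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(model, prompt)),
  });
  if (!res.ok) throw new Error(`runtime returned HTTP ${res.status}`);
  const data = (await res.json()) as { response?: unknown } | null;
  if (!data || typeof data.response !== "string") {
    throw new Error("unexpected response shape");
  }
  return data.response;
}
```

with a global `fetch` available, `await askLocalModel(fetch, "codellama:13b", "explain this function")` would resolve to the model's reply. the request only ever targets a loopback address, which is the property the privacy claim rests on.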

welcome

welcome to oi

a coding llm that runs on your machine. private, powerful, yours — no cloud, no per-token bill, no code leaving your laptop.

1

install the cli

one command — brew install oi, or curl the installer.

2

run oi setup

it finds your ollama or llama.cpp model and links it.

3

code locally

open vs code. the extension auto-detects oi and the chat panel is ready.

takes about a minute · works offline

oi — source control
source control · 3 staged

message

feat(auth): add slug validation + 422 on bad input

validate route params against the slug regex and reject malformed requests early instead of failing in the handler.

drafted by oi from your staged diff · running locally · qwen2.5-coder:7b

+ src/lib/validators.ts

~ src/routes/posts.ts

~ src/routes/users.ts

beyond chat

the model earns its keep in the boring places too.

oi reads your staged diff and drafts a real conventional-commit message — subject and body grounded in what actually changed. it's the kind of small, constant task a local model is perfect for: fast, free, and never sending your diff anywhere.

  • drafts from the diff, not a template
  • available in vs code and from the cli (oi commit)
  • the diff stays inside your repo
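
a minimal sketch of the commit-drafting flow described above, assuming the staged diff has already been read (for example from `git diff --staged`) and that the model's reply comes back as plain text. the prompt wording, the 72-character limit, the truncation size and all function names are assumptions for illustration, not oi's real behavior.

```typescript
// conventional commits subject: type, optional (scope), optional "!", then ": description"
const SUBJECT_RE =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9._\/-]+\))?!?: \S.*$/;

// build the prompt from the diff itself, capping its size so it fits a small context window
function buildCommitPrompt(stagedDiff: string, maxChars = 12000): string {
  const diff =
    stagedDiff.length > maxChars
      ? stagedDiff.slice(0, maxChars) + "\n[diff truncated]"
      : stagedDiff;
  return [
    "Write a Conventional Commits message for the staged diff below.",
    "First line: type(scope): subject, 72 characters or fewer.",
    "Then a blank line and a short body explaining what changed and why.",
    "",
    diff,
  ].join("\n");
}

// split the model's reply into a subject line and a body
function splitMessage(raw: string): { subject: string; body: string } {
  const [subject = "", ...rest] = raw.trim().split("\n");
  return { subject: subject.trim(), body: rest.join("\n").trim() };
}

// check the drafted subject before offering it to the user
function isConventionalSubject(subject: string): boolean {
  return subject.length <= 72 && SUBJECT_RE.test(subject);
}
```

a draft whose subject fails `isConventionalSubject` could be regenerated or handed back for manual editing. either way the diff is only ever turned into a prompt for the local model.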
oi vs the cloud

same daily assistant. very different trade.

cost
  • oi (local): $0 per token; you pay for hardware you already own
  • cloud (claude / gpt): metered per token; heavy use compounds fast

privacy
  • oi (local): code never leaves localhost, provably private
  • cloud (claude / gpt): every prompt + your source is sent to a third party

control
  • oi (local): you pin the exact model and version
  • cloud (claude / gpt): models change under you; rate limits apply

offline
  • oi (local): works on a plane or an air-gapped network
  • cloud (claude / gpt): no connection means no assistant

private, powerful, yours.

free cli and vs code extension. point oi at the model you already run and keep every line of code on your own machine.