Self-hosted AI coding stacks: are we there yet?
Open weights, open agents, your own GPUs. We benchmarked a fully self-hosted setup against the frontier hosted tools.
It is finally possible to run a serious AI coding setup entirely on your own hardware. We built one — Code Llama-class weights, an open-source agent, local vector store — and compared it to a frontier hosted tool on the same tickets.
The stack
- Two A100s in a colocated rack.
- An open-weights coding model in the 70B class.
- An open-source agent framework wired into a local file watcher.
- A vector index over the project repo, refreshed hourly.
Where it kept up
Single-file edits, autocomplete, test generation, and small refactors all landed within striking distance of the hosted frontier tools. For routine work the self-hosted stack was perfectly usable.
Latency was actually better than a hosted call for short prompts — no internet round-trip, no shared queue.
Where it lost
Long-horizon planning, multi-file reasoning, and any task that required deep understanding of unfamiliar code all favored the hosted frontier tool. The gap was not subtle — roughly twice the success rate on our hardest tickets.
Tool-use was also rougher. The open-source agent worked, but it required more babysitting and gave up earlier than the hosted equivalents.
The economics
At our usage levels, the self-hosted setup would pay back the hardware in roughly 14 months versus equivalent hosted API spend, before factoring in electricity and engineering time. With that engineering time included, the break-even pushed past 24 months.
Self-hosting is no longer a financial slam dunk. The reason to do it is sovereignty, not savings.
Who should bother
Self-host if you have a regulatory reason to keep code off third-party servers, if you need air-gapped operation, or if you are doing research that requires modifying the model itself. Otherwise the hosted tools are still the right default in 2026.