For the longest time, I’ve been trying to figure out a way to “survive” in this new AI age without having to fork over a ton of money just to keep up. I’ve tried using local models via Ollama, and while they definitely work to a degree, they’re (unsurprisingly) not as good as the big model providers.

The local models tend to:

  • Forget what they’re doing
  • Struggle to break larger tasks into smaller ones
  • Lose focus easily
  • Have weaker coding performance
  • Drift over longer sessions

So to improve the reliability of fully local, smaller models (and to keep all my data local and in my own network), I created Loki.

It’s a local-first, batteries-included command-line tool and runtime for building and running LLM workflows. It’s model-agnostic and supports things like:

  • Agents and agent delegation
  • Roles/personas
  • MCP servers
  • RAG
  • Custom tools
  • Macros
  • Workflow scripting

A lot of the features it supports are specifically designed to compensate for weaknesses in smaller local models. For example:

  • Auto continuation to keep pushing models to completion instead of stopping halfway through problems
  • Parallel agent delegation so tasks can be split into smaller, focused scopes
  • Workflow-based execution (“If this, do that”) for building more reliable and repeatable automations
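
To make the first two ideas a bit more concrete, here’s a rough Python sketch of the general pattern. This is not Loki’s actual code or API: `DONE_MARKER`, `run_with_auto_continue`, `delegate_in_parallel`, and the stub model are all hypothetical names made up for illustration, and a real setup would call a local model instead of the stub.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical sentinel the model is asked to emit when it is finished.
DONE_MARKER = "[DONE]"

Model = Callable[[str], str]  # takes a prompt, returns generated text


def run_with_auto_continue(model: Model, task: str, max_rounds: int = 5) -> str:
    """Keep re-prompting until the model emits DONE_MARKER or max_rounds is hit."""
    transcript = ""
    prompt = task
    for _ in range(max_rounds):
        chunk = model(prompt)
        transcript += chunk
        if DONE_MARKER in chunk:
            break
        # Feed the partial answer back so a small model does not lose its place.
        prompt = f"{task}\n\nSo far:\n{transcript}\n\nContinue where you left off."
    return transcript.replace(DONE_MARKER, "").strip()


def delegate_in_parallel(
    make_model: Callable[[], Model], subtasks: List[str], max_workers: int = 4
) -> List[str]:
    """Give each small, focused subtask its own model instance and run them concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the subtasks.
        return list(
            pool.map(lambda t: run_with_auto_continue(make_model(), t), subtasks)
        )


def make_stub_model(pieces: List[str]) -> Model:
    """Stand-in for a local model: returns one canned piece per call."""
    remaining = list(pieces)

    def model(prompt: str) -> str:
        return remaining.pop(0) if remaining else DONE_MARKER

    return model


if __name__ == "__main__":
    stub = make_stub_model(["step 1. ", "step 2. ", "step 3. [DONE]"])
    print(run_with_auto_continue(stub, "Refactor the parser"))
```

The `max_rounds` cap matters here: without it, a model that never emits the marker would loop forever.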

It also supports the major cloud providers if you want them (which definitely helped while testing 😄), but my long-term goal is simple:

Get as close as possible to Claude Code-style reliability using fully local models.

I’m always open to feedback, questions, or ideas.

Repo: https://github.com/Dark-Alex-17/loki

  • boonhet@sopuli.xyz
    5 hours ago

    I suggest using Unsloth Studio to get a friendly GUI, not just for downloading models and running inference but also for fine-tuning and such. Underneath, it just uses llama.cpp, which is supported by a lot of apps, but IIRC it also adds other APIs. You can run Claude Code, Codex, or Mistral Vibe off either the llama.cpp API or the Unsloth API, depending on which agent you’re using, and they’ve got tutorials for setting those up. Other tools too.

    That’s not to say it’s the only one or the best one, but I really like the UI because it’s both simple and advanced (if you look for it, you can set the KV cache type, temperature, etc., but you can also run the default settings without ever looking at the advanced stuff).