Ollama

Open-source tool for running LLMs locally on macOS, Windows, or Linux, with optional hosted models via Ollama Cloud. Used with Claude Code to swap out the underlying model: launched via ollama launch claude, it points the agent harness at local Qwen, Gemma, or MiniMax models, enabling free or near-free Claude Code usage. Claude Code still requires a $5 Anthropic API credit to activate, but the local models themselves bill nothing.
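The swap described above boils down to a few CLI commands. A session sketch (the model name is illustrative; pull/run/launch are per the notes below):

```
$ ollama pull qwen2.5-coder     # download a local model (name illustrative)
$ ollama run qwen2.5-coder      # interactive chat to sanity-check the model
$ ollama launch claude          # start Claude Code against an Ollama model
```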

Key facts

  • Local CLI: ollama pull downloads a model; ollama run opens an interactive chat with it
  • Cloud option via Ollama Cloud (MiniMax, etc.) when local hardware can’t run big models
  • ollama launch claude starts Claude Code pointed at a local or cloud Ollama model
  • Default Ollama context windows can be smaller than a model's advertised maximum; create a custom model with a larger context window before using it with Claude Code
  • Paid tier required for concurrent cloud models or higher usage limits
  • Claude Code still requires $5 of Anthropic API credit to initialise, but local models never consume it
  • Nate’s May 2026 stack video places Ollama in the experimental bucket: he does not run on local models day to day, but uses Ollama to download, test, or access open-source models and keep up with the ecosystem [src-053]
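The larger-context workaround above can be sketched with a Modelfile (a config sketch; the base model name and context size are illustrative assumptions, num_ctx is Ollama's context-length parameter):

```
# Modelfile — derive a variant of a local model with a larger context window
FROM qwen2.5-coder
PARAMETER num_ctx 32768
```

Build it with ollama create qwen-bigctx -f Modelfile (the variant name is illustrative), then point Claude Code at the new model.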

Related concepts

Source references

  • [src-004] Nate Herk — Claude Code cluster (21 videos)
    – Videos referenced: O2k_qwZA8HU, sboNwYmH3AY

  • [src-053] Nate Herk — “Overwhelmed By AI? Just Copy My Tech Stack” (2026-05-08)