Self-hosted AI stack β LLM inference, chat, voice, agents, workflows, RAG, and image generation. No cloud required.
Minimalist, high-performance ML framework for Rust with GPU support and LLM inference
High-performance C++ LLM inference engine β run DeepSeek 671B on a single GPU
Refreshingly fast local AI server β run LLMs privately on your own GPU or NPU for free
Python-free Rust inference server with OpenAI-compatible API for local LLMs β single binary, free forever.
Fast, flexible LLM inference engine in Rust with multimodal support and agentic features
Lightweight tensor library powering local LLM inference on any hardware
Local LLM inference server for Mac, optimized for AI coding workflows
Self-hosted coding assistant on a $500 GPU, no cloud required.
Run powerful LLMs locally on any device, no GPU or API required.
Find the best local LLM for your hardware with real benchmarks, not guesses.