oMLX is an LLM inference server built for Apple Silicon Macs, with continuous batching and tiered KV caching. It is designed to support AI coding tools such as Claude Code by keeping models resident in memory and managing context efficiently, and it is controlled through a macOS menu bar interface.
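As a rough illustration of how a coding tool might talk to such a server, the sketch below builds a chat-completion request payload. It assumes oMLX exposes an OpenAI-compatible `/v1/chat/completions` endpoint on localhost; the port, path, and model name here are illustrative assumptions, not documented oMLX values.

```python
import json
import urllib.request

# Assumed endpoint: not confirmed by the oMLX docs, shown for illustration only.
OMLX_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def send_chat_request(payload: dict) -> bytes:
    """POST the payload to the (assumed) oMLX endpoint and return raw bytes."""
    req = urllib.request.Request(
        OMLX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical model identifier; substitute whatever model the server loaded.
    payload = build_chat_request(
        "mlx-community/Llama-3.1-8B-Instruct-4bit", "Explain KV caching briefly."
    )
    print(json.dumps(payload, indent=2))
```

Because the server keeps the model in memory between requests, repeated calls like this avoid reload latency; the tiered KV cache lets overlapping prompts reuse previously computed context.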