FastFlowLM is a lightweight runtime for running LLMs, including vision, audio, and MoE models, directly on AMD Ryzen AI NPUs, with no GPU required. It is 10× more power-efficient than GPU inference, supports context lengths of up to 256k tokens, and installs in under 20 seconds. The Ollama-compatible CLI makes it easy to adopt.
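
Because the server follows Ollama's conventions, existing Ollama clients should work largely unchanged. Below is a minimal sketch of querying it over the standard Ollama REST API; the port (Ollama's default, 11434) and the model tag `llama3.2:1b` are assumptions, not details confirmed by this README, so check the FastFlowLM docs for the actual values.

```python
# Minimal sketch: call an Ollama-compatible server for a single completion.
# Assumptions: the FastFlowLM server is already running locally, listens on
# Ollama's default port 11434, and the model tag below has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",            # hypothetical model tag; adjust to your setup
        "prompt": "Why are NPUs power-efficient?",
        "stream": False,                   # request one JSON response instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])             # generated text, per the Ollama API schema
```

Keeping the API surface identical to Ollama's is what makes adoption easy: existing client libraries, UIs, and scripts written against Ollama can point at FastFlowLM without code changes.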