High-performance C++ LLM inference engine — run DeepSeek 671B on a single GPU
Lightweight C++ inference engine for diffusion models