High-performance C++ LLM inference engine — run DeepSeek 671B on a single GPU
Tencent's toolkit for compressing LLMs and VLMs with quantization and speculative decoding
Fast, flexible LLM inference engine in Rust with multimodal support and agentic features
Lightweight tensor library powering local LLM inference on any hardware
High-performance llama.cpp fork with advanced quantization and optimized inference