A performance-focused fork of llama.cpp featuring state-of-the-art quantization types, improved CPU/GPU hybrid inference, BitNet support, and optimized operations for DeepSeek models. Designed for developers building efficient local LLM applications that need higher throughput and lower latency.