High-performance llama.cpp fork with advanced quantization and optimized inference
Official inference framework for ultra-efficient 1-bit LLMs on CPU and GPU