Official inference framework for ultra-efficient 1-bit LLMs on CPU and GPU
Run generative AI models locally on Android, no internet required
Lightweight C++ inference engine for Google's Gemma LLMs