Lightweight WebAssembly runtime optimized for running LLMs on edge devices.
High-performance llama.cpp fork with advanced quantization and optimized inference.
Official inference framework for ultra-efficient 1-bit LLMs on CPU and GPU.
Lightweight C++ inference engine for Google's Gemma LLMs.
Run generative AI models locally on Android—no internet required.