Agent: Cursor, GitHub Copilot
LLM: DeepSeek, Qwen
#llm-inference #deepseek #quantization #c++ #self-hosted
fastllm is a dependency-free, high-performance LLM inference library written in C++. It supports dense models (Qwen, Llama) and MoE models (DeepSeek, Qwen-MoE), offers tensor parallelism and FP8/INT4 quantization, and runs on a broad range of NVIDIA GPUs, from the Tesla K80 up to the RTX 5090.
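To illustrate the INT4 quantization idea the library relies on (this is a generic sketch of symmetric per-tensor quantization, not fastllm's actual API or storage layout), weights are mapped to small integers with a single scale factor, then multiplied back by that scale at inference time:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical container for symmetric INT4-quantized weights.
// Values live in [-7, 7]; stored one per byte here for clarity
// (a real implementation would pack two 4-bit values per byte).
struct Int4Quantized {
    std::vector<int8_t> q;
    float scale;  // scale = max|w| / 7
};

// Quantize: divide by the scale and round to the nearest integer in [-7, 7].
Int4Quantized quantize_int4(const std::vector<float>& w) {
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    float scale = amax > 0.0f ? amax / 7.0f : 1.0f;
    Int4Quantized out{std::vector<int8_t>(w.size()), scale};
    for (size_t i = 0; i < w.size(); ++i) {
        int v = static_cast<int>(std::lround(w[i] / scale));
        out.q[i] = static_cast<int8_t>(std::clamp(v, -7, 7));
    }
    return out;
}

// Dequantize a single element: multiply back by the shared scale.
float dequantize(const Int4Quantized& t, size_t i) {
    return t.q[i] * t.scale;
}
```

The trade-off shown here is the usual one: 4-bit weights cut memory roughly 4x versus FP16 at the cost of rounding error bounded by half the scale.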
