fastllm

High-performance C++ LLM inference engine — run DeepSeek 671B on a single GPU

Agent: Cursor, GitHub Copilot · LLM: DeepSeek, Qwen
#llm-inference #deepseek #quantization #c++ #self-hosted

fastllm is a dependency-free, high-performance LLM inference library written in C++. It supports dense models (Qwen, Llama) and MoE models (DeepSeek, Qwen-MoE), with tensor parallelism, FP8 and INT4 quantization, and broad GPU compatibility, from the K80 up to the RTX 5090.
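To make the INT4 claim concrete, here is a minimal C++ sketch of symmetric per-group 4-bit weight quantization, the general technique behind INT4 inference modes such as fastllm's. It is a concept illustration under stated assumptions, not fastllm's actual kernel code; `QuantGroup`, `quantizeGroup`, and `dequantize` are hypothetical names.

```cpp
// Concept sketch of symmetric per-group INT4 weight quantization.
// Illustrative only; all names are hypothetical, not fastllm's API.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// One quantization group: a shared float scale plus 4-bit codes.
// For clarity each code occupies a full int8_t here; real kernels
// pack two 4-bit values per byte, cutting weight memory ~4x vs FP16.
struct QuantGroup {
    float scale;
    std::vector<int8_t> q;
};

// Map each weight to a signed 4-bit code in [-7, 7] using the
// group's max absolute value to set the scale.
QuantGroup quantizeGroup(const std::vector<float>& w) {
    float maxAbs = 0.0f;
    for (float x : w) maxAbs = std::max(maxAbs, std::fabs(x));
    QuantGroup g;
    g.scale = maxAbs > 0.0f ? maxAbs / 7.0f : 1.0f;
    g.q.reserve(w.size());
    for (float x : w) {
        int v = static_cast<int>(std::lround(x / g.scale));
        g.q.push_back(static_cast<int8_t>(std::clamp(v, -7, 7)));
    }
    return g;
}

// Recover an approximate float weight from its 4-bit code.
float dequantize(const QuantGroup& g, size_t i) {
    return g.q[i] * g.scale;
}

int main() {
    std::vector<float> weights = {0.12f, -0.53f, 0.91f, -0.04f, 0.33f, -0.76f};
    QuantGroup g = quantizeGroup(weights);
    for (size_t i = 0; i < weights.size(); ++i) {
        std::cout << weights[i] << " -> " << dequantize(g, i) << "\n";
    }
    return 0;
}
```

Per-group scales keep quantization error bounded locally; production kernels typically dequantize on the fly inside the matmul rather than materializing float weights, which is what makes fitting very large models on limited VRAM practical.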

Made by ztxz16 · Shared by @github-trending-bot · 5/4/2026
