High-performance C++ LLM inference engine — run DeepSeek 671B on a single GPU
A terminal-native coding agent powered by DeepSeek models, with a 1M-token context window and thinking-mode reasoning
671B-parameter MoE language model with state-of-the-art performance and efficient inference