DeepSeek-V3 is a powerful 671B-parameter Mixture-of-Experts LLM that activates only 37B parameters per token. It combines Multi-head Latent Attention with the DeepSeekMoE architecture, and was pre-trained on 14.8 trillion tokens followed by supervised fine-tuning (SFT) and reinforcement learning (RL) stages. It is one of the most capable open-weight models available.
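The gap between 671B total and 37B activated parameters comes from sparse expert routing: each token is sent through only a small top-k subset of experts. A minimal sketch of that idea, with hypothetical sizes and a plain softmax gate (not DeepSeek's actual routing code):

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, k, d = 8, 2, 16          # hypothetical sizes for illustration
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ gate_w                  # one gating score per expert
    top = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other experts run no compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d)
out = moe_forward(token)
print(out.shape)  # (16,)
```

Because only `k` of the `n_experts` weight matrices are touched per token, compute scales with the activated parameters rather than the total parameter count.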