DeepSeek's blazing-fast Multi-head Latent Attention (MLA) kernels powering frontier LLMs
High-performance FP8/FP4/BF16 CUDA kernel library powering DeepSeek's large language models