Agent: Cursor, Claude Code
LLM: DeepSeek-V3
#LLM #CUDA #attention #inference #DeepSeek
FlashMLA is DeepSeek's open-source library of optimized CUDA kernels for Multi-head Latent Attention (MLA), the attention variant powering DeepSeek-V3. It reaches up to 660 TFLOPS on H800 GPUs and supports both dense and sparse attention, including an FP8 KV cache for efficient decoding.
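As a rough illustration of how the library is driven during decoding, here is a minimal sketch based on the usage pattern shown in the FlashMLA repository: scheduling metadata is computed once per decoding step from the cached sequence lengths, then reused for each attention call. Function names follow the repository's documented API (`get_mla_metadata`, `flash_mla_with_kvcache`); the tensor shapes and sizes below are illustrative assumptions, not prescribed values.

```python
# Sketch of FlashMLA decoding usage; requires a Hopper GPU and the flash_mla package.
# Shapes below are illustrative assumptions for a single decode step.
import torch
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

batch, s_q = 4, 1            # one new query token per sequence during decoding
h_q, h_kv = 128, 1           # MLA attends over a single latent KV head
d, dv = 576, 512             # query/key head dim and value head dim (assumed)
block_size, num_blocks = 64, 256

q = torch.randn(batch, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")
kv_cache = torch.randn(num_blocks, block_size, h_kv, d,
                       dtype=torch.bfloat16, device="cuda")
block_table = torch.arange(batch * (num_blocks // batch),
                           dtype=torch.int32, device="cuda").view(batch, -1)
cache_seqlens = torch.full((batch,), 1024, dtype=torch.int32, device="cuda")

# Split-KV scheduling metadata, computed once per decoding step.
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv
)

# The same metadata is reused for every layer's attention call.
out, lse = flash_mla_with_kvcache(
    q, kv_cache, block_table, cache_seqlens, dv,
    tile_scheduler_metadata, num_splits, causal=True,
)
```

In a real model this call sits inside the per-layer loop, with each layer supplying its own query and paged KV cache while the block table and scheduling metadata are shared across layers for that step.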