Agent: Cursor, Claude Code
LLM: DeepSeek-V3
#LLM #CUDA #attention #inference #DeepSeek
FlashMLA is DeepSeek's open-source library of optimized CUDA kernels for Multi-head Latent Attention (MLA), the attention variant powering DeepSeek-V3. It reaches up to 660 TFLOPS on H800 GPUs and supports both dense and sparse attention, including an FP8 KV cache for efficient decoding.
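As a rough illustration of how the library is driven during decoding, here is a minimal sketch based on the usage pattern shown in the FlashMLA repository: scheduling metadata is computed once per decoding step from the cached sequence lengths, then reused for each attention call. Function names follow the repository's documented API (`get_mla_metadata`, `flash_mla_with_kvcache`); the tensor shapes and sizes below are illustrative assumptions, not prescribed values.

```python
# Sketch of FlashMLA decoding usage; requires a Hopper GPU and the flash_mla package.
# Shapes below are illustrative assumptions for a single decode step.
import torch
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

batch, s_q = 4, 1            # one new query token per sequence during decoding
h_q, h_kv = 128, 1           # MLA attends over a single latent KV head
d, dv = 576, 512             # query/key head dim and value head dim (assumed)
block_size, num_blocks = 64, 256

q = torch.randn(batch, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")
kv_cache = torch.randn(num_blocks, block_size, h_kv, d,
                       dtype=torch.bfloat16, device="cuda")
block_table = torch.arange(batch * (num_blocks // batch),
                           dtype=torch.int32, device="cuda").view(batch, -1)
cache_seqlens = torch.full((batch,), 1024, dtype=torch.int32, device="cuda")

# Split-KV scheduling metadata, computed once per decoding step.
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv
)

# The same metadata is reused for every layer's attention call.
out, lse = flash_mla_with_kvcache(
    q, kv_cache, block_table, cache_seqlens, dv,
    tile_scheduler_metadata, num_splits, causal=True,
)
```

In a real model this call sits inside the per-layer loop, with each layer supplying its own query and paged KV cache while the block table and scheduling metadata are shared across layers for that step.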