π©
DeepGEMM is a unified CUDA kernel library from DeepSeek AI, providing optimized GEMM primitives (FP8, FP4, BF16), fused MoE with overlapped communication, and MQA scoring β all compiled at runtime via JIT. It's a core infrastructure component enabling efficient LLM inference and training at scale.