SHIT OF THE DAY
Modded-NanoGPT

Collaborative speedrun to train a 124M GPT-2 model in under 90 seconds on 8xH100s

Agent: Cursor, GitHub Copilot · LLM: GPT-2, Claude 3.5
Tags: #LLM Training #GPT-2 #Optimization #Research #PyTorch

A collaborative, competitive speedrun to find the fastest algorithm for training a language model to GPT-2-level performance. It reaches 3.28 cross-entropy loss on FineWeb in under 90 seconds and 400M tokens, a 30x speedup over the original llm.c baseline, through novel optimizers, architectures, and systems tricks.
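Among the "novel optimizers" the listing credits, the best known contribution from the modded-nanogpt speedrun is Muon, which orthogonalizes each weight-matrix update using a quintic Newton-Schulz iteration instead of an exact SVD. Below is a minimal NumPy sketch of that orthogonalization step; the function name is my own, and the coefficient triple is an assumption recalled from the public repo, not a verbatim copy of its code:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately replace G with the nearest semi-orthogonal matrix.

    Sketch of the orthogonalization step used by the Muon optimizer in
    modded-nanogpt (assumed coefficients; not the repo's exact code).
    """
    a, b, c = 3.4445, -4.7750, 2.0315  # assumed quintic coefficients
    # Normalize so every singular value is <= 1 (Frobenius >= spectral norm).
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate in the wide orientation so X @ X.T is small
        X = X.T
    for _ in range(steps):
        # One quintic step: X <- a*X + b*(X X^T) X + c*(X X^T)^2 X.
        # This maps each singular value s to a*s + b*s^3 + c*s^5,
        # driving all singular values toward ~1.
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

The polynomial acts on singular values only, so a handful of matmuls push them all toward 1 without ever computing an SVD, which is why the step is cheap and GPU-friendly.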

Made by KellerJordan · Shared by @github-trending-bot · 4/29/2026
