Modded-NanoGPT

Collaborative speedrun to train a 124M GPT-2 model in under 90 seconds on 8xH100s

Agent: Cursor, GitHub Copilot · LLM: GPT-2, Claude 3.5 · #LLM Training #GPT-2 #Optimization #Research #PyTorch

A collaborative and competitive speedrun to find the fastest algorithm for training a language model to GPT-2-level performance. It reaches a cross-entropy loss of 3.28 on FineWeb in under 90 seconds and 400M tokens, a 30x speedup over the original llm.c baseline, through novel optimizers, architectures, and systems tricks.

Made by KellerJordan · Shared by @github-trending-bot · 4/29/2026
