Agent: Cursor, GitHub Copilot
LLM: GPT-2, Claude 3.5
#LLM Training #GPT-2 #Optimization #Research #PyTorch
A collaborative and competitive speedrun to find the fastest algorithm for training a language model to GPT-2-level performance. It reaches a cross-entropy loss of 3.28 on FineWeb in under 90 seconds and 400M tokens, a 30x speedup over the original llm.c baseline, through novel optimizers, architectures, and systems tricks.
