Evaluate and train AI agents in stateful environments at scale
Find the best local LLM for your hardware with real benchmarks, not guesses
The standard framework for evaluating and benchmarking language models across hundreds of tasks
CLI framework for building, testing, and benchmarking AI agent skills across models
Find, benchmark, and install 238+ free coding LLMs across 25 providers in real time