The standard framework for evaluating and benchmarking language models across hundreds of tasks
AI agent skill that searches Reddit, X, YouTube, Hacker News, and Polymarket to synthesize what's trending now