Test, evaluate, and red-team LLM apps with declarative configs and CI/CD integration
Evaluate and train AI agents in stateful environments at scale
The open-source LLM evaluation framework for unit testing AI applications
The standard framework for evaluating and benchmarking language models across hundreds of tasks
Open-source LLM engineering platform for observability, evals, and prompt management