UI-TARS is an open-source multimodal AI agent built on vision-language models. It can autonomously interact with GUIs, play games, and perform complex tasks in virtual environments. Its reasoning is enhanced with reinforcement learning, and it supports both web and desktop automation.