Open-source multimodal AI agent stack connecting cutting-edge models with GUI automation and MCP tools