DeepYard

UI-TARS Desktop

ByteDance's multimodal AI agent for desktop automation via vision-language models — 29K+ stars

Open Source, Free

About

UI-TARS Desktop is an open-source multimodal AI agent stack from ByteDance with two components: Agent TARS (a general-purpose CLI agent for terminals, computers, and browsers) and UI-TARS Desktop (native GUI automation). It uses vision-language models to see the screen, click buttons, and complete tasks autonomously. Features include hybrid browser control (GUI plus DOM manipulation), MCP (Model Context Protocol) integration, streaming multi-tool support with real-time visualization, and cross-platform support (Windows, macOS, browser).
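
The see-then-act behavior described above can be pictured as a simple loop: capture the screen, ask a model for the next action, apply it, and repeat until the model reports completion. The sketch below is illustrative only and is not UI-TARS Desktop's actual API. Every name in it (`Action`, `ScreenDriver`, `Model`, `runAgent`, `makeFakeScreen`, `scriptedModel`) is hypothetical, and a scripted function stands in for the vision-language model so the example runs without any external service.

```typescript
// Hypothetical sketch of a perceive-act loop; none of these names come from UI-TARS.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

// Stand-in for the desktop: returns a "screenshot" and applies actions.
interface ScreenDriver {
  capture(): string;
  apply(action: Action): void;
}

// Stand-in for a vision-language model: picks the next action from what it sees.
type Model = (instruction: string, screenshot: string, history: Action[]) => Action;

function runAgent(
  instruction: string,
  driver: ScreenDriver,
  model: Model,
  maxSteps = 10,
): Action[] {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = driver.capture();                     // perceive
    const action = model(instruction, screenshot, history);  // decide
    history.push(action);
    if (action.kind === "done") break;                       // model reports completion
    driver.apply(action);                                    // act
  }
  return history;
}

// A fake screen modeled as a tiny state machine (assumption for the demo).
function makeFakeScreen(): ScreenDriver {
  let state = "login-form";
  return {
    capture: () => state,
    apply: (action) => {
      if (action.kind === "click") state = "field-focused";
      else if (action.kind === "type") state = "submitted";
    },
  };
}

// A scripted "model" that reacts to the fake screenshot text.
const scriptedModel: Model = (_instruction, screenshot) => {
  if (screenshot === "login-form") return { kind: "click", x: 120, y: 80 };
  if (screenshot === "field-focused") return { kind: "type", text: "hello" };
  return { kind: "done" };
};

const trace = runAgent("fill in the form", makeFakeScreen(), scriptedModel);
console.log(trace.map((a) => a.kind).join(" -> ")); // click -> type -> done
```

The `maxSteps` cap is a design choice in this sketch: it keeps the loop bounded if the model never returns a `done` action.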

Details

Type: computer-use, gui-automation, multimodal
Deployment: desktop, local, cloud, docker
Supported Models: claude, doubao, bytedance-seed

Tags

computer-use, multimodal, desktop-automation, vision, mcp, typescript, bytedance