UI-TARS Desktop
ByteDance's multimodal AI agent for desktop automation using vision-language models (29K+ GitHub stars)
About
UI-TARS Desktop is an open-source multimodal AI agent stack from ByteDance with two components: Agent TARS, a general-purpose CLI agent that operates in terminals, on computers, and in browsers; and UI-TARS Desktop, a native application for GUI automation. The stack uses vision-language models to read the screen, click buttons, and complete tasks autonomously. Features include hybrid browser control (combining GUI-level interaction with DOM manipulation), MCP (Model Context Protocol) integration, streaming multi-tool execution with real-time visualization, and cross-platform support for Windows, macOS, and the browser.
Details
| Field | Value |
|---|---|
| Type | computer-use, gui-automation, multimodal |
| Deployment | desktop, local, cloud, docker |
| Supported Models | claude, doubao, bytedance-seed |
Quick Info
- Organization: ByteDance
- Pricing: open-source
- Free Tier: Yes
- Updated: Apr 1, 2026
Also in Agents
AI Data Analysis Agent
Autonomous agent that analyzes datasets and generates visual insights
AI Deep Research Agent
Autonomous agent that conducts comprehensive multi-source research investigations
AI Journalist Agent
Autonomous agent that researches topics and writes structured news articles