UI-TARS-desktop

UI-TARS Desktop is a GUI agent application that uses the UI-TARS model to control a computer through natural language. It integrates browser operations, command lines, and file systems, and comes with demo showcases and an SDK.

13,136
689

README summary

UI-TARS Desktop is a GUI agent application built upon the UI-TARS vision-language model, enabling computer control through natural language. It leverages visual interpretation of the screen for precise interaction. A technical preview of Agent TARS, a multimodal AI agent integrating browser operations with command lines and file systems, was released on March 18, 2025.

Key features include natural language control, screenshot and visual recognition, precise mouse and keyboard control, cross-platform support (Windows and macOS), real-time feedback, and fully local processing for privacy. The showcases demonstrate automated tasks such as web browsing and social media posting.
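The screenshot, recognition, and mouse/keyboard features suggest a perceive-then-act loop. The following is a minimal sketch of such a loop, not the project's implementation: `Action`, `Screen`, `Model`, and `runAgent` are hypothetical names invented for illustration and are not taken from the UI-TARS SDK, and the model and screen are scripted stubs so the example runs without a display.

```typescript
// Hypothetical types for illustration only; none of these names come from the UI-TARS SDK.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

interface Screen {
  capture(): string;             // returns an encoded screenshot (stubbed below)
  perform(action: Action): void; // applies a mouse or keyboard action
}

// Stand-in for a vision-language model: picks the next action from the
// instruction, the latest screenshot, and the actions taken so far.
type Model = (instruction: string, screenshot: string, history: Action[]) => Action;

function runAgent(instruction: string, screen: Screen, model: Model, maxSteps = 10): Action[] {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = screen.capture();               // 1. observe the screen
    const action = model(instruction, screenshot, history); // 2. decide
    history.push(action);
    if (action.kind === "done") break;                 // model signals completion
    screen.perform(action);                            // 3. act, then loop
  }
  return history;
}

// Scripted stubs so the sketch runs without a real model or display.
const performed: Action[] = [];
const fakeScreen: Screen = {
  capture: () => `frame-${performed.length}`,
  perform: (a) => { performed.push(a); },
};
const script: Action[] = [
  { kind: "click", x: 120, y: 48 },
  { kind: "type", text: "hello" },
  { kind: "done" },
];
const fakeModel: Model = (_instruction, _shot, history) =>
  script[history.length] ?? { kind: "done" };

const trace = runAgent("Post a greeting", fakeScreen, fakeModel);
console.log(trace.map((a) => a.kind).join(" -> "));
```

The `maxSteps` cap is a design choice in this sketch: it bounds the loop in case the model never reports completion.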

The UI TARS SDK, a cross-platform toolkit for building GUI automation agents, was introduced on February 20, 2025. Quick Start, Deployment, and Contribution guides are available. The project is licensed under Apache License 2.0.

Repository

BY
bytedance

bytedance/UI-TARS-desktop

Created

January 19, 2025

Updated

March 29, 2025

Language

TypeScript

Category

AI