What this page answers
UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with
- ByteDance (Doubao) · bytedance/ui-tars-1.5-7b
- text+image->text · China model route
- 128,000 context · $0.10 input
Before connecting
Do not stop at the model name. Before integration, verify base URL, protocol, visible models, parameters, and limits together.
- supports frequency_penalty
- supports logit_bias
- supports max_tokens
- supports presence_penalty
- supports repetition_penalty
Next action
The goal is to catch search demand, then move users into model profiles, provider profiles, and key checking.
- Check whether the model fits the use case
- Then verify key permission and callable models