UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it extends the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. The model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.
Price/1M: $0.13
244th cheapest
60% below the median
Top 36%
Context Window: 128K
225th largest
Top 75%
Input: $0.10 per 1M tokens
Output: $0.20 per 1M tokens
Combined: $0.13 per 1M tokens
Cheaper than 64% of models. The median price is $0.31 per 1M tokens.
Daily: $0.13
Monthly: $3.75
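The page does not say how the combined, daily, and monthly figures are derived. As a rough sketch, a 3:1 input-to-output token weighting and a workload of 1M tokens per day over a 30-day month reproduce the listed numbers from the $0.10 input and $0.20 output prices. The weighting, the daily volume, and the helper names `blended_price` and `monthly_cost` are all assumptions made for illustration, not something the listing confirms.

```python
# Per-1M-token prices as listed on this page.
INPUT_PRICE = 0.10
OUTPUT_PRICE = 0.20


def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens.

    The 3:1 input:output ratio is an assumption; the page does not state
    which weighting it uses for the combined price.
    """
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


def monthly_cost(daily_cost, days=30):
    """Scale a daily cost to a month (30 days is an assumption)."""
    return daily_cost * days


# Assumed workload: 1M tokens per day, so the daily cost equals the blended price.
blended = blended_price(INPUT_PRICE, OUTPUT_PRICE)
monthly = monthly_cost(blended)

print(f"blended: ${blended:.3f} per 1M tokens")  # the page shows this rounded to $0.13
print(f"monthly: ${monthly:.2f}")
```

Under these assumptions the unrounded blended price is $0.125, which would explain why the monthly figure is $3.75 rather than 30 × $0.13 = $3.90.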
Context Window: 128K tokens
Larger than 25% of models
Max Output: 2K tokens
2% of context
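The 2% figure follows from dividing the output cap by the context window. The sketch below checks that arithmetic and estimates how many tokens remain for the prompt. It assumes "K" means 1,000 tokens and that output tokens count against the same 128K window; neither is stated on the page.

```python
# Limits as listed on this page, reading "K" as 1,000 tokens (an assumption).
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 2_000

# Share of the context window that a maximum-length response can occupy.
output_share = MAX_OUTPUT / CONTEXT_WINDOW

# Prompt budget if the full output cap is reserved inside the same window
# (assumes input and output share one window).
prompt_budget = CONTEXT_WINDOW - MAX_OUTPUT

print(f"output share: {output_share:.2%}")
print(f"prompt budget: {prompt_budget} tokens")
```

The exact ratio is about 1.56%, which the page rounds to 2%.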
Context Window Comparison