GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.
Model summary:

- Quality Index: 17.1 (229th of 442; top 52%)
- Coding Index: 11.1 (246th of 352; top 70%)
- Math Index: 26.3 (193rd of 268; top 72%)
- Price: $0.45 per 1M tokens, blended (377th cheapest; 45% above median; top 56%)
- Speed: 22 tok/s (top 61%)
- Time to first token (TTFT): 6.09s
- Context window: 131K tokens (145th largest; top 63%)
Pricing (per 1M tokens):

- Input: $0.30
- Output: $0.90
- Blended: $0.45

Cheaper than 44% of models; the median price is $0.31/1M tokens.

- Daily: $0.45
- Monthly: $13.50
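The blended and recurring costs above can be reproduced with simple arithmetic. Note the assumptions: the page does not state its blend ratio or usage basis, so the common 3:1 input-to-output token weighting and a usage level of 1M blended tokens per day are assumed here.

```python
# Sketch reproducing the pricing figures above.
# Assumptions (not stated on the page): a 3:1 input:output token blend,
# and usage of 1M blended tokens per day.

INPUT_PRICE = 0.30   # $ per 1M input tokens
OUTPUT_PRICE = 0.90  # $ per 1M output tokens

# Weighted average: 3 parts input to 1 part output
blended = (3 * INPUT_PRICE + 1 * OUTPUT_PRICE) / 4

daily = blended * 1   # 1M blended tokens per day
monthly = daily * 30  # 30-day month

print(f"blended: ${blended:.2f}/1M tokens")  # $0.45
print(f"daily:   ${daily:.2f}")              # $0.45
print(f"monthly: ${monthly:.2f}")            # $13.50
```

Under these assumptions the computed values match the page exactly, which suggests (but does not confirm) that the site uses the same 3:1 blend.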
Speed:

- Throughput: 22 tokens/sec (faster than 39% of models)
- TTFT: 6.09 seconds (faster than 8% of models)
Compared with the market:

- Market median throughput: 46 tok/s (this model is 52% slower)
- Median TTFT: 0.42s (this model is 1357% slower)
- Throughput per dollar: 49 tok/s per $/1M (blended)
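The comparison figures above follow from simple ratios of the values already shown. A minimal sketch (the page's 1357% suggests it divides by an unrounded median TTFT; from the rounded 0.42s shown here the result comes out to about 1350%):

```python
# Sketch reproducing the relative-speed figures above from the rounded
# values shown on the page.

throughput = 22.0         # tok/s
blended_price = 0.45      # $ per 1M tokens (blended)
median_throughput = 46.0  # tok/s, market median
ttft = 6.09               # seconds
median_ttft = 0.42        # seconds, market median

# Throughput per dollar of blended price
per_dollar = throughput / blended_price
print(f"{per_dollar:.0f} tok/s per $/1M")      # 49

# Percent slower than the market median
slower_tps = (1 - throughput / median_throughput) * 100
slower_ttft = (ttft / median_ttft - 1) * 100
print(f"{slower_tps:.0f}% slower throughput")  # 52%
print(f"{slower_ttft:.0f}% slower TTFT")       # ~1350% (page: 1357%)
```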
Context:

- Context window: 131K tokens (larger than 37% of models)
- Max output: 131K tokens (100% of the context window)