Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual question answering, and diagram-analysis accuracy. As a result, Spotlight fits well into agent workflows where screenshots, charts, or UI mock-ups need to be interpreted on the fly. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests.
Price/1M: $0.18 (287th cheapest, 42% below median, top 43%)
Context Window: 131K (145th largest, top 63%)
Input: $0.18 per 1M tokens
Output: $0.18 per 1M tokens
Blended: $0.18 per 1M tokens
Cheaper than 57% of models. Median price is $0.31/1M tokens.
Daily: $0.18
Monthly: $5.40
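The listing does not say what usage the daily and monthly figures are based on. The sketch below reproduces them under two assumptions that are not stated in the source: a workload of 1M tokens per day (split evenly between input and output here, which makes no difference at equal rates) and a 30-day month. The function names are illustrative only.

```python
# Rates in USD per 1M tokens, copied from the listing.
INPUT_PER_M = 0.18
OUTPUT_PER_M = 0.18
MEDIAN_PER_M = 0.31


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload at the listed per-1M-token rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


def percent_below_median(price: float, median: float = MEDIAN_PER_M) -> float:
    """How far a price sits below the median, as a percentage of the median."""
    return (1 - price / median) * 100


# Assumption: 1M tokens per day, 30-day month (not stated by the listing).
daily = request_cost(500_000, 500_000)
monthly = daily * 30

print(f"daily ${daily:.2f}, monthly ${monthly:.2f}")
print(f"{percent_below_median(0.18):.0f}% below median")
```

Under those assumptions the numbers line up with the listed $0.18 per day, $5.40 per month, and 42% below the $0.31 median.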
Context Window: 131K tokens (larger than 37% of models)
Max Output: 66K tokens (50% of context)
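As a rough illustration of how these two limits interact, the sketch below computes the largest completion that still fits for a given prompt size. It assumes the rounded "131K" and "66K" stand for 131,072 and 65,536 tokens, and that prompt and completion share one context window; neither is confirmed by the listing, and the helper name is hypothetical.

```python
CONTEXT_WINDOW = 131_072  # assumed exact value behind the listed "131K"
MAX_OUTPUT = 65_536       # assumed exact value behind the listed "66K"


def max_completion_tokens(prompt_tokens: int) -> int:
    """Largest completion that fits, assuming prompt and output share the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # The completion is capped by both the output limit and the space left.
    return max(0, min(MAX_OUTPUT, remaining))


print(max_completion_tokens(1_000))    # short prompt: capped by MAX_OUTPUT
print(max_completion_tokens(100_000))  # long prompt: capped by remaining window
```

With these assumed values, the output cap is the binding limit until the prompt exceeds half the window; beyond that, the remaining context is what constrains the completion.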
Context Window Comparison