# Have open-source LLMs caught up with proprietary models on coding?
Comparing open-source models like GLM 5 and Qwen against GPT-5.4 and Claude Opus on coding benchmarks. The gap is smaller, but it's still there.
No. Open-source LLMs have not caught up with proprietary models on coding benchmarks — but the gap has compressed enough that the cost math now favors open-source for a wide range of real-world coding workloads.
## Where the top of the leaderboard stands
The best proprietary models still hold the quality ceiling. GPT-5.4 (OpenAI) and Gemini 3.1 Pro Preview (Google) both score 57.2 on the quality index. GPT-5.3-Codex (OpenAI), purpose-built for code, scores 54.0. Claude Opus 4.6 Adaptive (Anthropic) sits at 53.0.
The highest-scoring open-source model in the current data is GLM 5 (Z AI) at 49.8. That's a 7.4-point deficit against the top proprietary model — roughly 13% lower. Not negligible, but not the chasm it was two years ago.
## The price gap tells a different story
Here's where open-source gets interesting. GLM 5 costs $1.11/M tokens. GPT-5.4 costs $5.63/M tokens. That's a 5× price difference for a 13% quality gap.
| Model | Quality | Price/1M tokens | Speed | Open source |
|---|---|---|---|---|
| GPT-5.4 | 57.2 | $5.63 | 85 tok/s | No |
| GPT-5.3-Codex | 54.0 | $4.81 | 71 tok/s | No |
| Claude Opus 4.6 | 53.0 | $10.00 | 51 tok/s | No |
| GLM 5 | 49.8 | $1.11 | 89 tok/s | Yes |
| MiniMax M2.7 | 49.6 | $0.52 | 43 tok/s | No |
| GPT-5.4 Mini | 48.1 | $1.69 | 237 tok/s | No |
GLM 5 also runs at 89 tok/s, faster than GPT-5.4 (85 tok/s) and significantly faster than Claude Opus 4.6 (51 tok/s). For batch code generation or CI pipelines where you're making thousands of calls, the throughput and cost advantage compounds fast.
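To see how that compounding plays out, here is a back-of-envelope sketch using the price and speed figures from the table above. The workload numbers (calls per day, tokens per call) are illustrative assumptions, not figures from the benchmark data:

```python
# Rough daily cost and serial generation time for a batch code-generation
# pipeline. Prices and throughputs are from the comparison table; the
# CALLS_PER_DAY and TOKENS_PER_CALL values are assumed for illustration.

models = {
    # name: (price per 1M tokens in USD, tokens per second)
    "GPT-5.4": (5.63, 85),
    "Claude Opus 4.6": (10.00, 51),
    "GLM 5": (1.11, 89),
}

CALLS_PER_DAY = 5_000      # assumed CI/batch volume
TOKENS_PER_CALL = 2_000    # assumed prompt + completion size

daily_tokens = CALLS_PER_DAY * TOKENS_PER_CALL  # 10M tokens/day

for name, (price_per_m, tok_per_s) in models.items():
    daily_cost = daily_tokens / 1_000_000 * price_per_m
    gen_hours = daily_tokens / tok_per_s / 3600  # single-stream time
    print(f"{name}: ${daily_cost:.2f}/day, {gen_hours:.1f} h of generation")
```

At this assumed volume the sketch puts GLM 5 at roughly $11/day versus about $56/day for GPT-5.4 and $100/day for Claude Opus 4.6, while also finishing the queue slightly faster than either. Scale the call volume up and the gap scales with it.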
## What about the community favorites?
The r/LocalLLaMA community is buzzing about Qwen and MiniMax. Alibaba has publicly committed to continuing open-source releases of Qwen models, and posts about uncensored variants and about Qwen Coder 30B running at 115 tok/s on older Nvidia V100 hardware are getting serious traction. MiniMax has announced that it will go open-weights — and at 49.6 quality for $0.52/M tokens, that's a model scoring within striking distance of GPT-5.2 (51.3) at roughly one-tenth the price.