# Have open-source LLMs caught up with proprietary models on coding?
Comparing open-source models like GLM 5 and Qwen against GPT-5.4 and Claude Opus on coding benchmarks. The gap is smaller, but it's still there.
No. Open-source LLMs have not caught up with proprietary models on coding benchmarks — but the gap has compressed enough that the cost math now favors open-source for a wide range of real-world coding workloads.
## Where the top of the leaderboard stands
The best proprietary models still hold the quality ceiling. GPT-5.4 (OpenAI) and Gemini 3.1 Pro Preview (Google) both score 57.2 on the quality index. GPT-5.3-Codex (OpenAI), purpose-built for code, scores 54.0. Claude Opus 4.6 Adaptive (Anthropic) sits at 53.0.
The highest-scoring open-source model in the current data is GLM 5 (Z AI) at 49.8. That's a 7.4-point deficit against the top proprietary model — roughly 13% lower. Not negligible, but not the chasm it was two years ago.
## The price gap tells a different story
Here's where open-source gets interesting. GLM 5 costs $1.11/M tokens. GPT-5.4 costs $5.63/M tokens. That's a 5× price difference for a 13% quality gap.
| Model | Quality | Price/1M tokens | Speed | Open source |
|---|---|---|---|---|
| GPT-5.4 | 57.2 | $5.63 | 85 tok/s | No |
| GPT-5.3-Codex | 54.0 | $4.81 | 71 tok/s | No |
| Claude Opus 4.6 | 53.0 | $10.00 | 51 tok/s | No |
| GLM 5 | 49.8 | $1.11 | 89 tok/s | Yes |
| MiniMax M2.7 | 49.6 | $0.52 | 43 tok/s | No |
| GPT-5.4 Mini | 48.1 | $1.69 | 237 tok/s | No |
GLM 5 also runs at 89 tok/s, faster than GPT-5.4 (85 tok/s) and significantly faster than Claude Opus 4.6 (51 tok/s). For batch code generation or CI pipelines where you're making thousands of calls, the throughput and cost advantage compounds fast.
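To see how that compounding plays out, here is a back-of-envelope sketch using the price and speed figures from the table above. The workload numbers (calls per day, tokens per call) are illustrative assumptions, not figures from the benchmark data:

```python
# Rough daily cost and serial generation time for a batch code-generation
# pipeline. Prices and throughputs are from the comparison table; the
# CALLS_PER_DAY and TOKENS_PER_CALL values are assumed for illustration.

models = {
    # name: (price per 1M tokens in USD, tokens per second)
    "GPT-5.4": (5.63, 85),
    "Claude Opus 4.6": (10.00, 51),
    "GLM 5": (1.11, 89),
}

CALLS_PER_DAY = 5_000      # assumed CI/batch volume
TOKENS_PER_CALL = 2_000    # assumed prompt + completion size

daily_tokens = CALLS_PER_DAY * TOKENS_PER_CALL  # 10M tokens/day

for name, (price_per_m, tok_per_s) in models.items():
    daily_cost = daily_tokens / 1_000_000 * price_per_m
    gen_hours = daily_tokens / tok_per_s / 3600  # single-stream time
    print(f"{name}: ${daily_cost:.2f}/day, {gen_hours:.1f} h of generation")
```

At this assumed volume the sketch puts GLM 5 at roughly $11/day versus about $56/day for GPT-5.4 and $100/day for Claude Opus 4.6, while also finishing the queue slightly faster than either. Scale the call volume up and the gap scales with it.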
## What about the community favorites?
The r/LocalLLaMA community is buzzing about Qwen and MiniMax. Alibaba has publicly committed to continuing open-source releases of Qwen models, and posts about uncensored variants and about Qwen Coder 30B running at 115 tok/s on older Nvidia V100 hardware are getting serious traction. MiniMax has announced that it will go open-weights — and at 49.6 quality for $0.52/M tokens, that's a model scoring within striking distance of GPT-5.2 (51.3) at roughly one-tenth the price.