About

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1-million-token context window and outperforms GPT-4o and GPT-4.5 on coding (54.6% on SWE-bench Verified), instruction following (87.4% on IFEval), and multimodal understanding benchmarks. It is tuned for precise code diffs, reliable agent behavior, and high recall over large document contexts, which makes it well suited to agents, IDE tooling, and enterprise knowledge retrieval.

Model Family

GPT-5.4 mini (xhigh), 2026-03-17
GPT-5.4 nano (xhigh), 2026-03-17
GPT-5.4 nano (medium), 2026-03-17
GPT-5.4 mini (medium), 2026-03-17
GPT-5.4 nano (Non-reasoning), 2026-03-17
GPT-5.4 mini (Non-reasoning), 2026-03-17
GPT-5.4 (xhigh), 2026-03-05
GPT-5.4 (Non-reasoning), 2026-03-05

Benchmarks

MMLU-Pro: 80.6%
GPQA Diamond: 66.6%
HLE: 4.6%
LiveCodeBench: 45.7%
SciCode: 38.1%
TerminalBench Hard: 13.6%
MATH-500: 91.3%
AIME: 43.7%
AIME 2025: 34.7%
IFBench: 43.0%
Long Context Recall: 61.0%
Tau2: 47.1%