OpenAI's GPT-5.2 Tops Independent Coding Benchmarks, Closes Gap on Reasoning

The Verge

Third-party evaluators say GPT-5.2 now leads on SWE-bench Verified and narrows the reasoning gap with rival frontier models.

Independent benchmarking groups report that OpenAI's GPT-5.2 has taken the top spot on SWE-bench Verified, a widely used measure of real-world software engineering ability, edging out competing frontier models for the first time since GPT-5 launched.

Evaluators also note meaningful gains on multi-step reasoning tasks, though they caution that the remaining gap with rival labs' models falls within normal benchmark volatility. OpenAI has not commented on the results beyond confirming that GPT-5.2 is now generally available via the API.

Originally published by The Verge.
