GPT-5: Developers Weigh the Pros and Cons

Summary
– OpenAI positioned GPT-5 as a “true coding collaborator” designed for high-quality code generation and automated software tasks, seemingly targeting Anthropic’s Claude Code.
– Developers report mixed results with GPT-5, praising its technical reasoning but noting Claude’s Opus and Sonnet models often produce superior code, with GPT-5 sometimes generating redundant lines.
– Critics argue OpenAI’s benchmarks for GPT-5’s coding performance are misleading, with one research firm calling a published graphic a “chart crime.”
– GPT-5 stands out for its cost-effectiveness, with tests showing it is significantly cheaper to run than Anthropic’s Opus 4.1, though it underperforms in accuracy (27% vs. Claude’s 51%).
– OpenAI claims GPT-5 was trained on real-world coding tasks and highlights internal accuracy metrics, while Anthropic emphasizes the importance of price per outcome over price per token.
The release of GPT-5 has sparked intense debate among developers, with opinions divided on whether OpenAI’s latest model delivers on its promise as a coding collaborator. While the company positions it as a breakthrough for automated software tasks, early adopters report a blend of strengths and shortcomings compared to rival AI tools like Anthropic’s Claude.
Technical reasoning stands out as GPT-5’s strongest asset, particularly in planning and structuring code. However, some developers argue that Claude’s Opus and Sonnet models still produce cleaner, more efficient outputs. GPT-5’s verbosity setting (low, medium, or high) also influences its performance: at higher verbosity levels, the model tends to over-explain, generating excess code that requires manual cleanup.
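For context, here is a minimal sketch of how a developer might request terser output when calling GPT-5 through the OpenAI Python SDK. The exact parameter shape shown (a verbosity option passed to the Responses API) is an assumption based on OpenAI’s published SDK conventions, not something detailed in the article; consult the current API documentation before relying on it.

```python
# Minimal sketch: requesting low-verbosity output from GPT-5 via the OpenAI Python SDK.
# The verbosity option and its placement are assumptions; check the current API docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    input="Write a function that deduplicates a list while preserving order.",
    text={"verbosity": "low"},  # "low" | "medium" | "high"; higher settings tend to over-explain
)

print(response.output_text)
```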
Criticism has also surfaced around OpenAI’s benchmarking methods. Independent researchers question the validity of the company’s performance claims, with one firm dismissing a key promotional graphic as misleading. Despite these concerns, GPT-5’s affordability has emerged as a clear advantage. Tests by Princeton researcher Sayash Kapoor reveal that running benchmarks with GPT-5 costs a fraction of what competitors charge. For example, reproducing results from 45 scientific papers cost just $30 using GPT-5 (medium verbosity), compared to $400 for Claude Opus 4.1.
Yet lower costs come with trade-offs. Kapoor’s preliminary findings show GPT-5 trailing Claude in accuracy, scoring 27% versus Claude’s 51% in replicating research paper outcomes. OpenAI counters these claims by emphasizing GPT-5’s performance in real-world coding scenarios, noting that its “thinking” variant outperforms earlier models in deliberate reasoning tasks.
Anthropic, meanwhile, cautions against focusing solely on token-based pricing, arguing that efficiency per outcome matters more than raw cost savings. As developers continue testing these tools in production environments, the true value of GPT-5 and its competitors will become clearer. For now, the choice between them hinges on whether cost-effectiveness or precision takes priority in a given project.
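To make the “price per outcome” framing concrete, here is a rough back-of-envelope calculation using only the figures cited above (45 papers, $30 at 27% accuracy for GPT-5 versus $400 at 51% for Claude Opus 4.1). It assumes the reported accuracy rates apply uniformly across the benchmark, which is a simplification, so the numbers are illustrative rather than a definitive comparison.

```python
# Back-of-envelope: cost per successful replication, using the figures cited above.
# Assumes the reported accuracy rates apply uniformly across all 45 papers.
papers = 45

runs = {
    "GPT-5 (medium verbosity)": {"total_cost": 30, "accuracy": 0.27},
    "Claude Opus 4.1": {"total_cost": 400, "accuracy": 0.51},
}

for model, r in runs.items():
    successes = r["accuracy"] * papers
    print(f"{model}: ${r['total_cost'] / successes:.2f} per successful replication")

# GPT-5 (medium verbosity): ~$2.47 per successful replication
# Claude Opus 4.1: ~$17.43 per successful replication
```

On these particular figures, GPT-5 remains cheaper even per successful outcome, though a model that fails more often can impose additional costs (reruns, manual review) that a per-run metric does not capture.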
(Source: Wired)
