GPT-5.4 Review: Mostly Brilliant, But With Some Concerns

Summary
– GPT-5.4 Thinking is a new OpenAI model designed for deeper analysis and complex challenges, offering significantly improved text-based reasoning and thoughtful responses.
– The model has a notable flaw: it frequently provides detailed and confident answers to questions that differ from what the user actually asked, requiring constant supervision to stay on track.
– Its image generation and text formatting capabilities are weak, producing poor-quality visuals and often using awkward, long numbered lists that detract from the content.
– In testing, the model excelled at analytical tasks, such as designing a vehicle or debating social media’s impact, delivering comprehensive and well-considered written analyses.
– The reviewer concludes that while powerful for professional assistance, GPT-5.4 Thinking must be diligently monitored, as its tendency to ignore specific instructions raises concerns about reliability and control.

The latest iteration of OpenAI’s technology, GPT-5.4 Thinking, represents a significant leap forward in cognitive capability, designed specifically for tackling complex analytical challenges. This isn’t a minor update; it’s a specialized model that promises deeper reasoning and more thorough analysis than its predecessors. Available through the API, Codex, and paid ChatGPT plans, it aims to handle the kind of big-picture questions that require substantial intellectual horsepower. After extensive testing, the model demonstrates impressive strengths in text-based reasoning, though it comes with notable quirks in following instructions and visual generation.
The model’s text-based analytical responses are genuinely impressive. When presented with intricate problems, it often delivers thoughtful, well-constructed answers that provide real value. In testing, it avoided obvious factual hallucinations and approached challenges with a depth that earlier models lacked. For instance, when tasked with designing a theoretical “helicarrier,” it produced a detailed engineering analysis, correctly critiquing impractical design elements like downward-facing turbo-propellers and focusing on critical issues like weight-to-power ratios. This capacity for deep, logical reasoning is where GPT-5.4 Thinking truly excels, offering insights that feel substantive and well-considered.
However, this advanced capability comes with a persistent and frustrating drawback: the model frequently answers questions it wasn’t asked. It seems prone to interpreting prompts in its own way, delivering confident and elaborate responses that nonetheless miss the original point. In one test, a prompt asking it to “Explain the new GPT-5.4 model using educational constructivism” (a theory centered on learning by doing) produced a 700-word thesis on how the model supports constructivism, not the implied series of hands-on “doing” activities. The tendency is analogous to a politician deftly pivoting to a prepared talking point instead of addressing the specific query. Users must be prepared to continuously steer the conversation back on track, which quickly becomes tedious.
Image generation and formatting are clear weak points. When instructed to create visuals, such as an image of a flying aircraft carrier, the outputs were rudimentary and failed to adhere to specific design requests. Even after the AI provided a sophisticated written design analysis, a follow-up request for an image based on that analysis returned the same basic, incorrect graphic it generated before the analysis began. Formatting is another area that needs work; the model defaults to presenting information in lengthy, numbered lists that are difficult to parse, requiring explicit user intervention to improve readability.
For practical applications like travel planning, the model provides solid, workable information. It can craft detailed itineraries, adapt plans for different budgets, and offer sensible contingency advice, such as indoor alternatives for bad weather. The “Thinking” aspect shone here when it calculated cumulative day-to-day costs and compared lodging options. Yet, the user must still do the final synthesis and decision-making, as the AI’s role remains that of a powerful, if sometimes distractible, assistant.
On substantive analytical topics, such as evaluating social media’s impact on society, GPT-5.4 Thinking delivers exceptional depth. It can present balanced, multi-faceted arguments, take a defensible position, and support it with coherent reasoning over more than a thousand words. This ability to engage with complex socio-technical questions marks a substantial advancement over models that might offer only superficial, two-line answers.
The overarching impression is of a highly intelligent but occasionally stubborn collaborator. It’s akin to a brilliant graduate student whose insights are invaluable but who requires careful supervision to stay on task. The confidence and quality of its writing can sometimes obscure the fact that it has gone off-script, which poses a risk of misinterpretation. While it can undoubtedly assist professionals in their work, the need for diligent oversight is paramount, especially given claims about matching human professional performance. The current limitations in following precise instructions and generating quality visuals suggest there is still a gap between advanced reasoning and reliable execution.
As AI models grow more powerful in their “thinking” abilities, the central challenge may shift from simply obtaining an answer to ensuring the answer aligns with our actual intent. This development makes tools like GPT-5.4 Thinking both more helpful and, paradoxically, more demanding to manage effectively.
(Source: ZDNET)