GPT-5.2 Tested: Mixed Results and Tough Questions

Summary
– GPT-5.2 is OpenAI’s latest model, but it only slightly outperforms the free GPT-5.1 in text tasks and performs worse in image generation.
– The model shows strong capabilities in writing, analysis, and explanation, earning high scores in most text-based tests.
– It exhibits a significant regression in coding ability, failing a basic programming test that its predecessor passed.
– A new behavior requiring a user “go signal” for longer responses and a tendency toward extreme brevity may frustrate professional users.
– Overall, the review finds GPT-5.2 to be an incremental update that does not represent substantial progress over the previous version.
OpenAI’s latest ChatGPT model, GPT-5.2, is now available to Plus subscribers, billed as the premier tool for professional knowledge work. Initial testing reveals a capable but inconsistent upgrade, with strong performance in writing and analysis contrasted by a surprising regression in coding tasks. The model introduces new behaviors, like frequently asking for user confirmation before delivering longer answers, which may disrupt workflow efficiency for power users.
A recent evaluation put the model through a standardized suite of ten text-based challenges and four image-generation tests. The text assessments covered summarization, academic explanation, mathematical reasoning, and creative writing. GPT-5.2 scored 92 out of 100 on these core text tasks, excelling at explaining complex concepts simply, recognizing patterns, and crafting detailed literary analyses and creative stories.
However, notable quirks emerged. The model displayed a pronounced tendency toward extremely brief answers, sometimes providing one-sentence responses where more detail was expected. More disruptively, it often paused to request a “go signal” before generating multi-paragraph explanations, even for prompts that were not exceptionally long. This new layer of confirmation could become tedious in professional settings.
The most significant finding was in the coding evaluation. Despite being marketed for technical work, GPT-5.2 failed a basic regular expression validation test that its predecessor, GPT-5.1, had passed. The generated code contained critical flaws, including inadequate error handling for empty inputs and a complete lack of type checking, which could cause the function to crash. This represents a clear step backward for developers relying on the assistant for programming help.
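The review does not reproduce the failing code, but the flaws it describes, missing empty-input handling and missing type checks, map onto a familiar pattern. The following is a minimal, hypothetical Python sketch of a more defensive regex validation helper; the email-validation task, function name, and pattern are illustrative assumptions rather than details from the original test.

import re

# Hypothetical helper illustrating the two flaws the review calls out:
# it validates the input's type and handles empty strings explicitly
# instead of crashing inside the regex engine.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_valid_email(value):
    """Return True if value looks like an email address, False otherwise."""
    # Type check: non-strings (None, numbers, lists) are rejected rather
    # than raising a TypeError when handed to the regex engine.
    if not isinstance(value, str):
        return False
    # Empty-input handling: an empty or whitespace-only string is invalid.
    if not value.strip():
        return False
    return EMAIL_PATTERN.fullmatch(value) is not None

if __name__ == "__main__":
    for sample in ["user@example.com", "", None, 42, "not-an-email"]:
        print(repr(sample), "->", is_valid_email(sample))

Without the two guard clauses, a call such as is_valid_email(None) would raise a TypeError, which is the kind of crash the review attributes to GPT-5.2's generated code.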
In image generation, the model scored 17 out of 20 points. It successfully created compelling dieselpunk scenes and imaginative cultural mashups but struggled with precise technical details, such as correctly orienting turbine fans on a fictional aircraft. The overall image quality was artistic but occasionally suffered from minor issues with scale and consistency.
When comparing total scores, GPT-5.2’s performance is nearly identical to the free-tier GPT-5.1, offering only a marginal one-point improvement in text while scoring one point lower in image generation. This raises questions about the value proposition for the required twenty-dollar monthly subscription. The update feels incremental, with its new brevity and confirmation prompts potentially hindering more than helping. For users whose needs center on eloquent writing and general analysis, it performs well, but coders and those seeking verbose, detailed explanations may find the experience frustrating. The model’s occasional response delays further suggest it may still be stabilizing after release.
(Source: ZDNET)
