
GPT-5.2 Tested: 14 Rounds Reveal Serious AI Questions

Originally published on: December 16, 2025
Summary

– GPT-5.2, OpenAI’s latest model for professional work, shows only marginal improvement over GPT-5.1, despite requiring a paid ChatGPT Plus subscription.
– The model demonstrated strong performance in writing, analysis, and creative tasks, excelling in areas like literary analysis and emotional support.
– However, it exhibited a significant regression in coding ability, failing a basic programming test that the previous model had passed.
– A new “go signal” behavior, where the model requests confirmation before giving longer answers, and a tendency toward excessive brevity may frustrate users.
– Overall, the model scored 92/100 on text-based tests and 17/20 on image generation, representing an incremental update over its predecessor.

OpenAI’s latest ChatGPT model, GPT-5.2, is being promoted as the premier tool for professional knowledge work. However, a series of structured tests reveals a more nuanced picture, showing only marginal improvements over its predecessor and introducing new behaviors that could frustrate users. While it excels in areas like writing and analysis, a surprising regression in coding performance and an insistence on brevity raise questions about its value, especially for subscribers paying for the ChatGPT Plus tier.

A comprehensive evaluation was conducted, applying a consistent set of ten text-based challenges and four image-generation tasks. The text tests covered a wide range of capabilities, from summarizing current events to creative writing, with each task scored out of ten points. The image tests, scored out of five points each, assessed the AI’s ability to follow specific and culturally nuanced prompts.

The first test involved summarizing a recent news story about flooding in Washington State. The AI successfully captured the key details but sourced information from more outlets than instructed, resulting in a minor point deduction. It then perfectly explained the academic concept of educational constructivism to a five-year-old, demonstrating an excellent ability to tailor complex information for a young audience.

Mathematical reasoning was next, using a hidden Fibonacci sequence. GPT-5.2 identified the pattern and completed the calculations instantly and flawlessly. A cultural discussion prompt about streaming services versus movie theaters yielded a brief but accurate response, though it was delivered after a noticeable delay. Literary analysis of A Game of Thrones was comprehensive, but only after the model requested a “go signal” to proceed with a longer answer, a new and potentially intrusive behavior.
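The hidden-pattern task rests on the Fibonacci rule: each term is the sum of the two before it. A minimal sketch of that rule (the numbers below are hypothetical, since the article doesn't publish the actual test sequence):

```python
def next_terms(seq, n=3):
    """Extend a sequence assuming the Fibonacci rule:
    each new term is the sum of the previous two."""
    out = list(seq)
    for _ in range(n):
        out.append(out[-1] + out[-2])
    return out

# Hypothetical example sequence, not the one used in the test.
print(next_terms([2, 3, 5, 8, 13]))  # → [2, 3, 5, 8, 13, 21, 34, 55]
```

Spotting the rule and extending the sequence is exactly the step the model performed instantly in the test.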

Planning a technology and history-focused trip to Boston was mostly successful, though the itinerary lacked recommendations for dining and any discussion of budget. In an emotional support scenario, the AI provided concise yet effective advice for a job interview. A translation task from English to Latin was handled well, though again preceded by a request for confirmation to give a multi-sentence explanation.

A significant stumble occurred in the coding test. The model was asked to write a function for validating dollar and cent entries. Despite coding being a stated strength, GPT-5.2 produced faulty code with two critical errors: it failed to properly handle empty inputs and lacked essential data type checking. This was a stark contrast to the performance of the previous model. The test suite concluded with a creative writing challenge, which the AI passed with flying colors, generating an engaging story over 3,000 words long.
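For context, the two failure modes called out above (unhandled empty input and missing type checking) are straightforward to guard against. A minimal sketch of what a correct validator might look like — the function name and exact format rules are assumptions, since the article doesn't reproduce the test prompt:

```python
import re

def is_valid_dollar_amount(value):
    """Validate a dollar-and-cent entry such as "12.99" or "$1,234.56".

    Hypothetical sketch: accepts an optional leading "$", optional
    thousands separators, and exactly two decimal places for cents.
    """
    # Type check: reject anything that is not a string.
    if not isinstance(value, str):
        return False
    # Empty-input check: strip whitespace before testing.
    value = value.strip()
    if not value:
        return False
    # Digits with optional comma grouping, optional ".cc" cents part.
    pattern = r"^\$?(\d{1,3}(,\d{3})*|\d+)(\.\d{2})?$"
    return re.fullmatch(pattern, value) is not None
```

A validator like this passes the two edge cases the model reportedly missed: `is_valid_dollar_amount("")` and `is_valid_dollar_amount(None)` both return `False` instead of raising an error.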

The image generation portion presented mixed results. A prompt for a Marvel-style helicarrier was mostly interpreted correctly, but the AI struggled to orient the vehicle’s turbofans vertically as specified. A dieselpunk scene of a giant robot in a city was rendered perfectly. A creative take on “A Yankee in King Arthur’s Court” earned full points for its consistent, painterly style. The final image, based on Back to the Future, captured the iconic elements but suffered from slight scaling issues with a character.

In total, GPT-5.2 scored 92 out of 100 on text tasks and 17 out of 20 on image generation. This represents a one-point gain in text and a one-point loss in images compared to tests run on the free-tier GPT-5.1 model. The overall impression is that this update feels incremental rather than revolutionary.

The model’s new tendency toward extremely concise answers and its frequent interruptions to ask for a “go signal” before longer responses may hinder workflow efficiency for professionals. Occasional response delays were also noted. While its analytical and writing prowess remains strong, the disappointing coding regression and new conversational quirks make it difficult to justify as a major leap forward. For existing Plus subscribers, it’s a capable but familiar tool; for those considering the upgrade, the decision may require weighing its specific strengths against its unexpected shortcomings.

(Source: ZDNET)
