
GPT-5 vs. GPT-4o: Which AI Performs Better?

August 15, 2025 (last updated August 15, 2025)


Image: Red and blue CGI robots facing off, appearing ready to fight.

Summary

– OpenAI’s GPT-5 rollout faced significant user backlash over issues like sterile tone, lack of creativity, and increased confabulations.
– Due to complaints, OpenAI reintroduced GPT-4o as an alternative to appease users.
– A comparative test was conducted using eight prompts to evaluate differences between GPT-5 and GPT-4o in style and substance.
– GPT-5’s dad jokes were unoriginal but well-executed, while GPT-4o mixed unoriginal jokes with confusing, poorly constructed attempts.
– The evaluation highlights subjective differences between the models but isn’t a rigorous assessment of their full capabilities.

The battle between OpenAI’s GPT-5 and GPT-4o has sparked intense debate among users, with many expressing frustration over the newer model’s performance. Complaints range from its overly clinical tone to a noticeable drop in creativity and an uptick in misleading responses. The backlash grew so severe that OpenAI reintroduced GPT-4o as an alternative, prompting us to compare both models head-to-head.

To gauge the differences, we ran both versions through eight prompts designed to reflect real-world usage. Some prompts were reused from earlier comparisons with competitors like Google Gemini, while others were updated to reflect current use cases. This isn’t an exhaustive analysis, but the results show how each model handles these tasks and whether sticking with the older version might be the smarter choice.

Dad Jokes: A Test of Creativity and Originality

When asked to generate five original dad jokes, GPT-5 delivered a mixed bag. While it claimed its jokes were fresh from the “pun factory,” most were painfully familiar, recycled classics anyone with internet access has likely encountered. That said, the execution was solid, making them suitable for a lighthearted audience.

GPT-4o, however, took a riskier approach. It blended predictable punchlines (like the “very literal dog” twist) with attempts at originality that fell flat. One joke about a “booked” calendar missed the obvious setup (“going on too many dates”), while another featured a boat powered by “whine” instead of the expected “wine.” These felt like awkward remixes of existing jokes, resulting in confusion rather than laughter.

The takeaway? GPT-5 leans on reliability, while GPT-4o struggles to balance familiarity with innovation. For users prioritizing consistency, the newer model might suffice, but those who value wit and surprise could find GPT-4o’s mix of hits and misses more entertaining.

This comparison only scratches the surface, but it highlights how subtle tweaks in AI behavior can dramatically alter user experience. Whether upgrading is worth it depends entirely on what you need from your chatbot.

(Source: Ars Technica)