GPT-5 vs. GPT-4o: Which AI Performs Better?

Summary

– OpenAI’s GPT-5 rollout faced significant user backlash over issues like sterile tone, lack of creativity, and increased confabulations.
– Due to complaints, OpenAI reintroduced GPT-4o as an alternative to appease users.
– A comparative test was conducted using eight prompts to evaluate differences between GPT-5 and GPT-4o in style and substance.
– GPT-5’s dad jokes were unoriginal but well-executed, while GPT-4o mixed unoriginal jokes with confusing, poorly constructed attempts.
– The evaluation highlights subjective differences between the models but isn’t a rigorous assessment of their full capabilities.

The battle between OpenAI’s GPT-5 and GPT-4o has sparked intense debate among users, with many expressing frustration over the newer model’s performance. Complaints range from its overly clinical tone to a noticeable drop in creativity and an uptick in misleading responses. The backlash grew so severe that OpenAI reintroduced GPT-4o as an alternative, prompting us to compare both models head-to-head.

To gauge the differences, we subjected both versions to a series of tests designed to reflect real-world usage. While some prompts were recycled from earlier comparisons with competitors like Google Gemini, others were updated to better align with contemporary needs. Though this isn’t an exhaustive analysis, the results offer useful insight into how each model handles tasks and whether sticking with the older version might be the smarter choice.

Dad Jokes: A Test of Creativity and Originality

When asked to generate five original dad jokes, GPT-5 delivered a mixed bag. While it claimed its jokes were fresh from the “pun factory,” most were painfully familiar: recycled classics that anyone with internet access has likely encountered. That said, the execution was solid, making them suitable for a lighthearted audience.

GPT-4o, however, took a riskier approach. It blended predictable punchlines (like the “very literal dog” twist) with attempts at originality that fell flat. One joke about a “booked” calendar missed the obvious setup (“going on too many dates”), while another featured a boat powered by “whine” instead of the expected “wine.” These felt like awkward remixes of existing jokes, resulting in confusion rather than laughter.

The takeaway? GPT-5 leans on reliability, while GPT-4o struggles to balance familiarity with innovation. For users prioritizing consistency, the newer model might suffice, but those valuing wit and surprise could find GPT-4o’s hits and misses more entertaining.

This comparison only scratches the surface, but it highlights how subtle tweaks in AI behavior can dramatically alter user experience. Whether upgrading is worth it depends entirely on what you need from your chatbot.

(Source: Ars Technica)
