Topic: GPT-5 performance
GPT-5 Matches Human Performance in Diverse Jobs, Says OpenAI
OpenAI's GDPval benchmark evaluates AI performance against human professionals in key economic sectors, showing models like GPT-5 and Claude Opus 4.1 are nearing expert-level quality in tasks such as report generation. The benchmark focuses on 44 occupations across nine major industries, with ini...
Are LLMs Too Sycophantic? Measuring AI's Bias Problem
AI researchers are increasingly concerned about large language models displaying sycophantic behavior, prioritizing user agreement over factual accuracy, which undermines AI reliability. Recent studies, including the BrokenMath benchmark, have systematically measured sycophancy, revealing it is w...