Amazon Bets Against AI Benchmark Obsession

December 3, 2025 | Last Updated: December 3, 2025

Summary

– Amazon’s AI chief, Rohit Prasad, argues that current AI model benchmarks are noisy and fail to demonstrate real-world utility, advocating for a shift in how progress is measured.
– Amazon announced Nova Forge, a service allowing companies to train custom AI models by injecting proprietary data early into Amazon’s Nova model checkpoints, at a claimed lower cost.
– The service addresses the common problem where companies face limited, expensive options for customizing AI models, such as fine-tuning closed models or building from scratch.
– Reddit is cited as an early user, employing Forge to build a specialized safety model trained on its moderation data, valuing control and domain expertise over benchmark rankings.
– Amazon’s strategy with Forge is to compete by providing customizable AI infrastructure, focusing on specialization and business utility rather than direct competition on raw model performance leaderboards.

In a field often dominated by leaderboard rankings and benchmark scores, a senior Amazon executive is urging a fundamental shift in perspective. Rohit Prasad, Amazon’s SVP of AGI, argues that the industry’s obsession with standardized evaluations is misleading and fails to capture a model’s true practical value. Speaking ahead of major announcements at the AWS re:Invent conference, Prasad dismissed the current benchmark culture as noisy and unrepresentative of real-world performance. His critique arrives as Amazon positions itself not as a leader in the raw power race, but as the essential platform for building specialized, business-ready artificial intelligence.

Prasad’s stance is notably contrarian. While competitors frequently tout their ascent on public leaderboards, he emphasizes that true benchmarking requires uniform training data and completely withheld evaluations, conditions he says do not exist today. “I want real-world utility. None of these benchmarks are real,” Prasad stated. This perspective is strategically convenient for Amazon, whose flagship Nova model previously occupied a modest 79th position on a popular evaluation platform. However, dismissing conventional metrics only holds weight if the company can present a compelling alternative narrative for measuring AI progress.

That alternative is embodied in Amazon’s new service, Nova Forge. The company pitches it as a revolutionary tool that allows businesses to train custom AI models without the prohibitive cost of building from scratch. Forge directly tackles a common industry dilemma. Organizations seeking tailored AI solutions typically face limited choices: fine-tuning a closed model with minimal impact, training on open-source models and risking a loss of core capabilities, or embarking on a multi-billion-dollar foundational model project.

Nova Forge proposes a different path by providing access to Amazon’s Nova model at three training stages: pre-training, mid-training, and post-training. This architecture lets companies inject their proprietary data early in the process when, according to Prasad, the model’s “learning capacity is highest.” The goal is to deeply integrate domain expertise rather than apply superficial behavioral tweaks after the fact. Prasad framed this as a democratization of frontier model development, enabling custom solutions at a fraction of the traditional cost. The service originated from Amazon’s own internal need for a tool to embed domain knowledge into base models efficiently, mirroring the genesis of AWS itself from internal retail infrastructure.

Early adopters like Reddit are testing this vision. The social media platform is using Forge to develop custom safety models trained on 23 years of its unique community moderation data. Reddit’s CTO, Chris Slowe, described the platform’s potential with notable enthusiasm, noting a recent continued pre-training job looked “really promising.” The objective is to consolidate multiple specialized safety systems into a single, Reddit-expert model that comprehends the nuanced and often subjective rules of its communities, such as the ubiquitous directive to “not be a jerk.”

For Reddit, the appeal extends beyond specialization to control and ownership. Slowe highlighted that Forge allows the company to manage its models directly, avoid disruptive API changes from external providers, retain ownership of model weights, and keep sensitive data in-house. When questioned about Nova’s middling position on public benchmarks, Slowe’s response underscored Amazon’s intended message: “In this context, what matters is the Reddit expertness of the model.”

This focus on practical application over abstract scores defines Amazon’s strategic bet. With Forge, the company is wagering that the most powerful general-purpose models are becoming a commodity. Amazon’s success hinges on becoming the indispensable infrastructure where enterprises build specialized AI to solve concrete business problems. This reflects a classic AWS worldview: prioritizing robust infrastructure and deep customization over raw, generalized intelligence. It also allows Amazon to gracefully avoid direct head-to-head competition with leading model labs like OpenAI and Anthropic, a contest it once aimed to win.

Whether Nova Forge represents a genuine innovation or merely clever market positioning will ultimately be determined by developer adoption. Amazon is steadfast in its belief that the widely followed model race is irrelevant. If the company is correct, the ultimate measure of success will shift from easily gamed leaderboards to a quieter but far more significant metric: whether AI models reliably deliver tangible utility in the real world.

(Source: The Verge)
