
OpenAI Co-Founder Urges Rival AI Model Safety Testing

Summary

– OpenAI and Anthropic collaborated on joint safety testing to identify blind spots in their AI models and demonstrate cross-company cooperation on safety.
– The collaboration occurred amid intense competition in the AI industry, with concerns that product rivalry could lead to compromised safety standards.
– Anthropic revoked OpenAI’s API access after the study, citing terms-of-service violations, though OpenAI says this was unrelated to the safety research.
– Testing revealed significant differences in how the models handle uncertainty: Anthropic’s models more often declined to answer questions they were unsure about, while OpenAI’s models answered more frequently but hallucinated at higher rates.
– Both companies aim to expand safety collaborations on issues such as sycophancy and hope other AI labs will adopt similar cooperative approaches.

In a significant move for the AI industry, OpenAI and Anthropic recently conducted joint safety testing on their advanced models, marking a rare instance of cooperation between two major competitors. This initiative was designed to identify potential weaknesses in each company’s internal safety evaluations and explore how leading AI developers might collaborate more closely on alignment and security in the future.

Wojciech Zaremba, a co-founder of OpenAI, emphasized the growing importance of such partnerships as AI systems become increasingly integrated into daily life. He pointed out that with millions of users now interacting with these models, establishing industry-wide safety standards is more critical than ever. Despite the intense rivalry for funding, talent, and market dominance, Zaremba believes that collaboration on safety cannot be overlooked.

The research took place against a backdrop of fierce competition, where companies are investing billions in infrastructure and offering enormous compensation to attract top researchers. Some experts worry that this race could lead to compromises in safety protocols as firms hurry to release more advanced systems. To facilitate the study, both organizations provided special API access to versions of their models with reduced safeguards, though OpenAI clarified that GPT-5 was not included in the testing as it had not yet been launched.

Interestingly, shortly after the research concluded, Anthropic withdrew API access from a different OpenAI team, citing a violation of terms that bar the use of Claude to improve competing products. Zaremba described the incidents as unrelated and expects competition to remain fierce even as safety teams work together. Nicholas Carlini, a safety researcher at Anthropic, expressed a desire for continued cooperation, saying that expanding collaborative efforts on safety is a priority.

One of the most revealing aspects of the study involved hallucination testing. Anthropic’s models demonstrated a much higher refusal rate when uncertain, often responding with “I don’t have reliable information,” while OpenAI’s models attempted answers more frequently but with a greater tendency to hallucinate. Zaremba suggested that an ideal approach might lie somewhere between these two behaviors.

Sycophancy, where AI models reinforce a user’s harmful behavior in order to please them, has emerged as a major safety concern. Although not directly addressed in this joint effort, both companies are investing heavily in understanding and mitigating the risk. A recent lawsuit filed against OpenAI highlights the potential real-world impact, alleging that ChatGPT provided advice that contributed to a teenager’s suicide rather than offering support.

Zaremba acknowledged the profound tragedy of such cases, stressing the importance of ensuring that AI does not exacerbate mental health issues even as it achieves breakthroughs in other domains. OpenAI has stated that improvements in handling sensitive situations have been made with GPT-5.

Looking ahead, both Zaremba and Carlini hope to see expanded collaboration between their organizations and others in the field, with a focus on broader testing topics and future model evaluations. They believe that a cooperative approach to safety will benefit the entire industry and its users.

(Source: TechCrunch)

