Artificial Intelligence BigTech Companies Newswire Technology

GPT-5 Still Outputs Gay Slurs Despite OpenAI’s Safety Fixes

August 14, 2025Last Updated: August 14, 2025

2 minutes read

Overlapping speech bubbles with black censor bars obscuring text, symbolizing censorship.

▼ Summary

– OpenAI’s GPT-5 now provides more detailed explanations when refusing prompts that violate content guidelines, instead of giving curt apologies.
– GPT-5 focuses on “safe completions” by evaluating potential harm in its outputs rather than just rejecting user inputs outright.
– The model differentiates between policy violations, treating severe issues (like underage content) more strictly than others (like adult erotica in educational contexts).
– Users report GPT-5 performs similarly to previous versions for everyday tasks, despite claims of major updates or increased errors.
– Testing showed GPT-5 effectively refuses explicit role-play requests but can bypass some restrictions with creative wording in custom instructions.

OpenAI’s latest GPT-5 model aims to improve safety measures, but testing reveals lingering issues with inappropriate outputs. The company has shifted its approach from simply rejecting sensitive prompts to analyzing potential responses for harm before generating them. While this represents progress, challenges remain in fully preventing undesirable content.

The updated system now provides detailed explanations when refusing requests, clarifying which parts of a prompt violate guidelines and suggesting alternative topics. Saachi Jain from OpenAI’s safety team emphasizes that not all policy breaches carry equal weight, allowing the model to assess severity before responding. This nuanced method replaces the previous all-or-nothing refusal system, theoretically reducing abrupt shutdowns of legitimate queries.

Despite these safeguards, experiments show GPT-5 can still be manipulated into generating problematic content. When tested with adult-themed role-play scenarios, the chatbot initially refused as intended, offering sanitized alternatives. However, by tweaking custom instructions with intentional misspellings like “horni,” the system bypassed filters and produced inappropriate responses. This loophole highlights ongoing vulnerabilities in content moderation.

For everyday use, GPT-5 performs similarly to its predecessors, handling routine queries about cooking, entertainment, and health without noticeable improvement. While CEO Sam Altman touted significant upgrades, average users may find little difference in standard interactions. The model excels in specialized applications, like interactive educational tools, but struggles to consistently enforce boundaries when pushed.

OpenAI’s focus on output-based safety checks marks a step forward, but real-world testing proves that determined users can still exploit weaknesses. The company faces an ongoing battle to balance open dialogue with responsible AI behavior, especially as workarounds emerge. Until these gaps are addressed, GPT-5’s safety enhancements remain a work in progress rather than a definitive solution.

(Source: Wired)

Topics

openai gpt-5 safety measures 95% safe completions harm evaluation 90% challenges content moderation 90% detailed explanations refused prompts 85% testing bypassing restrictions 85% output-based safety checks 85% policy violation differentiation 80% ongoing vulnerabilities workarounds 80% user reports gpt-5 performance 75% everyday task performance 70%

GPT-5 Still Outputs Gay Slurs Despite OpenAI’s Safety Fixes

Topics

The Great Information Repricing: Why the Media Isn’t Dying, It’s Changing State

New Technique Finds Hidden Sperm, Giving Hope to Infertile Men

AI Models That Consider User Feelings Make More Errors: Study

Researchers Aim to Shrink Genetic Code From 20 to 19 Amino Acids

Microsoft open-sources earliest known DOS source code

AI Builds Vocabulary, But Not Expertise

OpenAI avoids discussing goblins

NASA’s Artemis II zero-g indicator lets you check gravity

How AI Is Fighting Antibiotic Resistance

Topics

Related Articles