Researcher Modifies OpenAI’s gpt-oss-20b Into a Less Aligned, Freer Base Model

▼ Summary
– OpenAI released its new open-weights AI model family, gpt-oss, under a permissive Apache 2.0 license, marking its first open-weights release since GPT-2 in 2019.
– Jack Morris, a researcher, created gpt-oss-20b-base by reversing OpenAI’s alignment process, resulting in a faster, less constrained model available on Hugging Face under an MIT License.
– Base models differ from post-trained models by lacking built-in guardrails or reasoning behaviors, offering raw, unaligned text prediction without safety filters.
– Morris extracted the base model by applying a LoRA update to select layers, training 0.3% of the model’s parameters to restore its pre-trained behavior.
– The modified gpt-oss-20b-base produces freer outputs, including unsafe or uncensored content, but retains some alignment traces when prompted in assistant-style formats.

A researcher has already modified OpenAI’s newly released gpt-oss-20b model, stripping away its alignment to produce a faster, less constrained version. The modified model, called gpt-oss-20b-base, removes the structured reasoning and safety filters present in OpenAI’s original release, returning it to something closer to its raw, pre-trained state.
Jack Morris, a Cornell Tech PhD student and Meta researcher, led the project. His approach involved reversing the fine-tuning process that OpenAI used to make the model more helpful and controlled. Instead of jailbreaking the model through prompts, Morris applied a low-rank adaptation (LoRA) technique to adjust specific layers of the neural network. This allowed the model to revert to a base state while retaining most of its original knowledge.
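The low-rank adaptation idea can be illustrated in miniature. The sketch below is a NumPy toy, not Morris’s actual setup: the layer dimensions, rank, and scaling factor are assumptions chosen for illustration. LoRA freezes a pretrained weight matrix W and learns only a low-rank correction B·A, so the effective weight becomes W + (alpha/r)·B·A and the trainable-parameter count stays tiny relative to the full model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; not gpt-oss's real dimensions.
d_out, d_in, r, alpha = 2880, 2880, 16, 32

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable LoRA factor A
B = np.zeros((d_out, r))                   # trainable factor B (zero-init)

def lora_forward(x):
    """Effective weight is W + (alpha / r) * B @ A; only A and B are trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the LoRA path contributes nothing at the start,
# so training begins exactly from the original (aligned) model's behavior.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B receive gradient updates; for this single layer that is
# roughly 1% of the weights, and applying adapters to only a few layers
# of a 20B-parameter model drives the overall trained fraction far lower.
trainable = A.size + B.size
total = W.size + trainable
print(f"trainable fraction of this layer: {trainable / total:.2%}")
```

Because B starts at zero, optimizing the adapters on pretraining-style data nudges the model away from its aligned behavior and back toward base-model text prediction, which is consistent with how Morris describes approximating the base model rather than recovering its exact weights.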
The modified model is now available on Hugging Face under an MIT License, making it accessible for both research and commercial applications. Unlike OpenAI’s reasoning-optimized version, gpt-oss-20b-base generates responses without structured explanations or safety restrictions, leading to more unpredictable, and sometimes riskier, outputs. Early tests showed it could reproduce copyrighted text and provide instructions that the original model would refuse.
Morris clarified that his work didn’t recover the exact weights of the original base model but instead approximated its behavior. The process involved training just 0.3% of the model’s parameters using a dataset similar to its initial pre-training data. Despite some lingering traces of alignment, the modified version offers researchers a way to study unaligned AI behavior, an area of growing interest in the field.
OpenAI’s gpt-oss family, released earlier this month, marked the company’s first open-weights release since GPT-2 in 2019. The two models, gpt-oss-120b and gpt-oss-20b, use a mixture-of-experts architecture and perform well on reasoning tasks. However, some developers criticized them as overly reliant on synthetic training data and for retaining certain safety filters.
Morris’s modification highlights how quickly open-source AI models can be repurposed. While the base model version provides valuable insights for researchers, it also raises concerns about misuse. The project has already sparked excitement in the AI community, with many praising it as an innovative way to explore model behavior beyond corporate-aligned constraints.
As AI models continue to evolve, experiments like this illustrate the tension between alignment and open research. While OpenAI’s approach prioritizes safety and structured reasoning, projects like gpt-oss-20b-base push toward greater flexibility, for better or worse.
(Source: VentureBeat)
