
▼ Summary
– OpenAI introduced two new AI models, o3 and o4-mini, on April 16th, both focused on enhanced reasoning capabilities.
– The models can natively process and reason about visual information, allowing users to upload and analyze photos, diagrams, and screenshots.
– o3 is the most powerful model, excelling at complex tasks such as coding, math, and visual perception, while o4-mini is a smaller, cost-efficient alternative optimized for speed and high volumes of requests.
– Both models can use tools within ChatGPT, like web search and Python code execution, to solve multi-step problems independently.
– The new models are available to paying ChatGPT users and developers via API, with free users accessing reasoning capabilities through a “Think” option.
OpenAI has introduced two new models, o3 and o4-mini, the latest additions to its “o-series” of models focused on enhanced reasoning. Announced on April 16th, they represent a significant step forward, particularly in their ability to natively process and reason about visual information, alongside improvements in areas like coding and math.
Reasoning and Visual Integration
The core advancement highlighted with o3 and o4-mini is their capacity to “think with images.” Unlike previous models that might simply identify objects, these systems incorporate visual data directly into their reasoning process, or chain of thought. According to OpenAI, users can upload photos, diagrams (even blurry or imperfect ones), or screenshots, and the models can analyze, interpret, and use that visual information to solve problems or answer queries. For example, o3 could analyze a complex scientific poster and potentially draw conclusions not explicitly stated in its text. The models can also manipulate images internally (cropping, zooming, rotating) as part of their problem-solving, without relying on separate specialized tools.
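For developers, the same multimodal input is exposed through the API. Here is a minimal sketch using OpenAI’s Python SDK, assuming the model ID “o3” and a placeholder image URL (both are illustrative; check OpenAI’s documentation for current model names):

```python
# Minimal sketch: sending an image to a reasoning model via the
# Chat Completions API. The model ID "o3" and the image URL are
# assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3",  # assumed model ID for the new reasoning model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What conclusion does this scientific poster support?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/poster.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The image travels as a content part alongside the text prompt, so the model reasons over both in a single request rather than handing the image off to a separate vision pipeline.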
Meet the Models: o3 and o4-mini
OpenAI positions o3 as its most powerful reasoning model to date, pushing performance boundaries on complex tasks involving coding, math, science, and visual perception. It reportedly makes fewer errors than previous models on difficult tasks and excels at analyzing charts and graphics. o4-mini is a smaller, faster, and more cost-efficient alternative: still highly capable, especially in math, coding, and visual tasks, but optimized for scenarios that demand speed or high volumes of requests. Both models are also described as being able to “agentically” use all available tools within ChatGPT, such as web search, Python code execution, and image generation, to tackle multi-step problems more independently.
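The built-in ChatGPT tools aren’t exposed identically through the API, but the same agentic pattern maps onto the API’s function-calling interface, where the model decides when to invoke a tool the developer defines. A hedged sketch, assuming the model ID “o4-mini” and a hypothetical developer-implemented web_search function:

```python
# Sketch of agentic tool use via the Chat Completions function-calling
# interface. "o4-mini" is an assumed model ID; web_search is a
# hypothetical tool the developer would implement and execute.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",  # hypothetical developer-defined tool
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query."}
                },
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="o4-mini",  # assumed model ID
    messages=[{"role": "user", "content": "Find the latest CPI figure."}],
    tools=tools,
)

# The model decides whether to call the tool; the caller executes it
# and feeds the result back in a follow-up request.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(call.function.name, args)
```

In a multi-step loop, the developer runs each requested tool, appends the result as a tool message, and re-invokes the model until it produces a final answer, which is the API-side analogue of the independent, multi-step tool use described for ChatGPT.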
Availability and Context
These new reasoning models are rolling out now, starting with paying ChatGPT users (Plus, Pro, and Team) and developers via the API. Free users can reportedly access the reasoning capability via a “Think” option. The release follows closely on other recent OpenAI announcements, including the GPT-4.1 family of models, underscoring a rapid pace of development. Deeply integrating visual understanding into language models is a key area of advancement across the AI industry, and o3 and o4-mini are OpenAI’s latest contribution to that effort, aiming for AI that interacts with information more comprehensively.