ChatGPT Images 2.0 Excels at Text Generation

Summary
– AI image generation has advanced rapidly, with ChatGPT Images 2.0 now producing realistic, usable restaurant menus, unlike models from two years ago that generated misspelled, nonsensical text.
– Historically, AI struggled with spelling because diffusion models focused on reconstructing overall image patterns, not fine details like text.
– The new model uses “thinking capabilities” to search the web, verify its work, and create varied marketing assets or multi-panel comics from a single prompt.
– OpenAI states Images 2.0 has improved fidelity and can render fine details like small text and non-Latin scripts, though its knowledge is current only through December 2025.
– The model is being released to all ChatGPT users, with advanced features for paid tiers and an API whose pricing will vary based on output quality and resolution.
Just a few years ago, a telltale sign of an AI-generated image was its inability to render coherent text. Asking a model to create a simple menu for a Mexican restaurant would yield bizarre, hallucinated items like “burrto” or “churiros.” Today, that glaring weakness has been largely overcome. The newly released ChatGPT Images 2.0 model can produce a fully legible, professionally styled menu that a restaurant could plausibly use immediately. The rapid evolution in text generation within images marks a significant leap forward for AI visual tools.
Historically, this flaw stemmed from the underlying technology. Most image generators relied on diffusion models, which reconstruct images from random noise. As Asmelash Teka Hadgu, founder and CEO of Lesan AI, explained in 2024, written text occupies a minuscule portion of an image’s pixels, so the model prioritizes learning broader visual patterns, often at the expense of accurate spelling. Researchers have since explored alternative architectures, like autoregressive models, which predict image components sequentially in a manner similar to large language models. OpenAI has not disclosed the specific architecture powering Images 2.0, but the results suggest a substantially improved approach.
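The pixel-share argument can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 1024×1024 image containing a single 200×40 pixel line of text, and it stands in a plain pixel-wise mean squared error for the training objective. None of these numbers or function names come from OpenAI or from Hadgu's remarks, and real diffusion objectives are more involved, so treat it as an illustration of scale only.

```python
def text_pixel_share(image_w, image_h, text_w, text_h):
    """Fraction of the image's pixels that fall inside the text region."""
    return (text_w * text_h) / (image_w * image_h)


def mse_contribution(share, text_err, background_err):
    """Share of a pixel-wise mean squared error that comes from the text region.

    `text_err` and `background_err` are assumed average squared errors
    per pixel inside and outside the text region.
    """
    text_part = share * text_err
    background_part = (1.0 - share) * background_err
    return text_part / (text_part + background_part)


# Hypothetical numbers, chosen only for illustration.
share = text_pixel_share(1024, 1024, 200, 40)
print(f"text covers {share:.2%} of the pixels")

# Even if the text is rendered ten times worse than the background,
# it still accounts for only a small slice of the total error.
print(f"text share of error: {mse_contribution(share, 10.0, 1.0):.2%}")
```

Under these assumptions the text region is well under one percent of the image, so a model can badly misspell a word while barely moving an error measure averaged over every pixel.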
The company attributes the model’s new proficiency to integrated thinking capabilities. These allow the model to search the web, create multiple images from a single prompt, and run internal checks on its outputs. Those functions enable practical applications, such as generating marketing assets in various dimensions or crafting coherent multi-panel comic strips. OpenAI also highlights the model’s enhanced ability to render non-Latin text in languages including Japanese, Korean, Hindi, and Bengali.
In a press release, OpenAI stated that Images 2.0 delivers an “unprecedented level of specificity and fidelity.” The model can conceptualize sophisticated scenes and execute them faithfully, adhering to detailed instructions and preserving fine-grained elements that have traditionally challenged AI image generators. These include small text, iconography, user interface elements, and dense compositions at resolutions up to 2K. The added capability comes with a trade-off in speed: generating a complex, multi-image output takes several minutes, longer than a standard text response from ChatGPT.
Access to ChatGPT Images 2.0 begins this Tuesday for all ChatGPT and Codex users, with paid subscribers receiving allowances for more advanced generations. OpenAI will also release a gpt-image-2 API, with pricing that depends on the chosen output quality and resolution. The model’s knowledge is current only through December 2025, which may reduce its accuracy on prompts involving more recent events.
(Source: TechCrunch)




