OpenAI’s ChatGPT Makes Fake Photos Effortless

▼ Summary
– Historically, convincingly altering a photo required significant skill or tools like a darkroom or Photoshop.
– OpenAI’s new GPT Image 1.5 reduces this to typing a sentence, turning photorealistic image manipulation into a casual, skill-free task.
– This release follows Google’s earlier public prototype; OpenAI’s new model is also faster and cheaper than its own predecessor.
– GPT Image 1.5 is a “native multimodal” model, processing images and text as the same kind of data within a single neural network.
– The model can alter visual reality by changing poses, styles, or objects, and allows for conversational, iterative editing like refining an email draft.

The ability to create and alter realistic images has been fundamentally transformed, moving from a specialized skill to a simple conversational task. OpenAI’s new GPT Image 1.5 model, released to all ChatGPT users, allows anyone to generate or modify a photo by typing a sentence. This tool makes sophisticated image manipulation effortless and widely accessible, representing a significant leap in how visual content is created.
While not the first of its kind, the model’s arrival follows notable developments from competitors like Google. Google introduced a public prototype earlier in the year, which evolved into its popular Nano Banana image model. The positive reception to Google’s tool within the AI community helped accelerate OpenAI’s own efforts to bring a similar capability to market.
GPT Image 1.5 is reportedly up to four times faster at generating images than its predecessor and costs about 20 percent less to operate through its programming interface. This efficiency and affordability lower the barrier to entry, pushing photorealistic image editing further into the realm of casual, everyday use. The model is described as “native multimodal,” meaning it processes both language prompts and visual data within a single, unified neural network. This contrasts with earlier systems like DALL-E 3, which relied on a different method called diffusion.
This architectural approach treats images and text as the same fundamental material: predictable patterns of data known as tokens. When a user uploads a photo and types a command such as “put him in a tuxedo at a wedding,” the model interprets the pixels and the words together in the same conceptual space. It then generates new pixels in a manner analogous to predicting the next word in a sentence.
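To make the token analogy concrete, the toy sketch below (written in PyTorch, and not OpenAI’s actual code) shows how a single autoregressive transformer can treat text tokens and discretized image tokens as one shared vocabulary and simply predict the next token, whatever its modality. The vocabulary sizes, model dimensions, and tokenization scheme are illustrative assumptions, not details disclosed about GPT Image 1.5.

```python
# Toy illustration of "native multimodal" generation (not OpenAI's code):
# text tokens and discretized image tokens share one vocabulary, and a
# single autoregressive transformer predicts the next token either way.
import torch
import torch.nn as nn

TEXT_VOCAB = 50_000   # assumed text-token vocabulary size
IMAGE_VOCAB = 8_192   # assumed discrete image-token codebook size
VOCAB = TEXT_VOCAB + IMAGE_VOCAB


class TinyMultimodalLM(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        # Causal mask: each position attends only to earlier tokens.
        T = tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(tokens), mask=mask)
        return self.head(h)  # logits over the joint text+image vocabulary


# A prompt is just one mixed sequence: the user's words followed by the
# tokens of the uploaded photo. The model continues it token by token.
prompt = torch.randint(0, VOCAB, (1, 32))
model = TinyMultimodalLM()
next_token = model(prompt)[0, -1].argmax()  # may be a word piece or an image patch
```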
The practical result is a tool with remarkable flexibility. GPT Image 1.5 can alter a subject’s pose, change their position in a scene, or render an environment from a new angle. It performs tasks like removing unwanted objects, switching visual styles, adjusting clothing details, and refining specific areas while maintaining a person’s facial likeness across multiple edits. Users can engage in a back-and-forth dialogue with the AI about a photograph, iteratively refining the result much like editing a piece of text in ChatGPT. This conversational interface makes complex visual editing an intuitive process, requiring no prior expertise in graphic design or photo software.
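As a sketch of what that conversational loop can look like through the programming interface, the snippet below uses the OpenAI Python SDK’s image-editing endpoint to apply a plain-language edit and then feed the result back in with a follow-up instruction. The model identifier and file names are assumptions for illustration; check OpenAI’s documentation for the ID that corresponds to GPT Image 1.5.

```python
# Sketch of iterative, conversational image editing via the OpenAI API.
# The model name and file paths below are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def edit_image(src_path: str, instruction: str, out_path: str) -> str:
    """Apply one plain-language edit to an image and save the result."""
    result = client.images.edit(
        model="gpt-image-1",  # assumed identifier; verify the GPT Image 1.5 model ID
        image=open(src_path, "rb"),
        prompt=instruction,
    )
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
    return out_path


# First pass: a single-sentence edit of an uploaded photo.
v1 = edit_image("portrait.png", "Put him in a tuxedo at a wedding.", "edit_v1.png")

# Second pass: refine the previous result, the way you would revise a draft.
v2 = edit_image(v1, "Same scene, but at dusk with warmer lighting.", "edit_v2.png")
```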
(Source: Ars Technica)
