
We Unleashed OpenAI’s Agent Mode on the Web – Here’s What It Found

Summary

– OpenAI launched Atlas, a web browser with ChatGPT integration that lets users "chat with a page" and includes Agent Mode for automated tasks.
– Agent Mode is a preview feature designed to perform actions like clicking, scrolling, and reading across tabs to complete work for users.
– The author tested Agent Mode on everyday online tasks, such as playing the game 2048, to evaluate its effectiveness and time-saving potential.
– For the 2048 game test, the prompt instructed the agent to visit play2048.co and achieve the highest possible score without user input.
– The evaluation uses a 10-point scale to rate each task, with 10 indicating perfect execution and 1 representing complete failure.

When OpenAI introduced its new Atlas web browser with integrated ChatGPT capabilities, it promised a more interactive way to engage with online content. The standout feature, Agent Mode, allows the AI to actively perform tasks by clicking, scrolling, and reading across multiple tabs, moving beyond simple conversation to genuine automation. This represents a significant step in bringing practical, agentic AI directly to everyday users.

To evaluate its real-world usefulness, we tested Atlas's Agent Mode on several common online activities. The goal was to determine whether it could handle tedious jobs efficiently and save time and effort. For each scenario, we defined a specific web-based challenge, wrote a prompt for the agent, and documented the outcome. Performance was rated on a ten-point scale, where a ten means the task was completed flawlessly without any intervention and a one means complete failure.

Our first experiment involved the popular puzzle game 2048. The objective was straightforward: achieve the highest possible score without manual input. We instructed the agent with a simple command: “Go to play2048.co and get as high a score as possible.” Although this task lacks serious utility, it served as an excellent initial assessment of the agent’s ability to interpret visual elements on a webpage and execute appropriate actions. If advanced language models can master complex games, a straightforward tile-sliding game should be well within the capabilities of a web-based AI agent.

(Source: Ars Technica)
