Ai2’s MolmoAct Uses 3D Thinking to Rival Nvidia, Google in Robotics AI

Summary
– The Allen Institute for AI (Ai2) released MolmoAct 7B, an open-source model enabling robots to reason in 3D space, challenging Nvidia and Google in physical AI.
– MolmoAct is an Action Reasoning Model that helps robots understand their surroundings, plan actions, and interact with the physical world more effectively.
– The model outperformed competitors like Google and Nvidia in benchmark tests, achieving a 72.1% task success rate.
– Experts view MolmoAct as a significant but incremental advancement in 3D physical reasoning, with potential applications in home and industrial robotics.
– Physical AI is gaining traction, with companies like Meta, Nvidia, and Google developing models to enhance robotic spatial awareness and task execution.
The field of physical AI is rapidly evolving as tech giants and research institutions push the boundaries of robotic intelligence. By combining large language models with spatial reasoning capabilities, new systems are emerging that allow machines to interact with their environment in more sophisticated ways. One notable entry comes from the Allen Institute for AI (Ai2), which recently unveiled MolmoAct 7B, an open-source model designed to outperform existing solutions from industry leaders like Nvidia and Google.
Unlike traditional vision-language-action models that process information in two dimensions, MolmoAct introduces 3D reasoning, enabling robots to better understand and navigate physical spaces. The model analyzes surroundings by generating spatially grounded perception tokens, which help it estimate distances between objects and plan precise movements. This approach allows robotic arms or humanoid systems to adjust their actions dynamically, whether lowering an arm by inches or reaching for an object, with minimal manual adjustments required.
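The pipeline described above, perceiving objects in 3D, estimating distances between them, and planning incremental movements that can be adjusted mid-motion, can be illustrated with a minimal sketch. Everything below is hypothetical for illustration only: the `PerceptionToken` class, `distance`, and `plan_reach` are invented names, not MolmoAct's actual API or token format.

```python
# Illustrative sketch of a 3D action-reasoning loop as described in the
# article. All names are hypothetical; this is NOT MolmoAct's real API.
from dataclasses import dataclass


@dataclass
class PerceptionToken:
    """A spatially grounded token: an object label plus an estimated
    3D position in meters (the kind of grounding the article describes)."""
    label: str
    x: float
    y: float
    z: float


def distance(a: PerceptionToken, b: PerceptionToken) -> float:
    """Euclidean distance between two perceived objects."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2) ** 0.5


def plan_reach(gripper: PerceptionToken, target: PerceptionToken,
               step: float = 0.05):
    """Yield small waypoints from the gripper toward the target, so the
    motion can be re-checked and adjusted between steps (e.g., lowering
    an arm a few centimeters at a time rather than in one jump)."""
    d = distance(gripper, target)
    n = max(1, int(d / step))  # number of incremental waypoints
    for i in range(1, n + 1):
        t = i / n
        yield (gripper.x + t * (target.x - gripper.x),
               gripper.y + t * (target.y - gripper.y),
               gripper.z + t * (target.z - gripper.z))


# Example: a gripper 0.5 m from a mug plans a path of 5 cm increments.
gripper = PerceptionToken("gripper", 0.0, 0.0, 0.5)
mug = PerceptionToken("mug", 0.3, 0.0, 0.1)
waypoints = list(plan_reach(gripper, mug))
```

The point of the incremental waypoints is the dynamic-adjustment behavior the article highlights: because the plan is a sequence of small moves grounded in estimated 3D positions, perception can be refreshed between steps and the remaining path replanned.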
Early benchmark tests show promising results, with MolmoAct achieving a 72.1% task success rate, surpassing competing models. While still in development, the technology demonstrates significant potential for real-world applications, particularly in home environments where unpredictability poses a major challenge for robotics.
Experts view this as a meaningful step forward rather than a revolutionary leap. Alan Fern, a professor at Oregon State University, notes that while the benchmarks remain somewhat controlled, the shift toward true 3D scene understanding marks important progress. Meanwhile, industry professionals like Daniel Maturana of Gather AI highlight the value of Ai2’s decision to open-source the training data, making advanced robotics research more accessible to academics and enthusiasts alike.
The broader physical AI landscape is heating up, with companies like Google, Meta, and Nvidia investing heavily in robotic intelligence. Google’s SayCan leverages LLMs for task sequencing, while Meta’s OK-Robot focuses on visual-language integration for movement planning. Even startups are entering the fray, with Hugging Face’s affordable desktop robot aiming to democratize development.
Despite these advancements, challenges remain. Real-world environments are far more complex than lab settings, and achieving general physical intelligence, where robots adapt seamlessly without explicit programming, is still a work in progress. However, as models like MolmoAct continue to refine spatial reasoning, the gap between controlled experiments and practical applications narrows. The next phase of innovation will likely focus on enhancing adaptability, ensuring robots can handle the unpredictability of everyday scenarios.
For now, the progress in 3D-aware AI models signals a shift toward more intuitive robotic systems, bringing us closer to a future where machines interact with their surroundings as fluidly as humans do.
(Source: VentureBeat)
