Physical Intelligence’s new robot brain learns untaught tasks

Summary
– Physical Intelligence’s new π0.7 model can direct robots to perform tasks they were never explicitly trained on, a capability that surprised its own researchers.
– The model demonstrates compositional generalization, combining skills from different contexts to solve novel problems, unlike previous specialist models trained for single tasks.
– In a key test, the model successfully used an air fryer with minimal prior exposure, succeeding with step-by-step verbal instructions that allow real-time coaching without retraining.
– The researchers acknowledge limitations, including the model’s inability to execute complex multi-step tasks from a single command and a lack of standardized benchmarks for external validation.
– The company has raised over $1 billion and is reportedly in discussions for a new funding round that would significantly increase its valuation.

A San Francisco robotics startup has released research indicating its latest AI model can guide robots through tasks they were never specifically taught to perform. This development, which the company’s own scientists admit was unexpected, points toward a future where machines can adapt to new challenges with minimal human intervention. The findings suggest the field of robotic AI may be nearing a transformative moment, where capabilities begin to scale in surprising and nonlinear ways.
The core advancement centers on compositional generalization: the ability of an AI system to combine skills learned in isolation to tackle entirely novel problems. Traditional robot training has relied on a narrow, task-specific approach, essentially requiring a new model for every single job. The new model, designated π0.7, aims to break that inefficient cycle. According to Sergey Levine, a co-founder of the startup and a UC Berkeley professor, once a model crosses the threshold from simple repetition to creative skill recombination, its capabilities expand more rapidly than the underlying data would suggest. This favorable scaling property mirrors what has been observed in other AI domains, such as language and vision.
One of the most compelling demonstrations involved an air fryer, an appliance the model had virtually no direct training on. Upon investigation, researchers found only two fleeting references in its entire dataset: one clip of a robot pushing an air fryer closed and another from an open-source collection showing a robot placing a bottle inside one. From these sparse fragments and broader web-based pretraining, the model synthesized a functional understanding of the device. Ashwin Balakrishna, a research scientist at the company, notes the difficulty in tracing the origin of such knowledge or predicting its limits. Without any prior coaching, the model made a credible attempt to cook a sweet potato. When given step-by-step verbal instructions, akin to training a new employee, it succeeded.
This coaching capability is significant: it implies robots could be deployed in unfamiliar settings and improved through real-time human guidance, bypassing the need for extensive new data collection or model retraining. The researchers are candid about the system’s current constraints. The model cannot yet execute complex, multi-step chores from a single high-level command. You cannot simply tell it to make toast, Levine explains; however, if you verbally walk it through each sub-step, it performs reliably.
The team also acknowledges a lack of standardized robotics benchmarks, making independent validation challenging. For evaluation, they compared π0.7 against their own previous specialist models, which were each painstakingly trained for a single task. The generalist model matched their performance across a suite of complex activities, including making coffee, folding laundry, and assembling boxes.
Perhaps the most telling aspect of the research is the surprise expressed by the engineers themselves. These are individuals who intimately know the training data and should, in theory, be able to predict the model’s limits. Balakrishna describes a recent experience where he purchased a random gear set and asked the robot to rotate a gear, a task it accomplished despite no explicit training. Levine compares the moment to early encounters with large language models generating bizarre but coherent combinations, like a story about unicorns in the Andes. Witnessing that same emergent, unpredictable problem-solving in physical robotics, he suggests, is particularly special.
Skeptics will rightly note a fundamental asymmetry: language models train on the vast expanse of the internet, while robots operate in a data-constrained physical world. Levine anticipates a different critique, that the demonstrated tasks seem mundane. He argues this is the essential point. The distinction between a flashy, pre-programmed robot stunt and a system that genuinely generalizes is crucial. True generalization may look less dramatic, but it is far more useful for practical applications.
The published paper is carefully worded, describing “early signs” of generalization and “initial demonstrations” of new capabilities. This is foundational research, not a market-ready product. When pressed on a timeline for real-world deployment, Levine declined to speculate, stating only that progress is occurring faster than he anticipated a few years ago.
The startup has raised considerable capital, with a recent valuation of $5.6 billion. A significant portion of investor confidence is attributed to co-founder Lachy Groom, a respected figure in Silicon Valley’s investment community. The company is reportedly in discussions for a new funding round that could nearly double its valuation, though the team has declined to comment on these reports. Throughout, the company has maintained a disciplined focus on research, refusing to provide investors with a firm commercialization schedule.
(Source: TechCrunch)