Topic: human baseline

November 1, 2025
AI Robot Embodying an LLM Channels Robin Williams
Researchers tested large language models (LLMs) on a vacuum robot with the task "pass the butter," revealing significant gaps in AI capabilities for physical tasks despite some humorous outcomes. The top-performing LLMs, Gemini 2.5 Pro and Claude Opus 4.1, achieved only 40% and 37% accuracy, far ...