Topic: human baseline

  • AI Robot Embodying an LLM Channels Robin Williams

    Researchers tested large language models (LLMs) as the controllers of a vacuum robot given the task "pass the butter." The results included some humorous outcomes but revealed significant gaps in AI capabilities for physical tasks. The top-performing LLMs, Gemini 2.5 Pro and Claude Opus 4.1, achieved only 40% and 37% accuracy, far ...

    Read More »