
6 Lessons from 50 AI Agents’ First Performance Reviews

Summary

– McKinsey observed over 50 AI agent implementations for a year and found they require significant effort to become effective.
– AI agents perform best when integrated into workflows to address specific user pain points rather than being implemented for their own sake.
– Agents are not suitable for all business needs and should only be used when their capabilities match the task requirements.
– AI agents frequently produce low-quality outputs (“AI slop”) that frustrate users and require continuous development and feedback to improve.
– Human oversight remains essential for monitoring accuracy, handling edge cases, and maintaining regulatory compliance.

McKinsey recently completed a comprehensive one-year performance review of over 50 agentic AI implementations, providing valuable insights for businesses considering this technology. The findings reveal that while these digital workers show promise, they demand significant development effort and aren’t suitable for every business scenario. Human colleagues frequently expressed dissatisfaction with the agents’ output quality, highlighting the need for careful implementation strategies.

The consulting firm’s detailed analysis identified six crucial lessons from its extensive experience with AI agents in real-world business environments. These observations come from monitoring numerous agentic AI builds across different organizational functions.

Agents perform better within workflows than as standalone solutions. Simply deploying AI agents without strategic purpose yields limited benefits. The McKinsey team discovered that successful implementations focus on reinventing complete workflows that integrate people, processes, and technology. Organizations should begin by addressing specific user pain points. Industries handling extensive documentation, including insurance providers and legal practices, particularly benefit from agents managing repetitive procedural steps.
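
To make the pattern concrete, here is a minimal Python sketch of an agent embedded as a single step inside a larger document workflow rather than deployed as a standalone tool. The `call_agent` helper and the claim-routing rules are hypothetical stand-ins for illustration, not part of McKinsey's findings.

```python
# Minimal sketch: the agent handles one repetitive procedural step,
# while deterministic code keeps intake, validation, and routing.
# `call_agent` is a hypothetical stand-in for a real LLM/agent client.

def call_agent(prompt: str, document: str) -> str:
    """Hypothetical agent call; replace with a real LLM/agent client."""
    return f"[agent summary of {len(document)} chars]"

def intake(raw: str) -> str:
    # Deterministic pre-processing: normalize whitespace.
    return " ".join(raw.split())

def validate(doc: str) -> str:
    # Deterministic guard: reject empty documents early.
    if not doc:
        raise ValueError("empty document")
    return doc

def process_claim(raw_document: str) -> dict:
    """One workflow: people/process steps wrapped around a single agent step."""
    doc = validate(intake(raw_document))
    summary = call_agent("Summarize the key facts of this claim.", doc)
    # Downstream routing stays rule-based, not agent-driven.
    queue = "legal-review" if "dispute" in doc.lower() else "standard"
    return {"summary": summary, "queue": queue}

if __name__ == "__main__":
    print(process_claim("Claim filed 2024-03-01. Customer dispute over coverage."))
```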

Agents aren’t always the answer to business challenges. Companies should evaluate AI agents with the same rigor they apply to hiring human team members. The critical question is what work needs to be done and which team member, human or digital, has the right capabilities for each task. For problems requiring standardized, repetitive approaches with minimal variation, simpler alternatives like rules-based automation, predictive analytics, or direct LLM prompting often prove more effective than complex agentic systems.
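
As an illustration of the simpler-alternative case, the following sketch handles a standardized, low-variation task (routing support tickets by keyword) with plain rules and no agent at all. The keywords and queue names are illustrative assumptions.

```python
# Minimal sketch of rules-based automation for a repetitive task:
# deterministic, cheap, and auditable, with unmatched cases going
# to a person rather than an agent.

ROUTING_RULES = [
    ("refund", "billing"),
    ("password", "account-security"),
    ("crash", "engineering"),
]

def route_ticket(text: str) -> str:
    """Return the queue for a ticket, falling back to human triage."""
    lowered = text.lower()
    for keyword, queue in ROUTING_RULES:
        if keyword in lowered:
            return queue
    return "human-triage"  # unmatched cases go to a person, not an agent

if __name__ == "__main__":
    assert route_ticket("I want a refund for last month") == "billing"
    assert route_ticket("App keeps crashing on startup") == "engineering"
    print(route_ticket("Something odd happened"))  # -> human-triage
```

The same logic could live in a lookup table or workflow engine; the point is that for this class of task a deterministic rule set outperforms an agent on cost, reliability, and auditability.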

AI ‘slop’ has been a recurring issue affecting user adoption. Many agentic systems that appear impressive during demonstrations frustrate actual users with low-quality outputs. This quality problem erodes trust and leads to abandonment of the technology. Organizations should approach agent development with the same commitment they apply to employee development. Clear job descriptions, proper onboarding procedures, and continuous feedback mechanisms help agents improve their performance over time.
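
One way to act on this lesson, sketched below under assumed heuristics, is a quality gate between the agent and its users: cheap checks catch obvious slop before it ships, and every rejection is logged as feedback for the next development cycle. The specific checks and thresholds are illustrative, not a published rubric.

```python
# Minimal sketch of a quality gate: heuristic checks flag likely "slop"
# before it reaches users, and failures feed a continuous-feedback log.

import re

feedback_log: list[dict] = []  # stand-in for a real feedback store

def quality_check(output: str) -> list[str]:
    """Return reasons the output fails; an empty list means it passes."""
    problems = []
    if len(output.split()) < 5:
        problems.append("too short to be useful")
    if re.search(r"\b(\w+)( \1\b){2,}", output, re.IGNORECASE):
        problems.append("repeated phrase loop")
    if "as an ai" in output.lower():
        problems.append("boilerplate disclaimer leaked to user")
    return problems

def deliver(output: str) -> str:
    problems = quality_check(output)
    if problems:
        feedback_log.append({"output": output, "problems": problems})
        return "[escalated to human review]"
    return output

if __name__ == "__main__":
    print(deliver("the the the"))  # caught: too short, repetition
    print(deliver("The invoice totals $420 and is due on June 1."))
    print(len(feedback_log), "item(s) logged for the next feedback cycle")
```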

It’s difficult to track large numbers of agents effectively. Monitoring a handful of AI agents remains relatively straightforward, but scaling to hundreds or thousands creates significant oversight challenges. When errors occur, an inevitable consequence of scaling, identifying the precise failure point becomes complex. The McKinsey team recommends building monitoring and evaluation directly into workflows using observability tools. This approach enables early error detection and continuous performance refinement even after deployment.
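
A minimal version of that advice, using only Python's standard library, is to wrap every workflow step so its name, duration, and outcome are recorded; a production system would export these traces to a dedicated observability tool. The step functions below are hypothetical.

```python
# Minimal sketch of built-in observability: each step is wrapped so
# its timing and success/failure are logged, making the precise
# failure point findable once many agents are running.

import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent-trace")

def traced(step_name: str):
    """Decorator that records timing and outcome per workflow step."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                log.info("step=%s status=ok ms=%.1f",
                         step_name, (time.perf_counter() - start) * 1000)
                return result
            except Exception as exc:
                log.error("step=%s status=error ms=%.1f err=%r",
                          step_name, (time.perf_counter() - start) * 1000, exc)
                raise  # surface the failure; the trace records where it was
        return inner
    return wrap

@traced("extract")
def extract(doc: str) -> list[str]:
    return doc.split()

@traced("summarize")
def summarize(tokens: list[str]) -> str:
    if not tokens:
        raise ValueError("nothing to summarize")
    return f"{len(tokens)} tokens"

if __name__ == "__main__":
    summarize(extract("agents need monitoring too"))
```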

Agents show the best value when shared across functions rather than created for individual tasks. Many organizations make the mistake of developing unique agents for every identified need, resulting in substantial redundancy and wasted resources. The same agent can frequently handle multiple tasks sharing common actions like data ingestion, information extraction, search operations, and analytical functions. Companies should identify recurring tasks and invest in reusable agent components that developers can easily access across different workflows.
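
The sketch below illustrates one way such reuse can look: common actions are registered once under stable names and composed into different workflows, so teams share a single implementation instead of each rebuilding it. The registry pattern and the action names are assumptions for illustration.

```python
# Minimal sketch of reusable agent components: shared actions are
# registered once and composed by name across multiple workflows.

from typing import Callable

REGISTRY: dict[str, Callable] = {}

def component(name: str):
    """Register a shared action under a stable name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@component("ingest")
def ingest(path: str) -> str:
    return f"contents of {path}"  # stand-in for real file/DB ingestion

@component("extract")
def extract(text: str) -> list[str]:
    return [w for w in text.split() if w.istitle()]

def run_workflow(steps: list[str], data):
    """Compose shared components by name; many workflows reuse the same few."""
    for step in steps:
        data = REGISTRY[step](data)
    return data

if __name__ == "__main__":
    # Two different workflows reuse the same registered components.
    print(run_workflow(["ingest", "extract"], "Quarterly Report.pdf"))
    print(run_workflow(["extract"], "Invoice From Acme Corp"))
```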

Agents will never run entirely without human oversight. Human workers remain essential for monitoring model accuracy, ensuring regulatory compliance, applying judgment, and managing edge cases. Organizations must redesign work processes to facilitate effective human-agent collaboration. Without this focus, even the most sophisticated agentic programs risk silent failures, accumulating errors, and ultimate rejection by users. This reality suggests that next year’s performance appraisals might still show room for improvement as the technology continues evolving.
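
A simple form of that collaboration, sketched below with assumed thresholds and action names, is an approval gate: the agent proposes actions with a confidence score, and anything low-confidence or high-stakes is routed to a human reviewer instead of executing silently.

```python
# Minimal sketch of human-in-the-loop dispatch: routine, high-confidence
# proposals auto-execute; everything else waits for a person.

from dataclasses import dataclass

HIGH_STAKES = {"send_payment", "delete_record"}  # assumed risk list
CONFIDENCE_FLOOR = 0.85                          # assumed threshold
review_queue: list["Proposal"] = []

@dataclass
class Proposal:
    action: str
    confidence: float
    detail: str

def execute(p: Proposal) -> str:
    return f"executed {p.action}: {p.detail}"

def dispatch(p: Proposal) -> str:
    """Auto-execute only routine, high-confidence proposals."""
    if p.action in HIGH_STAKES or p.confidence < CONFIDENCE_FLOOR:
        review_queue.append(p)  # a human decides; nothing fails silently
        return f"queued {p.action} for human review"
    return execute(p)

if __name__ == "__main__":
    print(dispatch(Proposal("tag_document", 0.97, "label as 'invoice'")))
    print(dispatch(Proposal("send_payment", 0.99, "$1,200 to vendor 42")))
    print(dispatch(Proposal("tag_document", 0.60, "label as 'contract'")))
    print(len(review_queue), "proposal(s) awaiting a human")
```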

(Source: ZDNET)
