Anthropic Reveals Claude Research Agent’s Multi-Agent Blueprint

The Wiz | June 15, 2025 | Last Updated: June 17, 2025

Summary

– Anthropic’s new Claude Research agent uses a multi-agent approach to enhance complex searches, with a lead agent coordinating specialized sub-agents for parallel processing.
– The multi-agent system outperformed a standalone Claude Opus 4 agent by 90.2% in internal tests, using Claude Opus 4 as the main coordinator and Claude Sonnet 4 as sub-agents.
– Performance depends heavily on token consumption, with multi-agent runs using 15 times more tokens than standard chats, but model choice and tool configuration also play key roles.
– Claude 4 can self-correct by recognizing mistakes and revising tool descriptions, effectively acting as its own prompt engineer in certain scenarios.
– Anthropic plans to move toward asynchronous execution, allowing agents to create sub-agents and work in parallel, though challenges in coordination and error handling remain unsolved.

Anthropic’s latest breakthrough in AI research introduces a sophisticated multi-agent system designed to revolutionize how complex searches are conducted. The company has unveiled technical details about its Claude Research agent, showcasing an architecture that dramatically enhances both speed and accuracy when handling intricate queries.

At the core of this system lies a lead agent responsible for dissecting user prompts, formulating a strategy, and deploying specialized sub-agents to gather information simultaneously. This parallel processing capability enables the system to tackle more demanding tasks with greater efficiency than a single-agent approach could achieve.
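The lead-agent pattern described above can be illustrated with a minimal sketch. This is not Anthropic's implementation: the function names `plan_subtasks`, `run_subagent`, and `lead_agent` are hypothetical, and the model and search calls are replaced by stubs so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_subtasks(query: str) -> list[str]:
    # Hypothetical planner: a real lead agent would ask a model to
    # decompose the query. Here the split is hard-coded for illustration.
    aspects = ("background", "recent work", "open problems")
    return [f"{query} :: {aspect}" for aspect in aspects]


def run_subagent(subtask: str) -> str:
    # Stand-in for a sub-agent. A real one would call a model and
    # search tools; this stub only echoes its assignment.
    return f"findings for [{subtask}]"


def lead_agent(query: str) -> str:
    # 1. Break the prompt into subtasks.
    subtasks = plan_subtasks(query)
    # 2. Run one sub-agent per subtask in parallel.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # pool.map returns results in the same order as the inputs.
        findings = list(pool.map(run_subagent, subtasks))
    # 3. Combine the findings into a single report.
    return "\n".join(findings)


report = lead_agent("multi-agent systems")
print(report)
```

Threads suit this sketch because real sub-agents would spend most of their time waiting on network calls rather than computing.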

Internal benchmarks reveal staggering results: the multi-agent setup outperformed a standalone Claude Opus 4 model by 90.2%. The framework uses Claude Opus 4 as the primary coordinator and Claude Sonnet 4 for the sub-agents. To ensure high-quality outputs, Anthropic employs an LLM-as-judge method, evaluating responses based on factual correctness, source reliability, and effective tool utilization. This technique not only enhances accuracy but also positions large language models as meta-tools capable of overseeing other AI systems.
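An LLM-as-judge evaluation along these lines could look roughly like the sketch below. The three criteria come from the article; everything else is an assumption made for illustration, including the prompt wording, the JSON reply format, the 0-to-1 scale, the equal weighting, and the `fake_judge` stub that stands in for a real model call.

```python
import json
from typing import Callable

# Criteria named in the article; the 0.0-1.0 scale is an assumption.
CRITERIA = ("factual_correctness", "source_reliability", "tool_utilization")


def build_judge_prompt(question: str, answer: str) -> str:
    # Hypothetical prompt asking the judge model for one score per criterion.
    return (
        "Score the answer from 0.0 to 1.0 on each of: "
        + ", ".join(CRITERIA)
        + ". Reply with a JSON object only.\n"
        + f"Question: {question}\nAnswer: {answer}"
    )


def grade(question: str, answer: str, judge: Callable[[str], str]) -> dict[str, float]:
    # `judge` is any function that takes a prompt and returns the model's text.
    scores = json.loads(judge(build_judge_prompt(question, answer)))
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"judge omitted criteria: {missing}")
    result = {c: float(scores[c]) for c in CRITERIA}
    # Equal weighting is an assumption, not something the article specifies.
    result["overall"] = sum(result[c] for c in CRITERIA) / len(CRITERIA)
    return result


def fake_judge(prompt: str) -> str:
    # Stub replacing a real model call so the example is runnable.
    return json.dumps(
        {"factual_correctness": 1.0, "source_reliability": 0.5, "tool_utilization": 0.75}
    )


scores = grade("What is a lead agent?", "It coordinates sub-agents.", fake_judge)
print(scores)
```

Passing the judge in as a function keeps the grading logic testable without network access.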

A critical consideration in this setup is token consumption: multi-agent runs require roughly 15 times more tokens than standard chat interactions. Testing indicates that token usage alone explains approximately 80% of the variance in performance, though model selection and tool configuration also play pivotal roles. For instance, switching to Claude Sonnet 4 yielded better results than merely increasing the token budget for an older version, highlighting the importance of balancing resources with model capabilities.
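To put the 15x figure in concrete terms, here is a back-of-the-envelope calculation. Only the multiplier comes from the article; the 2,000-token baseline chat and the price of 10 USD per million tokens are hypothetical values chosen for illustration, not real usage data or pricing.

```python
# Multiplier reported in the article: multi-agent runs vs. standard chats.
MULTI_AGENT_TOKEN_MULTIPLIER = 15


def estimate_multi_agent_tokens(chat_tokens: int) -> int:
    # Scale a chat-sized token count up to a multi-agent run.
    return chat_tokens * MULTI_AGENT_TOKEN_MULTIPLIER


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    # Simple linear pricing model (an assumption for this sketch).
    return tokens / 1_000_000 * usd_per_million_tokens


baseline_chat_tokens = 2_000        # hypothetical chat size
price_per_million = 10.0            # hypothetical price, not a real rate

multi_agent_tokens = estimate_multi_agent_tokens(baseline_chat_tokens)
chat_cost = estimate_cost_usd(baseline_chat_tokens, price_per_million)
multi_agent_cost = estimate_cost_usd(multi_agent_tokens, price_per_million)

print(multi_agent_tokens, chat_cost, multi_agent_cost)
```

Under these assumed numbers the same query grows from 2,000 to 30,000 tokens, which is why the approach makes the most sense for tasks valuable enough to justify the extra spend.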

Another notable feature is the system’s ability to self-correct. In certain cases, Claude 4 can identify its own errors and refine tool descriptions autonomously, effectively acting as its own prompt engineer. This self-improvement mechanism suggests a future where AI systems continuously optimize their performance without human intervention.

Looking forward, Anthropic envisions asynchronous execution as the next evolutionary step. Currently, the system waits for all sub-agents to complete their tasks before proceeding, but future iterations could allow agents to spawn additional sub-agents dynamically, working in parallel without synchronization delays. While this promises greater flexibility and speed, it also introduces complexities in coordination, state management, and error handling. These challenges remain unresolved.
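The difference between waiting for every sub-agent and reacting as each one finishes can be sketched with Python's `asyncio`. This is an illustration of the general idea, not Anthropic's design: `run_subagent` is a hypothetical stub whose sleep stands in for model and tool calls, and the delays are arbitrary.

```python
import asyncio


async def run_subagent(name: str, delay: float) -> str:
    # Stand-in for a sub-agent; the sleep simulates model and tool latency.
    await asyncio.sleep(delay)
    return name


async def synchronous_round(jobs: list[tuple[str, float]]) -> list[str]:
    # Barrier style: the lead agent continues only after every sub-agent
    # has returned. Results come back in submission order.
    return await asyncio.gather(*(run_subagent(n, d) for n, d in jobs))


async def as_they_finish(jobs: list[tuple[str, float]]) -> list[str]:
    # Streaming style: results are handled in completion order, so the
    # lead agent could act on early findings (for example, by starting
    # follow-up sub-agents) without waiting for the slowest one.
    order = []
    for finished in asyncio.as_completed([run_subagent(n, d) for n, d in jobs]):
        order.append(await finished)
    return order


JOBS = [("slow", 0.2), ("fast", 0.01)]

barrier_order = asyncio.run(synchronous_round(JOBS))
streaming_order = asyncio.run(as_they_finish(JOBS))
print(barrier_order, streaming_order)
```

The sketch leaves out exactly the parts the article flags as hard: shared state between agents, coordination of follow-up work, and handling a sub-agent that fails midway.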

By pushing the boundaries of multi-agent AI, Anthropic is paving the way for systems that not only process information faster but also adapt and refine their methods over time. The implications for research, data analysis, and decision-making could be transformative as these technologies mature.

(Source: THE ENCODER)