
AI Audio Models Will Become Commodities, Says ElevenLabs CEO

Summary

– ElevenLabs CEO Mati Staniszewski believes AI models will become commoditized over the next few years, with differences between them shrinking.
– The company is currently focused on building its own AI models as they provide the biggest short-term advantage and solve audio quality issues.
– Staniszewski predicts that AI models will increasingly adopt multi-modal approaches, combining audio with video or large language models in conversational settings.
– ElevenLabs plans to partner with other companies and use open-source technologies to integrate its audio expertise with other models’ capabilities.
– The company aims to create long-term value by focusing on both model building and applications, similar to how Apple combined hardware and software.

The future of artificial intelligence in audio points toward widespread commoditization, according to ElevenLabs CEO Mati Staniszewski. Speaking at the TechCrunch Disrupt 2025 conference, he outlined both the company's immediate priorities and his long-range forecasts for the AI audio sector. Staniszewski said his team has addressed several core model architecture issues, with further advances expected over the next one to two years.

Staniszewski predicts that AI audio models will become commoditized within the next few years, even if certain distinctions remain for specific voices or languages. He acknowledged that differences between models will gradually shrink, making advanced audio AI capabilities more universally accessible.

When questioned about ElevenLabs’ continued investment in proprietary model development given this outlook, Staniszewski clarified that superior models currently provide the most significant competitive edge. He emphasized that achieving high-quality, natural-sounding AI voices and interactions remains a pressing challenge that demands direct model-building efforts.

“The only viable solution right now involves developing these models in-house,” Staniszewski explained. “Longer term, other market participants will undoubtedly catch up and offer comparable solutions.”

He further noted that organizations seeking dependable, scalable implementations will probably employ specialized models tailored to distinct use cases.

Looking ahead, Staniszewski anticipates a shift toward multimodal AI systems within the next couple of years. These integrated approaches would generate audio alongside video content, or combine audio with large language models for conversational applications. He cited Google's Veo 3 as an existing example of this convergent trend.

ElevenLabs intends to pursue collaborations with other firms and explore open-source technologies, aiming to merge its audio specialization with complementary capabilities from other AI domains. The company’s strategy balances ongoing model development with practical applications to build enduring value.

“Similar to how Apple revolutionized technology through integrated hardware and software,” Staniszewski concluded, “we believe combining strong product design with sophisticated AI will unlock the most impactful use cases for this generation.”

(Source: TechCrunch)
