DeepL Adds Voice Translation to Its AI Tools

Summary
– DeepL has launched a new voice-to-voice translation suite for meetings, mobile/web chats, and group conversations, alongside an API for developers.
– The company is releasing add-ons for platforms like Zoom and Microsoft Teams, offering real-time audio or on-screen text translation, currently in early access.
– Its technology can learn custom vocabulary, such as industry terms or names, and allows group participation via methods like QR codes.
– DeepL’s current system converts speech to text, translates it, and converts it back to speech, but it aims to develop an end-to-end model that skips text.
– The company faces competition from startups like Sanas, Camb.AI, and Palabra, which offer real-time accent modification, media dubbing, and voice-preserving translation.
DeepL, a company long recognized for its high-quality text translation services, has officially entered the voice translation arena. The firm announced a new suite of voice-to-voice translation tools designed for business meetings, mobile conversations, and frontline worker communications. Alongside these applications, DeepL is launching an API for developers, enabling businesses to integrate this technology into custom solutions like call center platforms.
According to CEO Jarek Kutylowski, this expansion was a logical progression. Having established a strong reputation in text and document translation, the company identified a gap in the market for a superior real-time voice translation product. The primary technical hurdle, he explained, involves balancing low latency with translation accuracy, ensuring the spoken translation follows the original speech with minimal delay.
The initial offerings include add-ons for Zoom and Microsoft Teams. In these environments, participants can either listen to a real-time audio translation or read translated subtitles on their screens. This program is currently in an early access phase, with organizations invited to join a waitlist. DeepL also provides a separate product for one-on-one conversations that can occur via mobile or web, whether in person or remotely.
For collaborative settings like training workshops, the platform supports group conversations where participants can join simply by scanning a QR code. A key feature of the technology is its ability to learn custom vocabulary, adapting to industry-specific terminology, company names, and personal names to improve contextual accuracy.
Kutylowski highlighted the transformative potential of AI in customer service, noting that a robust translation layer allows companies to offer support in languages where hiring qualified, fluent staff is difficult and costly. This positions the technology as a strategic tool for global business operations.
DeepL states it controls the entire voice-to-voice stack, though the current process involves converting speech to text, translating that text, and then synthesizing speech again. The company believes its years of focus on text translation quality give it a significant advantage in the final output. Looking ahead, DeepL aims to develop an end-to-end voice translation model that would bypass the text intermediary entirely for greater speed and fluidity.
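The three-step process described above is commonly called a cascaded pipeline. What follows is a minimal sketch of that general pattern, not DeepL's implementation: the class name, the function names, and the toy dictionary are all hypothetical, and the stand-in stages pass text around as bytes so the example runs without any speech models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedTranslator:
    """Hypothetical three-stage pipeline: speech -> text -> translated text -> speech."""
    transcribe: Callable[[bytes], str]   # speech recognition stage
    translate: Callable[[str], str]      # text translation stage
    synthesize: Callable[[str], bytes]   # speech synthesis stage

    def run(self, audio: bytes) -> bytes:
        # Each stage waits for the previous one, so the delays add up.
        source_text = self.transcribe(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions for illustration only); real stages would call models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


TOY_DICTIONARY = {"hello": "hallo", "world": "welt"}


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedTranslator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline.run(b"Hello world")
```

Because the stages run in sequence, their latencies accumulate, which is one reason an end-to-end model that skips the text step could be faster.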
The company enters a competitive field with several well-funded startups. Sanas, which raised $65 million last year, uses AI for real-time accent modification, primarily targeting call center agents. Dubai-based Camb.AI focuses on speech synthesis and translation for media companies, aiding in large-scale video dubbing. Another direct competitor is Palabra, backed by Reddit co-founder Alexis Ohanian’s venture firm, which is building an engine to translate speech in real time while preserving the speaker’s original voice.
(Source: TechCrunch)




