Google launches Gemini 3.5 Live Translate for instant voice conversion

Summary
– Google’s Gemini 3.5 Live Translate offers low-latency speech-to-speech translation across more than 70 languages.
– The new model is part of the Gemini 3.5 family, following the earlier Flash version, with a Pro model expected soon.
– It is fast enough to keep pace with normal conversation and mirrors the speaker’s intonation, pacing, and pitch for more natural-sounding speech.
– The feature is rolling out across Google’s ecosystem, including a public preview for developers through the Gemini Live API or AI Studio.
– The model processes speech continuously, handles multiple languages automatically, and filters out background noise.
Google has long pursued real-time translation, describing it as one of its “pioneering machine learning experiments.” Past demonstrations at Google events often required specific hardware, such as Google phones or earbuds. Last year, the company expanded access to live translation within the Translate app, and now it is broadening that reach further. With the debut of Gemini 3.5 Live Translate, spoken translation becomes available in more settings and with significantly lower latency.
This new AI model belongs to the Gemini 3.5 family, first unveiled at I/O. Until now, only the Flash version had been released, but a Pro model is anticipated in the coming weeks. Gemini 3.5 Live Translate operates as a speech-to-speech model, fine-tuned to automatically detect the language being spoken and translate among more than 70 languages.
According to Google, Gemini 3.5 Live Translate is fast enough to sustain a natural conversation, lagging only a few seconds behind the speaker. It also mirrors intonation, pacing, and pitch, making the translated voice sound more like the original speaker than a generic robot. While the initial demos, recorded under controlled conditions, appear impressive, users won’t have to wait long to test the model’s real-world performance.
The rollout of Gemini 3.5 Live Translate spans multiple parts of the Google ecosystem. Developers can start building with a public preview available through the Gemini Live API or AI Studio. The model processes speech continuously and handles multilingual inputs automatically, removing the need for manual configuration. It also filters out background noise in busy settings, enhancing clarity during conversations.
(Source: Ars Technica)




