How Gemini-Powered Siri Works Behind the Scenes

Summary
– Google and Apple are reportedly close to a $1 billion annual deal for Google’s Gemini AI to power a revamped Siri next year.
– The Gemini model for Apple will have 1.2 trillion parameters and will be hosted on Apple’s Private Cloud Compute servers to ensure privacy.
– Comparing the model’s size is difficult because major AI labs like OpenAI and Anthropic no longer disclose parameter counts for their latest models.
– The 1.2 trillion parameter model will likely use a mixture of experts (MoE) architecture to improve computational efficiency and reduce costs.
– MoE works by activating only a few specialized sub-networks per input, allowing large models to run with the computational cost of much smaller ones.

A potential multi-billion dollar collaboration between Google and Apple could fundamentally reshape how Siri operates, with reports indicating a specialized Gemini model featuring 1.2 trillion parameters may soon power the voice assistant. This partnership focuses on delivering advanced AI capabilities while maintaining user privacy through Apple’s Private Cloud Compute infrastructure, ensuring user data remains secure and inaccessible to Google.
The sheer scale of a 1.2 trillion parameter model is significant, though comparing it directly to other leading AI systems proves difficult. Major AI developers like OpenAI, Anthropic, and Google itself have become increasingly secretive about the specific sizes of their newest models, including GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5. Estimates for these systems vary dramatically, with some analysts suggesting they operate below one trillion parameters while others believe they reach several trillion. The exact figures remain undisclosed, making definitive comparisons challenging.
What many of these massive contemporary models do share is a foundational design known as mixture of experts (MoE). Apple is already familiar with this approach, reportedly using an MoE-based model for its current cloud operations, which is rumored to contain roughly 150 billion parameters. It is highly probable that the Gemini-powered Siri will use a similar MoE framework.
The mixture of experts architecture organizes a model into numerous specialized sub-networks, or “experts.” For every user query, the system intelligently selects and activates only a handful of the most relevant experts. This selective activation makes the model significantly faster and far more efficient computationally than if the entire network were engaged for every single task.
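To make the idea concrete, the snippet below is a minimal sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and number of active experts are arbitrary assumptions chosen for illustration and say nothing about how Google or Apple actually build their models.

```python
# Minimal illustrative sketch of a mixture-of-experts layer with top-k routing.
# This is a toy example, not Gemini's or Apple's actual architecture; all
# dimensions and counts here are assumptions made for the demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is an independent feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                                # x: (num_tokens, d_model)
        scores = self.router(x)                          # (num_tokens, num_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)          # normalize over chosen experts only
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is where the computational savings come from.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


tokens = torch.randn(4, 64)          # four tokens of dimension 64
print(MoELayer()(tokens).shape)      # torch.Size([4, 64])
```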
This design allows MoE models to boast enormous total parameter counts while maintaining manageable operational costs. Instead of utilizing all parameters simultaneously, only a small fraction performs calculations for any given input. For instance, a model with 1.2 trillion total parameters might be structured with 32 individual experts, of which typically just two to four would be active per token. This means only about 75 to 150 billion parameters are actively working at any moment, delivering the sophisticated capabilities of a massive model with the computational expense of a much smaller one.
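The 75-to-150-billion figure follows from straightforward arithmetic, sketched below under the simplifying assumption that all 1.2 trillion parameters sit inside the expert layers (real MoE models also carry shared attention and embedding weights, so the true numbers would differ somewhat):

```python
# Back-of-the-envelope arithmetic for the active-parameter figures in the text.
# The 32-expert layout and 2-4 active experts per token are the article's
# illustrative assumptions, not confirmed details of Google's model.
total_params = 1.2e12      # 1.2 trillion total parameters
num_experts = 32
for active_experts in (2, 4):
    active_params = total_params * active_experts / num_experts
    print(f"{active_experts} of {num_experts} experts active -> "
          f"~{active_params / 1e9:.0f}B parameters per token")
# 2 of 32 experts active -> ~75B parameters per token
# 4 of 32 experts active -> ~150B parameters per token
```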
While no official details have been released about the specific architecture of the model Google might supply, the reported 1.2 trillion parameter size strongly suggests an MoE design is necessary for practical and efficient operation. The critical question remains whether this configuration will be sufficiently powerful to keep a Gemini-enhanced Siri competitive against the next generation of AI models expected to debut by its launch next year.
(Source: 9to5Mac)
