Google’s Gemma 4 AI models now use Apache 2.0 license

Summary
– Google has launched Gemma 4, a new set of four open-weight AI models optimized for running on local hardware.
– The two largest models, 26B Mixture of Experts and 31B Dense, are designed to run on powerful local GPUs like the Nvidia H100, with the 26B model prioritizing speed by activating fewer parameters.
– The two smaller models, Effective 2B and 4B, are designed for mobile devices and optimized with chipmakers for low memory use and near-zero latency.
– Google has replaced the custom Gemma license with the permissive Apache 2.0 license in response to developer frustrations, making the models more open.
– Google claims Gemma 4 models are significantly more capable than Gemma 3, with the 31B model ranking highly on a benchmark list despite being smaller than top competitors.
Google’s latest generation of open-weight AI models has arrived, promising greater power and a significant shift toward openness. The newly released Gemma 4 family is engineered for local deployment and comes with a crucial change: it now uses the permissive Apache 2.0 license, replacing the previous custom license. This move directly addresses developer concerns over restrictive terms, making the models far more accessible for commercial and research use.
Designed to operate on local hardware, the suite includes four distinct sizes. The two largest variants, the 26B Mixture of Experts and the 31B Dense model, are built to run unquantized on a single high-end GPU like an 80GB Nvidia H100. For broader accessibility, these models can be quantized to lower precision, enabling them to run on more common consumer-grade graphics cards. A key performance highlight is the 26B Mixture of Experts model, which activates only 3.8 billion of its total parameters during inference. This selective activation gives it a substantially higher tokens-per-second rate than other models of similar scale, prioritizing low latency. The 31B Dense model makes the opposite trade, favoring output quality over raw speed, and is intended for developers to fine-tune for specialized applications.
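The memory claims above can be sanity-checked with back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below uses the parameter counts from the article; the `weight_gb` helper is our own illustration (not from Google), and the figures ignore activations, KV cache, and runtime overhead, so real requirements will be somewhat higher.

```python
def weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# 31B dense at 16-bit precision (~62 GB) fits unquantized
# on a single 80 GB H100, as the article describes.
print(f"31B dense @ 16-bit: {weight_gb(31, 16):.0f} GB")

# Quantized to 4-bit (~15.5 GB), the same model comes within
# reach of common consumer GPUs.
print(f"31B dense @ 4-bit:  {weight_gb(31, 4):.1f} GB")

# The 26B MoE still stores all 26B weights, but each token only
# touches ~3.8B of them -- that smaller active set is what drives
# the higher tokens-per-second rate.
print(f"26B MoE stored @ 16-bit:  {weight_gb(26, 16):.0f} GB")
print(f"26B MoE active @ 16-bit:  {weight_gb(3.8, 16):.1f} GB")
```

Note that the MoE's speed advantage comes from compute, not memory: all 26 billion weights must still fit in VRAM, but far fewer participate in each forward pass.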
For resource-constrained environments, Google introduces the Effective 2B (E2B) and Effective 4B (E4B) models. Targeted at mobile and edge devices like smartphones, Raspberry Pi, and Jetson Nano boards, these models are optimized to run at an effective parameter count of 2 or 4 billion. Developed in collaboration with the Pixel team, Qualcomm, and MediaTek, they are engineered for minimal memory footprint and battery drain. Google promotes their near-zero latency and notes they are more efficient than their Gemma 3 predecessors.
Google positions the entire Gemma 4 lineup as a major leap forward, asserting these are the most capable models available for local hardware. The company states the Gemma 31B model will debut in third place on the popular Arena list for open AI models, behind only much larger models like GLM-5 and Kimi 2.5. This performance, achieved at a fraction of the computational size and cost, underscores the efficiency of the new architecture. For developers seeking powerful, locally runnable AI without restrictive licensing, Gemma 4 represents a compelling new option.
(Source: Ars Technica)