
Apple M5 Unleashes Blazing-Fast Local AI on MLX

Summary

– Apple’s M5 chip shows significant performance improvements over the M4 for running local large language models, as detailed on the company’s Machine Learning Research blog.
– MLX is Apple’s open-source framework that enables efficient machine learning on Apple silicon by leveraging unified memory and supporting CPU/GPU operations without data movement.
– MLX LM allows developers to download and run models from Hugging Face locally on Macs, supporting quantization to reduce memory usage and speed up inference.
– In benchmarks, the M5 achieved a 19-27% performance boost in token generation speed compared to the M4, attributed to higher memory bandwidth and new GPU Neural Accelerators.
– The M5 also demonstrated over 3.8x faster image generation than the M4, highlighting its enhanced capabilities for various machine learning tasks.

Apple’s latest M5 chip demonstrates a substantial leap in processing power for on-device artificial intelligence tasks, significantly outperforming its predecessor, the M4. Recent benchmarks from the company’s machine learning research division highlight these gains, particularly when running large language models through Apple’s specialized MLX framework. This development marks a critical step forward for developers and users who rely on local AI processing without cloud dependencies.

The foundation for these performance improvements lies in MLX, the open-source array framework Apple introduced in 2023. MLX serves as a comprehensive toolkit for efficient machine learning directly on Apple silicon Macs, giving developers familiar NumPy-style APIs for building, training, and running AI models natively. What makes MLX particularly powerful is how it exploits Apple silicon’s unified memory architecture: operations can run on either the CPU or the GPU without the performance overhead of copying data between them.
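A minimal sketch of what that looks like in MLX’s Python API (the shapes and array names here are purely illustrative):

```python
import mlx.core as mx

# Arrays live in unified memory; there is no .to(device) copy step.
a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# The stream argument chooses where an operation runs, not where the
# data lives: both processors read the same buffers.
c = mx.matmul(a, b, stream=mx.gpu)  # runs on the GPU
d = mx.add(a, b, stream=mx.cpu)     # runs on the CPU, same arrays

# MLX evaluates lazily; eval() forces the computation.
mx.eval(c, d)
```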

Within the MLX ecosystem, MLX LM stands out as a specialized package for text generation and language-model fine-tuning. It lets developers download most models available on Hugging Face and run them locally on Apple hardware. The framework’s support for quantization, a compression technique that shrinks a model’s memory footprint, enables even large models to run efficiently, which translates into faster response times during inference, the phase in which a model generates answers to user prompts.
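As a rough illustration of the workflow, loading and querying a pre-quantized Hugging Face model with MLX LM takes only a few lines (the repository name below is one of the community’s 4-bit conversions, chosen here purely as an example):

```python
from mlx_lm import load, generate

# load() fetches the model and tokenizer from Hugging Face (or a local
# path) on first use and caches them; a "4bit" repo is already quantized.
model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")

# generate() runs inference entirely on-device.
response = generate(
    model,
    tokenizer,
    prompt="Explain unified memory in one sentence.",
    max_tokens=128,
)
print(response)
```

Full-precision checkpoints can also be quantized locally; the mlx_lm.convert utility’s -q flag handles that conversion.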

Apple’s performance comparison between the M5 and M4 chips reveals impressive gains across multiple metrics. The company tested several models, including Qwen 1.7B, 8B, and 14B in different precision formats, along with mixture-of-experts models such as Qwen 30B and GPT OSS 20B. The evaluations measured both the time to generate the first token after receiving a prompt and the speed of generating 128 subsequent tokens. The distinction matters because producing the first token is compute-bound, dominated by processing the prompt, while each follow-up token is memory-bound, limited chiefly by how fast the model’s weights can be streamed from memory.
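One simple way to reproduce that split yourself is to time a one-token generation (dominated by prompt processing) separately from a longer run, along the lines of the sketch below, which reuses the load/generate API shown earlier and the same illustrative model repository:

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")  # illustrative
prompt = "Summarize the benefits of unified memory for LLM inference."

# Time to first token: prompt processing dominates this measurement.
t0 = time.perf_counter()
generate(model, tokenizer, prompt=prompt, max_tokens=1)
ttft = time.perf_counter() - t0

# Longer run: 1 first token plus 128 subsequent, memory-bound tokens.
t0 = time.perf_counter()
generate(model, tokenizer, prompt=prompt, max_tokens=129)
total = time.perf_counter() - t0

# Subtracting ttft approximately cancels the repeated prompt processing.
print(f"time to first token: {ttft:.2f}s")
print(f"subsequent tokens/sec: {128 / (total - ttft):.1f}")
```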

The results showed consistent improvements across the board. The M5 delivered performance boosts ranging from 19% to 27% over the M4, largely attributable to its memory bandwidth of 153 GB/s versus the M4’s 120 GB/s, an increase of roughly 28% (153/120 ≈ 1.28) that translates almost directly into faster processing for memory-bound AI workloads. Additionally, Apple noted that a MacBook Pro with 24GB of memory can comfortably handle an 8B model in BF16 precision or a 30B quantized mixture-of-experts model, keeping memory usage under 18GB on both chip generations.
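The reason bandwidth matters so directly is that a memory-bound decoder must stream every weight from memory for each generated token, so peak decode speed is roughly bandwidth divided by model size in bytes. A back-of-envelope calculation (all numbers approximate, ignoring caches and other overheads):

```python
# Rough decode-speed ceiling for a memory-bound decoder: every parameter
# is read once per generated token, so throughput ~ bandwidth / model size.

def tokens_per_sec_ceiling(bandwidth_gb_s: float, params_b: float,
                           bytes_per_param: float) -> float:
    model_gb = params_b * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# An 8B model in BF16 (2 bytes/parameter) occupies roughly 16 GB.
for chip, bw in [("M4", 120.0), ("M5", 153.0)]:
    print(f"{chip}: ~{tokens_per_sec_ceiling(bw, 8, 2):.1f} tok/s ceiling")

# The ratio of the two ceilings mirrors the bandwidth ratio,
# 153 / 120 = 1.275, consistent with the 19-27% gains Apple measured.
```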

Beyond text generation, Apple’s testing extended to image creation tasks, where the M5 demonstrated even more dramatic improvements. In image generation benchmarks, the M5 completed tasks more than 3.8 times faster than the M4, showcasing the chip’s enhanced capabilities across different AI domains. These advancements position Apple’s latest silicon as a formidable platform for local AI development and deployment, offering professionals and enthusiasts alike the ability to run sophisticated models directly on their devices without compromising performance.

(Source: 9to5Mac)

Topics

apple silicon, machine learning, MLX framework, LLM inference, performance benchmarking, GPU acceleration, memory bandwidth, model quantization, text generation, image generation