
Phison CEO on 244TB SSDs, PLC NAND, and the Problem with High Bandwidth Flash

▼ Summary

– The primary bottleneck for running AI models is insufficient memory rather than a lack of computing power; running out of memory can cause systems to crash.
– Phison’s aiDAPTIV+ solution uses SSD storage as an expandable memory pool to compensate for DRAM limits and improve AI inference responsiveness.
– A key goal is reducing Time to First Token (TTFT) by storing frequently used data like KV cache on SSDs, preventing recomputation for repeated queries.
– Companies often buy extra GPUs primarily to aggregate VRAM, leading to wasted compute capacity, which scalable SSD memory could prevent.
– For hyperscalers, AI revenue depends on inference requiring massive data storage, driving the development of high-capacity SSDs like a 244TB model.

The true bottleneck for running advanced AI models isn’t processing power; it’s memory availability. This fundamental constraint impacts everything from personal laptops to massive data centers, shifting the focus from raw compute to how systems manage and access data. According to Phison CEO Pua Khein Seng, the inventor of the first single-chip USB flash drive, insufficient memory can cause systems to crash, making it a critical hurdle for practical AI deployment.

To address this, Phison is pioneering its aiDAPTIV+ technology, a method that uses NAND flash storage as an expanded memory pool to compensate for DRAM limitations. This approach allows integrated GPU systems to offload memory-heavy tasks to SSDs, keeping the graphics processors focused on computation rather than waiting for data. A key practical benefit is drastically improving Time to First Token (TTFT), the delay a user experiences after submitting a prompt before seeing the first AI-generated output. Pua argues that long wait times ruin the user experience, making local AI feel unresponsive even if the model eventually completes its task.
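TTFT is easy to observe from the client side. The sketch below is a minimal, hypothetical illustration of measuring it — it is not Phison's aiDAPTIV+ software; `fake_stream` and `measure_ttft` are stand-ins that simulate a streaming backend, with the initial delay modeling the prompt-processing (prefill) step that dominates the wait when memory is scarce.

```python
import time


def fake_stream(prompt):
    """Stand-in for a streaming LLM backend: yields tokens one at a time.

    The initial sleep models prefill (building the KV cache), which dominates
    Time to First Token when memory is scarce and data must be recomputed.
    """
    time.sleep(1.2)          # prefill delay (illustrative)
    for token in ["Hello", ",", " world", "!"]:
        time.sleep(0.05)     # per-token decode latency
        yield token


def measure_ttft(prompt):
    """Time the gap between sending a prompt and receiving the first token."""
    start = time.perf_counter()
    stream = fake_stream(prompt)
    first = next(stream)                     # block until the first token arrives
    ttft = time.perf_counter() - start
    rest = "".join(stream)
    return ttft, first + rest


if __name__ == "__main__":
    ttft, text = measure_ttft("Explain NAND flash in one sentence.")
    print(f"TTFT: {ttft:.2f}s  ->  {text}")
```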

He compares the problem to a doctor who must repeat the same instructions to every patient because no records are kept between visits. In AI inference, a component called the KV cache, similar to cookies in a web browser, stores frequently used data. Most systems lack enough DRAM to retain this cache, forcing them to recompute information for every query. Phison’s solution stores this frequently accessed cache directly in the storage system, allowing for near-instant retrieval when a user repeats or revisits a question.
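The caching pattern Pua describes can be sketched in a few lines. This is only a conceptual illustration under assumptions — `compute_kv_cache` is a placeholder for the expensive prefill step, and an ordinary directory stands in for the SSD-backed memory pool; Phison's actual implementation works at the system and driver level rather than in application code.

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("kv_cache")          # stands in for the SSD-backed memory tier
CACHE_DIR.mkdir(exist_ok=True)


def compute_kv_cache(prompt: str) -> dict:
    """Placeholder for the expensive prefill step that builds the KV cache."""
    return {"prompt": prompt, "keys": [hash(w) for w in prompt.split()]}


def get_kv_cache(prompt: str) -> dict:
    """Return the KV cache for a prompt, recomputing only on a cache miss."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.exists():                              # repeated question: near-instant load
        return pickle.loads(path.read_bytes())
    cache = compute_kv_cache(prompt)               # first visit: pay the full cost once
    path.write_bytes(pickle.dumps(cache))
    return cache
```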

This memory-centric philosophy extends to enterprise hardware. Pua observes that many organizations purchase additional GPU cards primarily to aggregate more VRAM, not for extra computing power. This leads to inefficient use of expensive silicon, with many GPUs sitting idle. By using high-capacity SSDs to create a larger, scalable memory pool, companies can buy GPUs specifically for their compute capabilities and scale them appropriately. “Once you have enough memory, then you can focus on compute speed,” Pua notes.
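A rough, back-of-the-envelope calculation shows why VRAM, not compute, often sets the GPU count. The figures below are illustrative assumptions (a 70-billion-parameter model in FP16, an assumed 40 GB KV-cache working set, 80 GB of VRAM per GPU); the article itself quotes no numbers.

```python
# Illustrative only: assumed model size, precision, and GPU VRAM.
params = 70e9                              # 70B-parameter model
weight_bytes = params * 2                  # FP16 weights: ~140 GB
kv_cache_bytes = 40e9                      # assumed KV-cache working set
gpu_vram = 80e9                            # 80 GB per GPU

gpus_for_memory = -(-(weight_bytes + kv_cache_bytes) // gpu_vram)   # ceiling division
print(f"GPUs bought just to hold the model: {int(gpus_for_memory)}")  # -> 3
```

Under these assumptions, three GPUs are purchased purely to hold the model in memory, even if a single GPU's compute would keep up with the request rate.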

The discussion naturally leads to the storage needs of cloud service providers (CSPs) building AI infrastructure. Pua points out that while CSPs have invested over $200 billion in GPUs, they don’t generate revenue directly from these processors. Profit comes from inference services, which are entirely dependent on massive, readily accessible data storage. He summarizes this relationship succinctly: “CSP profit equals storage capacity.”

This economic reality drives Phison’s development of extreme-capacity enterprise SSDs. The company has announced a 244TB SSD, a leap from its current 122TB drive. Pua explains that the current model uses a controller with 16-layer NAND stacking. Reaching 244TB is conceptually straightforward, requiring 32-layer stacking with the same design. The primary challenge is achieving acceptable manufacturing yields. An alternative path involves waiting for higher-density 4Tb NAND dies, which would allow a 244TB drive with just 16 layers, contingent on supplier readiness.
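The capacity arithmetic can be sketched as follows. The die densities, package count, and decimal rounding below are assumptions for illustration, not Phison specifications; the point is simply that doubling either the stack height or the die density doubles the drive.

```python
# Back-of-the-envelope capacity math (illustrative assumptions, not Phison specs).
TB = 1e12
die_2tb = 2e12 / 8       # a 2Tb NAND die holds ~250 GB (decimal)
die_4tb = 4e12 / 8       # a 4Tb NAND die holds ~500 GB
packages = 32            # assumed number of NAND packages on the drive

print(f"16-die stacks of 2Tb dies: {16 * die_2tb * packages / TB:.0f} TB raw")  # ~128 TB
print(f"32-die stacks of 2Tb dies: {32 * die_2tb * packages / TB:.0f} TB raw")  # ~256 TB
print(f"16-die stacks of 4Tb dies: {16 * die_4tb * packages / TB:.0f} TB raw")  # ~256 TB
```

Raw capacity exceeds the advertised 122TB and 244TB figures because part of the flash is reserved for over-provisioning and error management.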

Regarding future NAND technologies, Pua clarified that PLC (five-bits-per-cell) NAND depends on manufacturers perfecting the technology. Once they can ship it reliably, Phison’s controller designs will be ready to support it. However, he expressed skepticism about the trend of integrating flash memory directly into GPU-style memory stacks, often called high-bandwidth flash. The core issue is a mismatch in endurance: NAND flash has a finite number of write cycles, while GPUs are designed to last much longer.

“The challenge with integrating NAND directly with GPUs is the write cycle limitation,” Pua said. “If you integrate them, when the NAND reaches end-of-life, you have to discard the entire expensive GPU card.” Phison advocates for a modular approach where SSDs remain replaceable, plug-and-play components. This allows companies to swap out a worn storage drive while preserving the valuable GPU, a more economical and sustainable model.
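A rough lifetime estimate illustrates the endurance gap Pua describes. The program/erase cycle count and write load below are assumed figures, not vendor specifications.

```python
# Rough flash-lifetime estimate (illustrative figures, not vendor specs).
capacity_tb = 122               # drive capacity in TB
pe_cycles = 3000                # assumed program/erase cycles for enterprise TLC NAND
writes_per_day_tb = 500         # assumed sustained write load from inference caching, TB/day

total_writes_tb = capacity_tb * pe_cycles            # total data the NAND can absorb
lifetime_days = total_writes_tb / writes_per_day_tb
print(f"Flash wears out in roughly {lifetime_days / 365:.1f} years")  # ~2 years
```

Under these assumptions the flash reaches end-of-life years before a GPU would normally be retired, which is why keeping it on a swappable module matters.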

Ultimately, Pua’s vision for AI hardware prioritizes building systems where memory capacity is cheap, scalable, and replaceable. Whether for local inference on a laptop or rack-scale operations in a hyperscaler, the practical limits of what AI can achieve will be defined by advances in storage density and memory expansion long before the next leap in pure computational power.

(Source: TechRadar)

Topics

memory bottleneck, AI infrastructure, SSD memory expansion, GPU utilization, DRAM limits, AI inference, storage capacity, NAND flash, enterprise SSDs, time to first token