DeepSeek’s “Sparse Attention” Cuts AI Costs Dramatically

▼ Summary
– ChatGPT slows down in long conversations due to the high computational demands of processing lengthy text sequences, even with existing efficiency measures.
– DeepSeek faces extra pressure to optimize performance with limited AI chips due to export restrictions, unlike US tech giants with abundant hardware.
– DeepSeek released DeepSeek-V3.2-Exp with “DeepSeek Sparse Attention” (DSA), an experimental technique to improve efficiency, building on sparse attention concepts pioneered by OpenAI and Google.
– DeepSeek claims its DSA achieves fine-grained sparse attention for the first time and has reduced API prices by 50% to reflect the efficiency improvements.
– In January, DeepSeek’s R1 model matched OpenAI’s o1 performance at a low training cost of $6 million, and its app briefly topped the iPhone App Store, challenging US AI leaders.

DeepSeek’s “Sparse Attention” technique aims to sharply reduce the computational cost of running large language models. Anyone who has experienced ChatGPT slowing down during an extended conversation understands the underlying challenge: because standard attention compares every token in a sequence with every other token, its cost grows quadratically with sequence length, so processing long text demands enormous computing power. While American tech giants often address this by deploying additional hardware, Chinese AI firm DeepSeek faces different circumstances: export restrictions limit its access to advanced AI chips. That constraint has pushed the company to develop more efficient methods that extract more performance from the hardware it has.
The company recently launched an experimental version of its latest reasoning model, DeepSeek-V3.2-Exp, featuring what it calls “DeepSeek Sparse Attention” (DSA). DSA is DeepSeek’s implementation of a technique that leading AI developers have explored for years: OpenAI introduced sparse transformers in 2019 and used the method in building GPT-3, while Google Research published related work on “Reformer” models in 2020. How extensively Western AI companies employ sparse attention in their current models remains largely undisclosed.
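DeepSeek has described DSA only at a high level, so the sketch below is not the company’s method. It is a minimal NumPy illustration of the general idea behind sparse attention: rather than letting every query attend to every key, each query keeps only a small subset of keys. The function names and the simple top-k selection rule here are illustrative assumptions.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; positions masked to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(Q, K, V, k=4):
    # Score every query against every key, as in dense attention...
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # ...but keep only each query's k highest-scoring keys and mask the rest,
    # so the softmax and the value mixing use a small subset of positions.
    keep = np.argsort(scores, axis=-1)[:, -k:]
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, keep,
                      np.take_along_axis(scores, keep, axis=-1), axis=-1)
    return softmax(masked) @ V

# Toy usage: 16 tokens, 8-dimensional head, each token attends to 4 others.
rng = np.random.default_rng(0)
n, d = 16, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(topk_sparse_attention(Q, K, V).shape)  # (16, 8)

Note that this toy version still computes the full n-by-n score matrix before masking; production implementations get their savings by selecting keys inside custom GPU kernels so that most scores are never computed or stored at all.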
Although sparse attention has been a known technique for several years, DeepSeek asserts that its implementation achieves “fine-grained sparse attention for the first time.” To pass the efficiency gains on to customers, the company has cut its API pricing by 50 percent, a reduction it says reflects the operational savings enabled by the new attention mechanism.
DeepSeek first captured widespread attention in January, when its R1 reasoning model reportedly matched the performance of OpenAI’s o1 despite a training cost of just $6 million, and its chat application briefly surpassed ChatGPT to claim the top spot on the iPhone App Store. Those developments positioned the Chinese firm as a serious competitor to established American AI labs, and industry observers are watching closely as the relative newcomer continues to challenge the field’s leaders.
(Source: Ars Technica)
