Amazon solves key technical problem for future data centers

Summary

– Amazon claims a major networking breakthrough using a “quasi-random” design that increases data speeds and reduces energy use, deployed in its data centers since late 2023.
– The new technology, called RNG (resilient network graphs), combines elements of structured and random architectures to flatten the network and eliminate bottlenecks.
– Amazon designed the ShuffleBox, a new piece of equipment that automatically shuffles cables required for this random networking approach.
– The RNG design is intended to improve everyday data center efficiency rather than to serve generative AI workloads specifically.
– Traditional data center networks have used a “fat-tree” topology since the 1980s, where data moves up and down vertical layers of switches and routers.

Amazon claims it has quietly solved a long-standing technical challenge in networking design, deploying the new architecture in its data centers since late last year. The company says the breakthrough delivers significantly faster data speeds while reducing energy consumption, a development that could give it a competitive advantage as cloud providers race to build more powerful systems.

At the heart of this innovation is a “quasi-random” design that blends the structure of traditional data networks with the performance benefits of more randomized architectures. While researchers have studied random networks for decades, scaling the technology has remained elusive. Amazon believes it has finally cracked that code.

The fact that Amazon is already using this technology in real-world applications is “remarkable,” according to Brighten Godfrey, a computer science professor at the University of Illinois at Urbana-Champaign and a networking expert not involved in Amazon’s work. Godfrey co-authored a pivotal 2012 paper on random network graphs, describing them as a “mind-bending problem to solve, in general.”

Since 2023, a dedicated team of engineers and researchers at Amazon Web Services, including several recruited from academia, has been tackling the random networking challenge. The company also designed a new piece of data center equipment called the ShuffleBox, which automatically rearranges the cables needed for this type of network.

“By essentially flattening the network, we eliminated the bottlenecks that come with traditional networking designs,” said Matt Rehder, vice president of AWS Network Engineering, in an exclusive interview. “We think we’re the only ones who have done this at scale.”

Amazon detailed the new design in a paper published last month titled “RNG: Flat Datacenter Networks at Scale.” RNG stands for “resilient network graphs,” which are neither fully structured nor fully random.

Notably, the RNG team is not pitching this technology around generative AI. Instead, the focus is on making Amazon’s everyday data center architecture more efficient. “RNG is a great fit for our core demands, but AI training data patterns are far more coordinated and centrally orchestrated, so they don’t approximate a random graph,” Rehder explained.

Since the mid-1980s, most communications networks, from telecom to data centers, have relied on a “fat-tree” topology. This structure includes two or three vertical layers of switches and routers, connected by “fat” nodes at the top where multiple routers of the same type reside, with thinner branches toward the bottom. In a fat-tree network, data moves up and down the stack. The increased bandwidth near the top helps prevent bottlenecks, but Amazon’s new approach aims to eliminate those chokepoints entirely by flattening the architecture.
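To make the contrast between "up and down the stack" and a flattened network concrete, here is a minimal Python sketch. It is an illustration only and not Amazon's RNG design: the function names (`layered_tree`, `flat_random`, `mean_hops`), the toy sizes, and the use of a plain random regular graph as the "flat" case are all assumptions made for this example. It counts how many links a packet crosses between switches in a small three-layer tree and in a small randomly wired network of peer switches.

```python
import random
from collections import deque
from itertools import combinations


def hops_from(adj, src):
    """Breadth-first search: hop count from src to every reachable switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in adj[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def mean_hops(adj, endpoints):
    """Average shortest-path length over all pairs of the given endpoints."""
    pairs = list(combinations(endpoints, 2))
    return sum(hops_from(adj, a)[b] for a, b in pairs) / len(pairs)


def layered_tree(pods=2, aggs_per_pod=2, leaves_per_pod=4, cores=2):
    """Hypothetical toy three-layer network: leaf, aggregation, core."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    leaves = []
    for p in range(pods):
        for a in range(aggs_per_pod):
            for c in range(cores):
                link(("agg", p, a), ("core", c))
            for l in range(leaves_per_pod):
                link(("agg", p, a), ("leaf", p, l))
        leaves.extend(("leaf", p, l) for l in range(leaves_per_pod))
    return adj, leaves


def flat_random(n=14, degree=4, seed=0):
    """Hypothetical flat network: every switch links to `degree` random peers.

    Shuffles port "stubs" and pairs them up, retrying until the wiring has
    no self-links, no duplicate links, and is fully connected.
    """
    rng = random.Random(seed)
    while True:
        stubs = [s for s in range(n) for _ in range(degree)]
        rng.shuffle(stubs)
        adj = {s: set() for s in range(n)}
        simple = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or b in adj[a]:
                simple = False
                break
            adj[a].add(b)
            adj[b].add(a)
        if simple and len(hops_from(adj, 0)) == n:
            return adj


tree, leaves = layered_tree()
flat = flat_random()
print("layered, leaf to leaf:", round(mean_hops(tree, leaves), 2))
print("flat, switch to switch:", round(mean_hops(flat, list(flat)), 2))
```

In the layered toy network, two leaf switches in different pods are always four links apart, because traffic must climb to a core switch and come back down. In the flat toy network there is no top layer to climb to, so routes run directly between peer switches. The sizes here are arbitrary, and the sketch says nothing about the speed or energy figures Amazon reports.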

(Source: Wired)
