Cloudflare Misconfiguration Caused Major BGP Route Leak

Summary
– Cloudflare experienced a 25-minute BGP route leak affecting IPv6 traffic, causing congestion, packet loss, and approximately 12 Gbps of dropped traffic.
– The incident was caused by an accidental router misconfiguration that made an export policy overly permissive, incorrectly advertising internal routes to external peers.
– This violation of valley-free routing policies attracted traffic to unintended networks, leading to reliability issues and potential security risks like traffic interception.
– Engineers manually reverted the configuration, ending the impact after about 25 minutes. The incident resembled an earlier one in July 2020.
– Cloudflare proposed future prevention measures, including stricter export safeguards, automated policy checks, and promoting RPKI ASPA adoption.
A Border Gateway Protocol (BGP) route leak at Cloudflare disrupted network traffic for roughly 25 minutes. Cloudflare has published a detailed analysis of the event, which caused measurable congestion, packet loss, and approximately 12 Gbps of dropped IPv6 traffic. The problem stemmed from an accidental router misconfiguration whose effects propagated beyond the company’s own network, affecting external networks and internet traffic more broadly.
The Border Gateway Protocol is the routing protocol that the internet's interconnected networks, known as autonomous systems, use to tell each other which destinations they can reach. In this case, a policy error caused Cloudflare to improperly redistribute routing information. According to the company's statement, the incident on January 22 involved taking routes from certain peers and incorrectly advertising them from its Miami location to other peers and providers. Under the route-leak taxonomy in RFC 7908, this was classified as a mixture of Type 3 and Type 4 route leaks.
A BGP route leak happens when an autonomous system breaks established “valley-free” routing policies. Under those policies, routes learned from a peer or a provider should be passed on only to customers; a leak occurs when such routes are advertised to another peer or provider instead. This violation draws traffic onto paths it was never meant to travel, often leading to congestion, packet loss, or highly inefficient routing. When networks employ strict firewall rules that only accept traffic from authorized providers, the misdirected traffic is discarded.
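The valley-free export rule can be written as a short check. The following is a minimal Python sketch, not any vendor's policy language; the `Rel` enum and the function names are hypothetical and exist only for illustration. It assumes the common three-way split of BGP neighbors into customers, peers, and providers.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship with a BGP neighbor (illustrative model)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Valley-free export rule.

    Routes learned from a customer may be exported to any neighbor.
    Routes learned from a peer or a provider may be exported only to customers.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return export_to is Rel.CUSTOMER


def is_route_leak(learned_from: Rel, export_to: Rel) -> bool:
    """An export that violates the valley-free rule is treated as a leak."""
    return not may_export(learned_from, export_to)


if __name__ == "__main__":
    # Enumerate every combination and show which ones would be leaks.
    for src in Rel:
        for dst in Rel:
            verdict = "LEAK" if is_route_leak(src, dst) else "ok"
            print(f"learned from {src.value:8} -> export to {dst.value:8}: {verdict}")
```

In this model, re-advertising a peer-learned route to another peer or to a provider is flagged, which matches the behavior described for the Miami location.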
While these incidents primarily create reliability and performance headaches, they also carry a security implication. In more severe cases, such route leaks or hijacks can enable unauthorized parties to intercept and potentially analyze traffic that passes through their networks.
Cloudflare’s investigation pinpointed the root cause. Engineers were implementing a policy change designed to stop their Miami facility from advertising specific IPv6 prefixes originating in Bogotá. However, the removal of certain prefix lists inadvertently made the export policy far too broad. This configuration error allowed an internal route match to accept all internal IPv6 routes and then export them externally. Consequently, every IPv6 prefix Cloudflare uses across its backbone was accepted by this faulty policy and advertised to all its BGP neighbors in Miami.
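To illustrate how removing a prefix list can widen a policy rather than narrow it, here is a small Python sketch. It is a toy model, not Cloudflare's actual router configuration: the `ExportTerm` class, its fields, and the prefixes (taken from the 2001:db8::/32 documentation range) are all assumptions made for this example. The key assumption is that a term with no prefix condition matches every route that passes its remaining conditions.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    internal: bool  # True if the route was learned over the internal backbone


@dataclass
class ExportTerm:
    """One accepting term of a hypothetical export policy."""
    prefix_list: Optional[List[str]]  # None means "no prefix condition at all"
    require_internal: bool = True

    def matches(self, route: Route) -> bool:
        if self.require_internal and not route.internal:
            return False
        if self.prefix_list is None:
            # With the prefix condition gone, only the route-type check remains.
            return True
        net = ip_network(route.prefix)
        return any(net.subnet_of(ip_network(p)) for p in self.prefix_list)


def exported(term: ExportTerm, routes: List[Route]) -> List[Route]:
    """Return the routes this term would accept for export."""
    return [r for r in routes if term.matches(r)]


routes = [
    Route("2001:db8:a::/48", internal=True),   # stands in for a Bogotá prefix
    Route("2001:db8:b::/48", internal=True),
    Route("2001:db8:c::/48", internal=True),
    Route("2001:db8:ffff::/48", internal=False),
]

before = ExportTerm(prefix_list=["2001:db8:a::/48"])  # scoped to one prefix
after = ExportTerm(prefix_list=None)                  # prefix list removed

if __name__ == "__main__":
    print("before:", [r.prefix for r in exported(before, routes)])
    print("after: ", [r.prefix for r in exported(after, routes)])
```

In this sketch the scoped term exports a single prefix, while the term with its prefix list removed accepts every internal route, which is the failure mode the investigation describes.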
The company’s monitoring systems detected the issue quickly. Engineers responded by manually reverting the configuration and pausing automation processes, which contained the impact to just under 25 minutes. The underlying code change that triggered the event was later rolled back, and automation was carefully restored.
Cloudflare noted this event bears strong resemblance to a similar incident from July 2020. To prevent future occurrences, the company has outlined several planned improvements. These measures include implementing stricter community-based export safeguards, adding automated checks for policy errors within their development pipeline, enhancing early detection capabilities, validating configurations against relevant internet standards, and promoting wider adoption of security frameworks like RPKI ASPA to help secure internet routing.
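An automated policy check of the kind mentioned above could, in its simplest form, flag any export term that accepts routes without being bounded by a prefix list or a community match. The Python sketch below assumes a hypothetical dictionary-based policy representation; the field names (`name`, `action`, `match`, `prefix_lists`, `communities`) are invented for illustration and do not describe Cloudflare's tooling.

```python
from typing import Dict, List


def find_unbounded_export_terms(policy: List[Dict]) -> List[str]:
    """Return the names of accepting terms with no prefix-list or community match.

    Such terms accept whatever passes their remaining conditions, so a
    pipeline check might block the change or require explicit review.
    """
    risky = []
    for term in policy:
        if term.get("action") != "accept":
            continue
        match = term.get("match", {})
        if not match.get("prefix_lists") and not match.get("communities"):
            risky.append(term["name"])
    return risky


# Hypothetical policy: the second term lost its prefix lists in a change.
example_policy = [
    {
        "name": "export-site-local",
        "action": "accept",
        "match": {"prefix_lists": ["SITE-LOCAL-V6"], "route_type": "internal"},
    },
    {
        "name": "export-backbone",
        "action": "accept",
        "match": {"route_type": "internal"},
    },
    {"name": "default-reject", "action": "reject", "match": {}},
]

if __name__ == "__main__":
    for name in find_unbounded_export_terms(example_policy):
        print(f"warning: export term '{name}' accepts routes with no prefix or community bound")
```

A check like this would not catch every policy error, but it targets the specific pattern in which deleting a match condition silently broadens what is exported.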
(Source: Bleeping Computer)
