The CenturyLink outage on Sunday, August 30, was reportedly one of the largest Internet outages in history, according to Cloudflare, which saw its own services go down and recorded a 3.5% drop in total global Internet traffic.
The network outage also took down the services of big tech names including Amazon, Twitter, Hulu, Microsoft (Xbox Live), EA, Blizzard, Steam, Discord, Reddit, Starbucks, Chase, GoDaddy, Peloton, Venmo and many others. It took several hours to fix. In a blog post, network monitoring firm ThousandEyes noted that the outage started at approximately 6 a.m. ET and was not "fully resolved" until 11:30 a.m. ET.
CenturyLink – a Monroe, Louisiana-based ISP – first acknowledged the issue in a tweet posted Sunday morning at 9:30 a.m. ET.
Ultimately, CenturyLink stated that the outage was caused by an incorrect Flowspec announcement, originating from the company's data center in Mississauga, Canada, that prevented Border Gateway Protocol (BGP) sessions from establishing correctly. Flowspec, an extension of BGP, is a commonly used protocol for pushing out network firewall rules.
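To illustrate how a firewall-style rule distributed through Flowspec could interfere with BGP itself, here is a minimal Python sketch. It is a toy model, not CenturyLink's actual rule or any router's implementation: the `Packet`, `FlowRule` and `apply_rules` names, the prefixes and the two example rules are all hypothetical. The only facts it relies on are that Flowspec rules pair match criteria (such as prefix, protocol and port) with an action, and that BGP sessions run over TCP port 179.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

TCP = 6         # IP protocol number for TCP
BGP_PORT = 179  # BGP sessions run over TCP port 179


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet: just the fields the rules match on."""
    dst_ip: str
    protocol: int
    dst_port: int


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical Flowspec-style rule: match criteria plus an action."""
    dst_prefix: str
    protocol: Optional[int]  # None means "any protocol"
    dst_port: Optional[int]  # None means "any port"
    action: str              # "discard" or "accept"

    def matches(self, pkt: Packet) -> bool:
        return (
            ip_address(pkt.dst_ip) in ip_network(self.dst_prefix)
            and self.protocol in (None, pkt.protocol)
            and self.dst_port in (None, pkt.dst_port)
        )


def apply_rules(rules, pkt: Packet) -> str:
    """Return the action of the first matching rule; accept if none match."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return "accept"


# A BGP packet headed for a neighboring router (documentation address).
bgp_packet = Packet("192.0.2.1", TCP, BGP_PORT)

# A narrowly scoped rule: drop web traffic to one attacked prefix.
narrow = [FlowRule("203.0.113.0/24", TCP, 80, "discard")]

# An overly broad rule: drop all TCP to any destination.
broad = [FlowRule("0.0.0.0/0", TCP, None, "discard")]

print(apply_rules(narrow, bgp_packet))  # accept
print(apply_rules(broad, bgp_packet))   # discard
```

In this sketch the narrowly scoped rule leaves BGP untouched, while the overly broad one discards the very packets routers need to keep their BGP sessions up. That is one way, under these assumptions, that a bad rule could stop BGP from establishing.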
As Cloudflare explained in its post-outage blog: "Because this outage appeared to take all of the CenturyLink/Level(3) network offline, individuals who are CenturyLink customers would not have been able to reach Cloudflare or any other Internet provider until the issue was resolved. We saw a 3.5% drop in global traffic during the outage, nearly all of which was due to a nearly complete outage of CenturyLink's ISP service across the United States."
To put the error into context, Cloudflare further said that while the Internet normally sees about 1.5 MB to 2 MB of BGP updates every 15 minutes, the volume of BGP updates spiked to more than 26 MB in that same time frame. The cause of that BGP instability, according to CenturyLink, was the misconfigured Flowspec.
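For a sense of scale, the figures Cloudflare cited work out to a spike of roughly 13 to 17 times the normal volume. The short Python calculation below uses only the numbers quoted above; the function and constant names are ours, and because the peak was "more than" 26 MB, the results are lower bounds.

```python
# Figures quoted from Cloudflare, in MB of BGP updates per 15-minute window.
BASELINE_LOW_MB = 1.5
BASELINE_HIGH_MB = 2.0
PEAK_MB = 26.0  # reported as "more than 26 MB", so treat results as lower bounds


def spike_factor(peak_mb: float, baseline_mb: float) -> float:
    """How many times larger the peak volume is than the baseline."""
    return peak_mb / baseline_mb


smallest = spike_factor(PEAK_MB, BASELINE_HIGH_MB)  # against the busiest normal window
largest = spike_factor(PEAK_MB, BASELINE_LOW_MB)    # against the quietest normal window

print(f"At least {smallest:.1f}x to {largest:.1f}x the normal update volume")
# At least 13.0x to 17.3x the normal update volume
```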
Occurring early on a Sunday morning, the multi-hour outage was less disruptive than it could have been, despite taking down some of the largest Internet services in existence.
But as ThousandEyes points out, the incident was still "extremely unusual." The firm further noted in its blog how enterprises can protect themselves from such a disruption going forward:
"During the course of the incident, some traffic routed through service providers other than Level 3 [CenturyLink] was reaching services, but getting dropped by Level 3 on the reverse path. Keeping in mind asymmetric routing, if enterprises had not only revoked advertisements to Level 3 (which were ignored by the provider), but also stopped accepting route announcements from Level 3 and shut down peering, they could have reduced the impact on their traffic."
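The asymmetric routing point ThousandEyes makes can be shown with a toy Python model. This is a sketch under simplifying assumptions, not router configuration: `Provider`, `pick_reverse_path`, `round_trip_ok` and the second provider, "OtherTransit", are hypothetical names. The model assumes the enterprise sends return traffic through its most-preferred provider among those whose route announcements it still accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical upstream provider; forwarding=False means it drops traffic."""
    name: str
    forwarding: bool


def pick_reverse_path(preference, accepted):
    """Return the most-preferred provider whose routes are still accepted."""
    for provider in preference:
        if provider in accepted:
            return provider
    return None


def round_trip_ok(forward, reverse):
    """A connection works only if both directions are actually forwarded."""
    return forward.forwarding and reverse is not None and reverse.forwarding


level3 = Provider("Level 3", forwarding=False)      # dropping traffic during the incident
other = Provider("OtherTransit", forwarding=True)   # hypothetical healthy provider

# Assumption: the enterprise's routing policy prefers Level 3 for outbound traffic.
preference = [level3, other]

# Inbound requests arrive via the healthy provider in both scenarios.
before = pick_reverse_path(preference, accepted={level3, other})
after = pick_reverse_path(preference, accepted={other})  # Level 3 routes rejected

print(before.name, round_trip_ok(other, before))  # Level 3 False
print(after.name, round_trip_ok(other, after))    # OtherTransit True
```

In the first scenario, requests arrive through the healthy provider but replies are sent back via Level 3 and dropped. Once the enterprise stops accepting Level 3's announcements, replies follow the healthy path and the round trip succeeds, which is the effect ThousandEyes describes.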
At 11:17 a.m. ET, CenturyLink's help team tweeted that all problems were resolved (although that was followed by several replies from customers saying that they were still down).
Last outage triggered FCC investigation
CenturyLink's last serious outage came in late December 2018, when a network disruption lasting 37 hours took down 9-1-1 emergency call centers across the country, including all of Washington state, as well as other local government services.
At the time, CenturyLink told Light Reading that the issue was "a faulty network management card from a third-party equipment vendor."
The network outage got the attention of FCC Chairman Ajit Pai and triggered an investigation.
In its report on the incident, issued in August 2019, the FCC recommended a series of best practices that it said "if implemented, could have prevented the outage." Those included turning off or disabling idle system features; having network monitoring memory and processor utilization alarms that are "regularly audited to ensure functionality and evaluated to improve early detection and calibration"; and "having standard operating procedures for network repair that address cases where normal networking monitoring procedures are inoperable or otherwise unavailable."
— Nicole Ferraro, contributing editor, Light Reading