On October 4, 2021, at 15:39 UTC, the unimaginable happened: Facebook, Instagram and WhatsApp all went down simultaneously. While the rest of us were wondering how we would reach other people, Facebook’s servers were in an emergency state.
To ordinary users like us, Facebook seemed to have disappeared from the Internet. People who tried to open their favorite social media platform on their smart devices simply couldn’t reach it. It was almost as if someone had pulled the cables out of Facebook’s data centers and cut them off from the Internet. The services stayed offline for about six hours, an extraordinary event for a company as large as Facebook.
According to a Facebook engineering blog post and a Cloudflare write-up, the outage was caused by a configuration change to Facebook’s backbone routers, which carry network traffic between its data centers. The change stopped communication between the data centers and disrupted all of Facebook’s services.
Facebook stopped advertising its presence, so ISPs and other networks could no longer locate it, and its services became unavailable. The routes to the IP prefixes of Facebook’s DNS servers were no longer being announced, which meant those DNS servers were, at best, temporarily unreachable.
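To make withdrawn prefixes concrete, here is a minimal Python sketch of a router’s view of the world, assuming a toy routing table keyed by prefix. The prefixes come from the RFC 5737 documentation ranges and the ASN from the RFC 5398 documentation range; they, the `lookup` and `withdraw` helpers, and the DNS server address are all hypothetical stand-ins, not Facebook’s real addresses. Once the prefix covering the DNS server is withdrawn, there is simply no route to it.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: prefix -> origin ASN.
# Documentation prefixes (RFC 5737) and ASN (RFC 5398), not real Facebook values.
routing_table: dict[ipaddress.IPv4Network, int] = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,     # hypothetical "DNS servers" prefix
    ipaddress.ip_network("198.51.100.0/24"): 64500,  # hypothetical "web servers" prefix
}


def lookup(address: str) -> Optional[int]:
    """Return the origin ASN of the most specific route covering address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routing_table if ip in net]
    if not matches:
        return None  # no route: packets to this address have nowhere to go
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return routing_table[best]


def withdraw(prefix: str) -> None:
    """Simulate a BGP withdrawal: the prefix disappears from the table."""
    routing_table.pop(ipaddress.ip_network(prefix), None)


dns_server = "192.0.2.53"  # hypothetical authoritative DNS server address
print(lookup(dns_server))  # 64500: reachable
withdraw("192.0.2.0/24")
print(lookup(dns_server))  # None: the DNS server is unreachable
```

In this toy the web-server prefix is still routed, but with the DNS servers unreachable, clients have no way to learn its address in the first place.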
Let’s now discuss exactly what happened.
Like every network on the Internet, Facebook, a well-known social media platform, relies on advertising its routes so that users can reach it. The Internet does this with the Border Gateway Protocol (BGP), which connects the Internet, essentially a network made up of many networks.
What is BGP?
Border Gateway Protocol (BGP) is an exterior gateway protocol that allows autonomous systems (AS) on the Internet to exchange routing information. Networks exchange traffic with one another through peering, and BGP is what makes peering possible. Without BGP, Internet routers would not know where to send traffic, and the Internet as we know it would cease to exist.
Let’s take a look at how it works. BGP lets autonomous systems, the networks run by companies and other organizations, tell each other which destinations they can reach. To take part, a company announces the IP address prefixes its network serves, tagged with its Autonomous System Number (ASN). This is called a BGP route announcement. Facebook has its own autonomous system, and other sites have theirs. Each is a self-contained network, and together they form the Internet’s backbone.
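A route announcement can be pictured as a prefix plus an AS path. The minimal Python model below assumes hypothetical documentation-range ASNs (64500 to 64502) and a simplified rule of keeping only the shortest path; it shows an origin AS announcing a prefix and each AS prepending its own number as it passes the route to its peers. It illustrates the idea only: real BGP runs over TCP sessions and carries many more attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    """Simplified BGP announcement: a prefix and the AS path used to reach it."""
    prefix: str
    as_path: tuple[int, ...]  # nearest AS first; the last ASN is the origin


class AutonomousSystem:
    def __init__(self, asn: int) -> None:
        self.asn = asn
        self.peers: list["AutonomousSystem"] = []
        self.routes: dict[str, Announcement] = {}

    def peer_with(self, other: "AutonomousSystem") -> None:
        # Peering is two-way: both sides will exchange announcements.
        self.peers.append(other)
        other.peers.append(self)

    def originate(self, prefix: str) -> None:
        # A locally originated prefix has an empty AS path.
        self.routes[prefix] = Announcement(prefix, ())
        self.advertise(prefix)

    def advertise(self, prefix: str) -> None:
        # Prepend our own ASN before passing the route to each peer.
        path = (self.asn,) + self.routes[prefix].as_path
        for peer in self.peers:
            peer.receive(Announcement(prefix, path))

    def receive(self, ann: Announcement) -> None:
        if self.asn in ann.as_path:
            return  # loop prevention: our own ASN is already on the path
        known = self.routes.get(ann.prefix)
        if known is not None and len(known.as_path) <= len(ann.as_path):
            return  # simplification: keep the existing route unless the new path is shorter
        self.routes[ann.prefix] = ann
        self.advertise(ann.prefix)


# Hypothetical ASNs from the RFC 5398 documentation range.
origin, transit, isp = AutonomousSystem(64500), AutonomousSystem(64501), AutonomousSystem(64502)
origin.peer_with(transit)
transit.peer_with(isp)
origin.originate("192.0.2.0/24")
print(isp.routes["192.0.2.0/24"].as_path)  # (64501, 64500)
```

The ISP never talks to the origin directly; it learns the route through the transit network, and the AS path records that journey.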
BGP is what lets traffic travel from one point on the Internet to another. Every computer or device connected to the Internet belongs to a network that is part of an autonomous system, and each autonomous system is assigned an ASN. Companies announce their routes to the autonomous systems they peer with, and those peers pass the announcements on. This is how the Internet’s resilience is supposed to work. But with great power comes great risk. A single router change can withdraw a network’s routes so that they suddenly point to nothing, or announce all the wrong things at once, possibly because automation tools didn’t validate the route changes. That is what caused Facebook’s massive problem, and the outage also cut off the tools its engineers needed to fix it.
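The paragraph above mentions route changes that went unvalidated. As a purely hypothetical illustration, not Facebook’s actual tooling, a pre-deployment check might compare the prefixes currently announced with those a proposed change would announce, and refuse changes that withdraw everything or more than an assumed fraction (`max_withdraw_fraction`, default 0.5). The function and exception names are invented for this sketch.

```python
class UnsafeChangeError(Exception):
    """Raised when a proposed routing change looks too destructive to apply."""


def validate_route_change(current: set[str], proposed: set[str],
                          max_withdraw_fraction: float = 0.5) -> None:
    """Hypothetical pre-deployment check for a batch of BGP announcements.

    Rejects a change that would withdraw every announced prefix, or more
    than max_withdraw_fraction of the prefixes currently announced.
    """
    if current and not proposed:
        raise UnsafeChangeError("change withdraws every announced prefix")
    withdrawn = current - proposed
    if current and len(withdrawn) / len(current) > max_withdraw_fraction:
        raise UnsafeChangeError(
            f"change withdraws {len(withdrawn)} of {len(current)} prefixes")


# Hypothetical documentation prefixes (RFC 5737).
current = {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"}
validate_route_change(current, current)  # no-op change: accepted
try:
    validate_route_change(current, set())  # withdraw everything
except UnsafeChangeError as err:
    print("blocked:", err)
```

A fixed threshold is a blunt instrument; it is used here only to show where such a guard would sit, between the proposed change and the routers.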
In simple terms, BGP builds a map of the networks reachable on the Internet and helps routers find the best route to each of them. It’s the map that routers consult when your computer, phone, or web browser sends traffic to Facebook.com.
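Choosing the “best route” from that map can be sketched as follows, assuming a heavily simplified decision rule: the shortest AS path wins, with ties broken by the lowest neighbor ASN. Real BGP compares other attributes, such as local preference, before AS-path length, so treat this as an illustration rather than the actual decision process. The `Route` type, `best_route` function and documentation-range ASNs are hypothetical.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    as_path: tuple[int, ...]  # first ASN is the neighbor the route was learned from


def best_route(candidates: list[Route]) -> Optional[Route]:
    """Pick a route with a simplified rule: shortest AS path, then lowest neighbor ASN."""
    if not candidates:
        return None  # nothing announced: no way to reach the prefix
    return min(candidates, key=lambda r: (len(r.as_path), r.as_path[0]))


# Hypothetical candidate routes to the same prefix, learned from three neighbors.
routes = [
    Route("192.0.2.0/24", (64501, 64502, 64500)),
    Route("192.0.2.0/24", (64503, 64500)),
    Route("192.0.2.0/24", (64504, 64500)),
]
print(best_route(routes).as_path)  # (64503, 64500)
print(best_route([]))  # None
```

The empty case mirrors what Facebook’s withdrawal did: with no candidate routes left, there is nothing on the map to choose.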
The error message that appeared on Facebook.com throughout the day stated, “Sorry. Something went wrong, and we’re working to fix it.” This message suggested a problem with the Domain Name System (DNS).