Amazon Web Services (AWS), the cloud-computing arm of Amazon, suffered a major service outage on Monday morning, October 20 (Eastern Time), disrupting vast portions of the internet. The incident was concentrated in US-EAST-1 (Northern Virginia)—AWS’s most critical and widely used region—paralyzing websites, applications, and gaming services across the globe. For several hours, it was as if “half the internet” had gone offline.
The event underscored a growing systemic risk: global internet infrastructure has become heavily dependent on just a handful of hyperscale cloud providers.
According to AWS’s Service Health Dashboard, Amazon began investigating at around 3:11 a.m. ET, after detecting rising error rates and latency across multiple AWS services within the US-EAST-1 region.
By 5:01 a.m., AWS identified the root cause: a DNS resolution failure affecting the US-EAST-1 endpoint of DynamoDB, its core NoSQL database service and a platform central to how many customers store critical operational data.
Mike Chapple, professor of IT, Analytics, and Operations at the University of Notre Dame, offered CNN a striking analogy:
“Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data. It’s as if large portions of the internet suffered temporary amnesia.”
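For readers who want to see what that "amnesia" looks like from a client's side, the short Python sketch below is purely illustrative: the hostname is DynamoDB's documented US-EAST-1 endpoint, and the point is simply that when DNS lookups fail, an application cannot even open a connection to data that is still safely stored.

```python
# Minimal illustration of the failure mode: a client that cannot resolve the
# DynamoDB regional endpoint gets a name-resolution error before any API
# request is sent, so the data itself is intact but unreachable.
import socket

DYNAMODB_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"  # documented regional endpoint

try:
    # Resolve the hostname to IP addresses, much as an SDK would before
    # opening an HTTPS connection on port 443.
    addresses = socket.getaddrinfo(DYNAMODB_ENDPOINT, 443, proto=socket.IPPROTO_TCP)
    print(f"Resolved {DYNAMODB_ENDPOINT} to {len(addresses)} address(es)")
except socket.gaierror as exc:
    # During the outage, clients would have landed in this branch: the DNS
    # lookup itself failed, even though the stored data was untouched.
    print(f"DNS resolution failed for {DYNAMODB_ENDPOINT}: {exc}")
```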
Although AWS claimed at 6:35 a.m. that the DNS issue had been fully mitigated and that “most AWS services have resumed normal operation,” the knock-on effects had already begun to ripple outward.
The disruption quickly spread to EC2 (Elastic Compute Cloud)—the backbone of many companies’ online applications. By 8:48 a.m., AWS acknowledged ongoing issues launching new EC2 instances in US-EAST-1, advising customers not to bind deployments to specific Availability Zones (AZs) so that EC2 could “dynamically allocate” workloads to healthier data centers.
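As a rough sketch of that guidance (not AWS's literal remediation steps), a launch request made with boto3, the AWS SDK for Python, can simply leave the Availability Zone unspecified so EC2 chooses one itself; the AMI ID below is a placeholder.

```python
# Hypothetical sketch: launching an instance without pinning it to one AZ.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    # Note: no Placement={"AvailabilityZone": ...} and no SubnetId here.
    # Leaving both out lets EC2 pick an Availability Zone, which is the kind
    # of flexibility AWS's advisory pointed customers toward.
)

# EC2 reports which zone it actually chose for the instance.
print(response["Instances"][0]["Placement"]["AvailabilityZone"])
```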
At 9:42 a.m., AWS updated its status to report that, despite multiple mitigation measures across several AZs, elevated error rates persisted when launching new instances. To stabilize the system, AWS implemented rate limiting on new instance creation.
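For teams on the receiving end of such throttling, a common coping pattern is to let the SDK retry with backoff rather than failing outright. The sketch below uses botocore's built-in retry configuration; the attempt count is illustrative, not a value AWS recommended during the incident.

```python
# Sketch of how a caller might cope with throttled instance launches:
# configure botocore's adaptive retry mode so the SDK backs off and retries
# automatically when the API returns throttling errors.
import boto3
from botocore.config import Config

retry_config = Config(
    region_name="us-east-1",
    retries={
        "max_attempts": 10,  # illustrative ceiling on retry attempts
        "mode": "adaptive",  # client-side rate limiting plus exponential backoff
    },
)

ec2 = boto3.client("ec2", config=retry_config)
# Subsequent ec2.run_instances(...) calls made with this client are retried
# with backoff if AWS throttles the request.
```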
By 10:14 a.m., the company admitted that numerous services within US-EAST-1 were still experiencing API errors and connection issues. Even after the root cause was addressed, AWS faced a substantial backlog of pending requests, with full recovery expected to take considerable time.
Because so many enterprises rely on US-EAST-1 as their primary deployment region, the outage cascaded globally.
Data from Downdetector showed a dramatic spike in service failure reports during the same timeframe. Beyond Amazon’s own platforms, disruptions were reported across banks, airlines, Disney+, Snapchat, Reddit, Lyft, Apple Music, Pinterest, and popular games like Fortnite and Roblox, as well as media outlets such as The New York Times.
AWS’s appeal lies in its powerful and flexible infrastructure—scalable compute resources, elastic capacity during traffic surges, and a worldwide data-center footprint. As of mid-2025, AWS commanded roughly 30% of the global cloud infrastructure market.
Yet this incident serves as a stark reminder: when the backbone of the modern internet rests so heavily on a few providers—AWS, Azure, and Google Cloud Platform (GCP)—a failure in even one critical region can unleash a cascade of disruption with incalculable consequences.