The recent Amazon Web Services (AWS) outage and the widespread Internet disruption that followed serve as a stark reminder for companies using cloud services, highlighting the risks of single points of failure, the criticality of multi-region redundancy, and the need for thorough risk assessment in software-as-a-service (SaaS) adoption. Companies relying on cloud solutions should therefore take this moment to re-examine their technical safeguards and contractual protections.
What happened: Summary of this week’s AWS disruption
On the morning of October 20, 2025, AWS experienced a major outage in its US-EAST-1 region with immediate and far-reaching impact. This led to significant disruptions across major platforms, affecting financial services, educational institutions, healthcare systems, airlines, smart home devices, social media, and various other applications. While Amazon quickly identified and resolved the issue, the event demonstrates that even mature providers with robust systems and processes remain vulnerable to failures that are difficult to isolate in real time.
Redundancy: Architecting for the provider’s bad day
For customers, the outage highlighted several urgent issues: business operations were disrupted; multiple services and geographies were impacted; and resolution required provider-side fixes beyond the customer’s control. Therefore, customers must consider how to mitigate such risks.
One item to consider is whether single-provider or multi-provider redundancy is appropriate for a customer’s requirements. For cloud users, redundancy means implementing backup systems and alternative resources so that service remains available during a failure. Within a single provider, distributing workloads across regions can shrink the blast radius of an outage. Cross-provider redundancy (i.e., the use of multiple vendors) can offer additional protection against provider-wide disruptions, but it may require complex system segmentation.
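For readers who want a concrete picture of what this looks like at the application layer, the following minimal Python sketch routes traffic to the first endpoint that passes a health check, where the secondary endpoint could sit in another region or with another provider. The endpoint URLs, health-check path, and timeout are hypothetical placeholders, not a reference to any particular provider’s API.

```python
"""Minimal sketch of health-check-driven failover between a primary and a
secondary endpoint (a second region or a second provider). Illustrative only;
all URLs and thresholds are hypothetical."""

import urllib.error
import urllib.request

# Hypothetical endpoints: primary in one region/provider, secondary elsewhere.
ENDPOINTS = [
    "https://api.primary.example.com/healthz",
    "https://api.secondary.example.com/healthz",
]

TIMEOUT_SECONDS = 3  # Treat slow responses as failures for routing purposes.


def first_healthy_endpoint(endpoints):
    """Return the first endpoint that answers its health check, or None."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, TimeoutError):
            # Network error or timeout: fall through to the next endpoint.
            continue
    return None


if __name__ == "__main__":
    target = first_healthy_endpoint(ENDPOINTS)
    if target is None:
        print("No healthy endpoint; escalate per the incident playbook.")
    else:
        print(f"Routing traffic to {target}")
```

Even a sketch this small makes the trade-off visible: the single-provider version only changes the second URL to another region, while the cross-provider version forces the application to work against two different platforms.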
SLAs: Ensuring credits are not your only remedy
Beyond technical safeguards, customers must ensure that contractual protections reflect operational realities. While service level agreements (SLAs) are often framed as a mark of a provider’s confidence, standard SLA credits (typically a small percentage of monthly fees) rarely cover the actual damages from a multi-hour disruption. Customers affected by this week’s disruption undoubtedly faced lost revenue, diminished loyalty, decreased site traffic, and an inability to serve end users. For platforms unable to process transactions, financial firms with inaccessible accounts, or educational institutions whose students could not submit assignments, damages surely far exceeded standard service credits.
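A back-of-the-envelope calculation shows why credits rarely make a customer whole. The SLA tiers, monthly fee, outage duration, and revenue figures below are purely illustrative assumptions, not any provider’s actual terms.

```python
# Illustrative only: hypothetical SLA credit tiers and business figures.

MONTHLY_FEE = 50_000.00          # Hypothetical monthly cloud spend (USD).
REVENUE_PER_HOUR = 40_000.00     # Hypothetical revenue processed per hour.
OUTAGE_HOURS = 6                 # Hypothetical outage duration.

# Hypothetical credit schedule: minimum monthly uptime -> share of fees credited.
CREDIT_TIERS = [(99.9, 0.0), (99.0, 0.10), (95.0, 0.30), (0.0, 1.00)]

HOURS_IN_MONTH = 730
uptime_pct = 100.0 * (HOURS_IN_MONTH - OUTAGE_HOURS) / HOURS_IN_MONTH

# Find the credit rate for the achieved uptime tier.
credit_rate = next(rate for floor, rate in CREDIT_TIERS if uptime_pct >= floor)
credit = MONTHLY_FEE * credit_rate
estimated_losses = REVENUE_PER_HOUR * OUTAGE_HOURS

print(f"Uptime: {uptime_pct:.2f}%  credit: ${credit:,.0f}  "
      f"estimated losses: ${estimated_losses:,.0f}")
```

Under these assumed numbers, a six-hour outage still leaves monthly uptime above 99%, yielding a credit of roughly $5,000 against an estimated $240,000 in lost revenue, which is exactly the gap the next paragraph’s contractual protections are meant to address.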
Customers should carefully review how SLA credits are measured and applied. For example, if a contract treats SLA credits as the sole remedy, a customer may be limited to a nominal credit despite severe operational harm. In addition to SLA credits, customers should consider whether they need to preserve the right to pursue additional remedies for outages that exceed defined thresholds or that result from provider negligence, repeated breaches, or failures to meet specific commitments (e.g., security, disaster recovery, data durability).
Disaster recovery, data protection, and change management
This week’s incident also highlights the need to align disaster recovery and data protection provisions with actual provider practices. Customers need to consider whether and how to address recovery point objectives (RPOs) and recovery time objectives (RTOs), data durability commitments, and geographic replication. Customers who rely on provider-managed backup and restore processes should understand the dependency chain, particularly whether the backup and restore systems can be impaired by the same failures that affect production.
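As one concrete way these commitments can be made verifiable, the short sketch below checks the age of the most recent backup against an assumed RPO target. The RPO value and backup timestamps are hypothetical; a real check would pull timestamps from the provider’s backup inventory rather than a hard-coded list.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical RPO target: at most 4 hours of data loss is acceptable.
RPO = timedelta(hours=4)

# In practice these timestamps would come from the provider's backup
# inventory; they are hard-coded here for illustration.
backup_timestamps = [
    datetime(2025, 10, 20, 2, 0, tzinfo=timezone.utc),
    datetime(2025, 10, 20, 6, 0, tzinfo=timezone.utc),
]


def rpo_satisfied(backups, rpo, now=None):
    """Return True if the newest backup is no older than the RPO window."""
    if not backups:
        return False
    now = now or datetime.now(timezone.utc)
    return now - max(backups) <= rpo


if __name__ == "__main__":
    status = "within" if rpo_satisfied(backup_timestamps, RPO) else "outside"
    print(f"Most recent backup is {status} the {RPO} RPO target.")
```

The point of such a check is less the code than the question it forces: if the monitoring or backup tooling runs on the same provider and region as production, the check itself may be unavailable during the very outage it is meant to measure.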
Shared responsibility and operational playbooks
Cloud resilience is a shared responsibility. Providers own infrastructure and managed-services resilience, while customers own application resilience and architectural choices. The right response to this week’s outage is not to abandon cloud services but to ensure adoption occurs with full awareness, proportionate risk allocation, and contractual controls.
Additionally, customers should consider preparing incident playbooks that assume provider-side failures. These playbooks should define escalation paths, internal and external communications, business continuity steps, and criteria for invoking failover.
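One way to make “criteria for invoking failover” unambiguous is to encode them. The sketch below is a hypothetical example in which failover is triggered only when the observed error rate stays above a threshold for a sustained window; both numbers are chosen purely for illustration and would need tuning to the business impact of downtime.

```python
from dataclasses import dataclass

# Hypothetical failover criteria; tune to the business impact of downtime.
ERROR_RATE_THRESHOLD = 0.25   # Fraction of failed requests.
SUSTAINED_MINUTES = 15        # How long the breach must persist.


@dataclass
class HealthSample:
    minute: int        # Minutes since monitoring started.
    error_rate: float  # Fraction of requests that failed in that minute.


def should_fail_over(samples):
    """Trigger failover only if the error rate exceeded the threshold for
    every minute of the most recent sustained window."""
    recent = samples[-SUSTAINED_MINUTES:]
    if len(recent) < SUSTAINED_MINUTES:
        return False  # Not enough data to justify an irreversible step.
    return all(s.error_rate > ERROR_RATE_THRESHOLD for s in recent)


if __name__ == "__main__":
    samples = [HealthSample(m, 0.4) for m in range(20)]  # Simulated bad period.
    print("Invoke failover:", should_fail_over(samples))
```

Writing the trigger down, whether in code or in a runbook table, prevents the mid-incident debate over whether conditions are “bad enough” to justify a disruptive failover.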
Conclusion
Whatever the ultimate financial and reputational fallout from this week’s outage, it serves as a reminder that even world-class providers at the core of the Internet’s backbone can suffer complex, multi-service disruptions that materially impact customers. Reliance on cloud and SaaS-based services is not going to diminish, so the prudent response for customers is to harden both their systems and their contracts. Use this incident to reassess redundancy, reevaluate SLAs, and reset expectations. Doing so now will position your organization to withstand the next provider-side event with less downtime, fewer surprises, and better recourse.