AWS Outage: A Supply-Chain Security Lesson

It sometimes seems that each new supply-chain security breach we see in the news affects more organisations than the last. That isn’t particularly surprising when the same few tech companies underpin almost everything else.

So, when it comes to AWS (Amazon Web Services) – the world’s largest Cloud provider, which is relied on by something like a third of the Internet – an outage like Monday’s really does demonstrate the problem of concentrating so much Internet infrastructure in one place.


What happened and why it matters

  • AWS’s status dashboard reported that between 11:49 PM PDT on 19 October and 2:24 AM PDT on 20 October, the company experienced increased error rates for services in the US-EAST-1 region (northern Virginia).
  • The triggering fault was DNS-resolution issues affecting the DynamoDB API endpoints in that region, which cascaded into higher error rates and latencies across multiple services. (DNS – the Domain Name System – is the mechanism that links domain names to IP addresses; a short sketch of what this failure looks like from the client side follows this list.)
  • The impact was global: according to the BBC, the outage monitor Downdetector saw “more than four million reports of issues globally [on Monday morning] – more than double the 1.8m reports it sees on a full weekday normally”.
  • Financial, government and public-sector services were all hit, including UK banks, the Post Office and HMRC.
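
To make the failure mode concrete, here is a minimal sketch of a client-side check that distinguishes “the name won’t resolve” from other connectivity problems. Python is our choice, not anything AWS published, and the endpoint name is simply AWS’s public DynamoDB endpoint for US-EAST-1.

    # Sketch: distinguish a DNS-resolution failure from other connectivity
    # problems when calling a regional API endpoint.
    import socket

    ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

    def resolves(hostname: str) -> bool:
        """Return True if the hostname resolves to at least one IP address."""
        try:
            addresses = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        except socket.gaierror as exc:
            # This is the class of error clients saw during the outage window:
            # the name could not be resolved, so no connection was even attempted.
            print(f"DNS resolution failed for {hostname}: {exc}")
            return False
        for family, _, _, _, sockaddr in addresses:
            print(f"{hostname} -> {sockaddr[0]}")
        return True

    if __name__ == "__main__":
        resolves(ENDPOINT)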

It might not have been a cyber attack, but it still counts as an information security breach, as it affected the third component of the cyber security CIA triad – availability (the other two being confidentiality and integrity).


NIS 2 and DORA

From a regulatory perspective, AWS, as a digital service provider, will be bound by the NIS Regulations (Network and Information Systems Regulations 2018) in the UK and various EU member state laws based on the NIS 2 Directive. In the EU, there’s also DORA – the Digital Operational Resilience Act – to contend with.

Andrew Pattison, our Head of GRC Consultancy Europe, explains:

“Commentary indicates that AWS worked out the problem reasonably quickly and fixed it, but 24 hours later, not all services – including global services – are back up and running at 100%. As we are all learning, resolving a fault quickly does not mean the impact is also resolved, as there are cascade effects and the ripples of the problem do not just stop.

“With this disruption, the first thing that came to mind – and this is very much from a European Union perspective – is that the regulatory environment has changed significantly. From October 2024, NIS 2 has been appearing in national laws, and since the middle of January 2025, DORA has been fully enforceable (it had been in place for the previous two years but was not enforceable). So, with these two vehicles for enforcement in place, what is going to happen?

“Will we see action from competent authorities, particularly in relation to DORA? Is there going to be action by the ‘lead overseer’ (as detailed in DORA) to deal with the likes of AWS, as individual businesses do not have the leverage to do this?

“If we were to see action in relation to DORA, this would be a real game changer and would bring home the very real implications of the legislation. In particular, it is going to change the relationship and requirements when using third parties to deliver key services. It also shows the vulnerability of delivering services that are Cloud-based; we have both resilience and vulnerability in the use of these services, and the complexity of the dependencies and interaction of services is going to cause a lot of organisations some serious compliance issues in the coming years.

“Then, just for fun, let’s throw in AI and the regulations around that.

“With all this going on, organisations need to better fund their governance, risk and compliance capabilities, as the requirements for this are only going in one direction.”


What smaller organisations can do

Regardless of size, every organisation that uses third-party digital service providers should view this outage as a warning.

Here are some practical steps you can take:

1. Map your dependencies

  • Identify which Cloud/digital service providers you rely on (for example hosting, databases, authentication).
  • Document which services you are using (for example, which availability zones, which load balancers, what DNS setup) and where they are based.
  • Understand cascading dependencies (for example, if you rely on a provider that itself uses third-party services); a simple sketch of mapping these follows this list.
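
As a simple illustration of the mapping exercise above, the sketch below keeps the dependency map as plain data and walks it to surface cascading dependencies. All service and provider names are illustrative assumptions, not a real inventory.

    # Sketch: a dependency map kept as plain data, plus a walk that surfaces
    # cascading (transitive) dependencies. Names are illustrative only.
    SERVICES = {
        "customer-portal": ["web-host", "auth-provider"],
        "web-host": ["cloud-region-a"],
        "auth-provider": ["cloud-region-a"],  # hidden shared dependency
        "cloud-region-a": [],
    }

    def transitive_deps(service, seen=None):
        """Return everything the service depends on, directly or indirectly."""
        seen = set() if seen is None else seen
        for dep in SERVICES.get(service, []):
            if dep not in seen:
                seen.add(dep)
                transitive_deps(dep, seen)
        return seen

    print(transitive_deps("customer-portal"))
    # Both paths converge on "cloud-region-a": a single point of failure
    # that is invisible if you only look at direct dependencies.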

2. Review contracts and SLAs (service level agreements)

  • Ensure your contract explicitly covers availability, incident recovery, escalation process, and data recovery.
  • Consider the right to audit or oversee the provider’s resilience arrangements.
  • Do not assume “100% uptime” – understand what the provider means by availability, how they measure it and what their recovery plans are.
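
It also helps to translate an availability percentage into the downtime it actually permits. A minimal calculation, assuming a 365-day year (real SLAs often measure per month and exclude scheduled maintenance, so check the contract’s definition):

    # Sketch: convert an SLA availability percentage into permitted downtime.
    # Assumes a 365-day year; check how your contract actually measures it.
    HOURS_PER_YEAR = 24 * 365

    for availability in (99.0, 99.9, 99.95, 99.99):
        downtime_hours = HOURS_PER_YEAR * (1 - availability / 100)
        print(f"{availability}% uptime allows ~{downtime_hours:.1f} hours down per year")

Even “four nines” permits nearly an hour of downtime a year, and Monday’s incident lasted longer than that on its own.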

3. Prepare resilience/fall-back strategies

  • Where feasible, build redundancy into your operational practices: use multiple providers for critical services or multiple regions.
  • Develop playbooks for what happens when a provider has an outage: how you will detect, respond and communicate.
  • Ensure you have monitoring and alerting that flags when critical services degrade, not just when they fail catastrophically; a minimal sketch follows this list.
  • Run table-top exercises that simulate provider failure to identify gaps in your response processes.
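
Picking up the monitoring point above, here is a minimal sketch of an alert that fires on degradation rather than only on total failure. The window size and 5% threshold are illustrative assumptions to tune against your own traffic.

    # Sketch: alert on degradation, not just catastrophic failure, by
    # tracking the error rate over a sliding window of recent calls.
    from collections import deque

    class DegradationMonitor:
        def __init__(self, window=200, threshold=0.05):
            self.results = deque(maxlen=window)  # True = call succeeded
            self.threshold = threshold

        def record(self, success):
            self.results.append(success)

        def error_rate(self):
            if not self.results:
                return 0.0
            return 1 - sum(self.results) / len(self.results)

        def degraded(self):
            # A sustained error rate above the threshold counts as an event,
            # well before the service fails outright.
            full = len(self.results) == self.results.maxlen
            return full and self.error_rate() > self.threshold

    monitor = DegradationMonitor()
    for ok in [True] * 189 + [False] * 11:
        monitor.record(ok)
    if monitor.degraded():
        print(f"Provider degraded: error rate {monitor.error_rate():.1%}")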

4. Incident response and recovery capability

  • Ensure your incident response plan includes the scenario of a provider outage (not just a cyber attack).
  • Make sure your BCP (business continuity plan) covers third-party failure, including how to maintain operations, how to communicate to stakeholders and how to preserve reputation.
  • After an outage, review what happened, update your lessons-learned register and adjust your documentation accordingly.

5. Talk to your provider(s)

  • Ask your provider about their resilience mechanisms, any failures they’ve experienced and how they handled them, and the metrics they use to measure availability and recovery.
  • If you are a smaller organisation, make sure you are not purely dependent on a single region or single provider without mitigations.

6. Budget for resilience

  • Recognise that resilience doesn’t come for free. Extra costs (for example for multiple providers) must be budgeted for.
  • Also consider the cost of downtime vs the cost of mitigation. Even small organisations may incur heavy reputational or regulatory cost when a provider fails.
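
A back-of-envelope comparison makes the budgeting point concrete. Every figure below is an illustrative assumption; substitute your own estimates.

    # Sketch: compare expected downtime cost against the cost of mitigation.
    hours_down_per_year = 8.8      # roughly what a 99.9% SLA permits
    cost_per_hour_down = 2_000.0   # lost revenue plus staff time, in GBP
    mitigation_per_year = 6_000.0  # e.g. running a second region or provider

    expected_loss = hours_down_per_year * cost_per_hour_down
    print(f"Expected downtime cost: £{expected_loss:,.0f}/year")
    print(f"Mitigation cost:        £{mitigation_per_year:,.0f}/year")
    print("Mitigate" if mitigation_per_year < expected_loss else "Accept the risk")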


How IT Governance can help you

The AWS outage is a stark reminder that digital infrastructure is not infallible. Although Cloud services offer many advantages – scalability, flexibility and cost savings, to name but three – they also introduce systemic dependencies and single points of failure.

For organisations of all sizes, Monday’s AWS outage should be a wake-up call. Use the incident to review your resilience, third-party risk management processes, contracts, incident response plans and business continuity practices.

Ensure that when the next incident happens, you’re prepared not only to respond, but to recover swiftly, communicate clearly and meet your regulatory obligations.

And if you need any help with Cloud or supply-chain security, business continuity and operational resilience, we have everything you need.

