On Tuesday, 18 November, a Cloudflare outage took a significant part of the Internet offline, including major sites, enterprise platforms and public-facing services.
Ironically, even Downdetector – the platform that provides real-time information about service outages – apparently went down for a time.
This wasn’t an isolated incident, either: an AWS (Amazon Web Services) outage about a month ago caused similar disruption to thousands of dependent services and was followed a few days later by a smaller Microsoft Azure outage.
If the largest Cloud providers can experience outages of this size, it’s no great stretch to suggest that all organisations would be wise to scrutinise their Cloud configuration and resilience controls.
This blog post explains how to test your Cloud configuration – and provides some background information about what happened at Cloudflare and the regulatory context for digital service providers.
What happened on Tuesday?
A failure within Cloudflare’s internal systems caused parts of its network to become unreachable. Organisations using Cloudflare for DNS, CDN distribution, Zero Trust access or application security saw their own services fail as a result.
Given Cloudflare’s position in front of a substantial share of global Internet traffic – some 20% of websites worldwide, it says – the disruption was widespread.
Both this and last month’s AWS and Azure outages highlight the same core problem: most organisations now run services that depend on long chains of Cloud components. And when one of those components fails, it can set off failures in systems that seem entirely unrelated.
Why this matters for your organisation’s Cloud set-up
There’s still a common assumption that moving to the Cloud guarantees cyber resilience.
It doesn’t.
Misconfigurations, unclear dependencies and limited operational visibility continue to expose organisations to risk.
In particular, these recent outages underline three recurring issues:
- Limited visibility of the control plane
  Without clear insight into configuration changes and system behaviour, it is difficult to detect or respond to failures.
- Redundancy that does not behave as expected
  Provider-level failover may not help when the outage originates in a provider’s own control systems.
- Dependencies outside your direct oversight
  DNS services, identity platforms, API gateways and routing layers can involve multiple third parties, not all of which are obvious.
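To make the point about hidden dependencies concrete, here is a minimal Python sketch that works out which services are affected when a single upstream component fails. The service names and the dependency map are hypothetical and purely illustrative; a real map would come from your own architecture records.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on directly.
DEPENDS_ON = {
    "web-frontend": ["cdn", "auth-service"],
    "auth-service": ["identity-provider", "dns"],
    "cdn": ["dns"],
    "reporting": ["api-gateway"],
    "api-gateway": ["auth-service"],
    "identity-provider": [],
    "dns": [],
}


def affected_by(failed, depends_on):
    """Return every service that depends, directly or transitively, on `failed`."""
    # Invert the map so we can walk from a component to the services relying on it.
    dependants = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, set()).add(service)

    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependants.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


print(sorted(affected_by("dns", DEPENDS_ON)))
# ['api-gateway', 'auth-service', 'cdn', 'reporting', 'web-frontend']
```

In this made-up example, "reporting" never touches DNS directly, yet a DNS failure still reaches it through the API gateway and the authentication service.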
Regulatory context: the NIS Regulations and the new Cyber Security and Resilience Bill
As DSPs (digital service providers) under the UK’s NIS Regulations (Network and Information Security Regulations 2018), AWS, Azure and Cloudflare can likely expect regulatory scrutiny following these incidents.
(The Regulations require DSPs to report “significant” or “substantial” incidents to the ICO (Information Commissioner’s Office) for investigation. Cloudflare itself called Tuesday’s incident a “significant outage”.)
Under the Regulations, non-compliant organisations face fines of up to £17 million.
However, these penalties are likely to increase soon: the UK government has now introduced the Cyber Security and Resilience Bill to Parliament, which proposes to increase the maximum penalty to £17 million or 4% of annual global turnover – whichever is greater.
The Bill’s aim is to raise national resilience standards and bring the UK into closer alignment with the EU’s NIS 2 Directive – the NIS Regulations having implemented the requirements of its predecessor, the first EU NIS Directive, pre-Brexit.
Reflecting the fact that Cloud services underpin significant parts of the UK’s digital economy, the Bill proposes:
- Expanded duties for operators of essential services and digital service providers.
- Shorter reporting timelines for cyber incidents and clearer escalation requirements.
- Stronger regulatory oversight of Cloud providers and other critical suppliers.
- More prescriptive expectations for resilience testing and supply-chain security.
Test your Cloud configuration before the next outage
Recent events show how quickly Cloud-based services can fail and how widely the effects can spread. They also show that resilience can’t be outsourced wholesale to a provider. Every organisation needs to understand how its own Cloud configuration will behave when something breaks.
Here’s a simple checklist:
- Do you have a clear map of your service dependencies?
- Have you tested failover paths for identity, DNS and network routing?
- Would you detect a partial control-plane failure quickly enough to act?
- Have you tested recovery from misconfiguration or degraded provider services?
If any of these questions raise doubt, a Cloud Configuration Penetration Test is a practical next step.
How our Cloud Configuration Penetration Test helps
Cloud configuration penetration testing assesses how your environment behaves under stress, tests whether controls perform as intended and reveals where resilience assumptions are misplaced. It includes:
- A detailed configuration review covering identity and access management, networking, logging, storage and other core control areas. We benchmark your settings against recognised best practice and assess whether they support secure and resilient operation.
- Targeted attack simulation designed to identify and exploit common Cloud misconfigurations and permission weaknesses, such as overly broad roles, insecure routing paths, unprotected services and information leakage. This shows how an attacker could move within your environment if a configuration slips out of alignment.
- Analysis of resilience and availability risks arising from these weaknesses. We examine how misconfigurations, insecure dependencies or insufficient logging could amplify the impact of upstream issues – for example, a DNS disruption, identity service degradation or unexpected control-plane behaviour. This helps you understand where failures could cascade and what to remediate before they do.
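As a rough illustration of what an “overly broad role” can look like, the Python sketch below scans an AWS-style IAM policy document for Allow statements that combine wildcard actions with a wildcard resource. It is an assumption-laden example, not a description of how the test itself is carried out: the policy, the Sid values and the function name are hypothetical, and a real review covers far more than this one pattern.

```python
def find_broad_statements(policy):
    """Return (index, Sid) for Allow statements with wildcard action and resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]

    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action and wildcard_resource:
            findings.append((index, stmt.get("Sid", "<no Sid>")))
    return findings


# Hypothetical policy used only for illustration.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadLogs", "Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-logs/*"},
        {"Sid": "AdminEverything", "Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

print(find_broad_statements(EXAMPLE_POLICY))  # [(1, 'AdminEverything')]
```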
The test provides a clear, prioritised report that explains each finding, its impact and the steps required to fix it. The result is a Cloud configuration that is not only more secure but less likely to fail unpredictably when an external provider experiences issues.
Testing verifies that your configuration is sound, rather than merely assumed to be. It helps confirm that your environment will behave predictably when upstream platforms fail, and it strengthens your recovery posture by highlighting weak points before they become operational problems.
As an example, a DNS fault inside a provider can lead to traffic being misrouted or authentication processes failing silently. Configuration testing shows whether your systems would recover cleanly, degrade gracefully or fail outright.
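The DNS scenario above can be rehearsed in miniature. The following Python sketch simulates a failed primary resolver and reports whether a lookup stays healthy, degrades gracefully onto a fallback, or fails outright. Everything here is an assumption made for illustration: the resolver functions, the hostname and the fallback address (taken from a documentation-only IP range) are hypothetical, and `system_resolver` is included only to show how a real lookup could be plugged in.

```python
import socket


def resolve_with_fallback(hostname, resolvers):
    """Try each resolver in order; return (addresses, index of resolver used)."""
    for index, resolver in enumerate(resolvers):
        try:
            addresses = resolver(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        if addresses:
            return addresses, index
    return [], None


def classify(hostname, resolvers):
    """Summarise how resolution behaves for the given resolver chain."""
    _, used = resolve_with_fallback(hostname, resolvers)
    if used is None:
        return "fails outright"
    if used == 0:
        return "healthy"
    return "degrades gracefully"


def system_resolver(hostname):
    """Real lookup through the operating system's resolver (needs network access)."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def broken_resolver(hostname):
    """Stand-in for a provider whose DNS is down."""
    raise OSError("simulated provider DNS outage")


def static_resolver(hostname):
    """Hypothetical fallback record; 192.0.2.10 is a documentation-only address."""
    return ["192.0.2.10"]


print(classify("app.example.com", [broken_resolver, static_resolver]))  # degrades gracefully
print(classify("app.example.com", [broken_resolver]))                   # fails outright
```

Injecting the resolvers as plain functions keeps the failure simulated, so the check can run without taking any real service offline.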
Contact us to assess your Cloud configuration and improve your resilience before the next outage hits.
The post What AWS and Cloudflare Outages Teach Us About Cloud Configuration Risks appeared first on IT Governance Blog.
