
(just a geek, often in Soho)

When large is a liability

...thoughts resulting from the recent spate of major internet outages

Over the past few weeks, we've seen a series of outages across the global internet. Each has hit one of the larger companies trying to establish itself as a (if not the) dominant ecosystem on the net. The most recent was the Cloudflare outage on 18th November.

But it's perfectly possible to operate on the internet without using any of these ecosystems at all, and I would argue that doing so offers a distinct benefit. The Cloudflare outage took out Turnstile, a security check used widely across the internet to tell humans from bots. But it could equally well have been one of the "captcha" systems used for the same purpose. They are all supplied by the large players, with the possible exception of hCaptcha, which can be fiddly to set up.

We operate our own infrastructure: our own DNS, our own perimeter scanning, our own mail servers, our own web servers, our own database servers, etc. None of these systems has been affected by any of the outages we've seen over the last few weeks. Interestingly, because of the way these products are sold, several of our clients assumed that we depended on systems that had failed when, in fact, we were immune. I've received several "how have you managed to keep going?" emails from people who simply do not understand what's been happening.

The next time somebody suggests that you move your website to Webflow or one of the other platforms that rely on this sort of technology, think twice. Do you really need to be there? Is the minimal speed increase worth the risk? I would say not. A properly constructed system like ours can score 99 or more out of 100 in Google's page speed tests without difficulty.

But things do go wrong, so let's look at recovery times. Tuesday's Cloudflare outage took about 3.5 hours to restore services fully. This figure comes from their own blog, which details with remarkable honesty exactly what went on. Can we match that sort of recovery time as a small, independent supplier? I reckon we can. Within a single site, the worst-case scenario for us is the complete failure of a piece of hardware. We run an almost completely virtualized system, so in the event of a hardware failure, replica containers can be brought up on alternative hardware within moments.

Interestingly, we don't automate this process, because our testing has shown that such automation may well bring more problems than it solves. So the first question is: is there a human being available at the time an incident occurs who can start mitigation immediately? With a comprehensive system of alerts, none of which depends on our own infrastructure, we reckon we can get somebody on the case within moments.
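For the technically curious, the "alert a human, don't fail over automatically" approach could look something like the minimal Python sketch below. It is an illustration under stated assumptions, not our actual tooling: it assumes the checker runs on a machine outside the infrastructure it watches, and the function names, the two-failure threshold and the alert wording are all hypothetical.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    url: str
    ok: bool
    detail: str


def probe(url: str, timeout: float = 5.0) -> CheckResult:
    """Fetch a URL once; HTTP errors, timeouts and connection failures all count as 'down'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return CheckResult(url, 200 <= resp.status < 400, f"HTTP {resp.status}")
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise HTTPError
        return CheckResult(url, False, f"HTTP {exc.code}")
    except OSError as exc:  # URLError, DNS failures and timeouts are all OSError subclasses
        return CheckResult(url, False, str(exc))


def evaluate(results: list[CheckResult], failures: dict[str, int], threshold: int = 2) -> list[str]:
    """Track consecutive failures per URL and return alert messages for a human.

    Hypothetical policy: deliberately performs no failover. It only reports,
    once per outage, when a URL has failed `threshold` checks in a row.
    """
    alerts = []
    for r in results:
        if r.ok:
            failures[r.url] = 0  # healthy again: reset the failure streak
            continue
        failures[r.url] = failures.get(r.url, 0) + 1
        if failures[r.url] == threshold:  # alert exactly once when the streak reaches the threshold
            alerts.append(
                f"ALERT: {r.url} failed {threshold} checks in a row ({r.detail}); "
                "a human should decide whether to fail over"
            )
    return alerts


def run_once(urls: list[str], failures: dict[str, int]) -> list[str]:
    """One polling round; meant to be scheduled (e.g. every minute) from an external host."""
    return evaluate([probe(u) for u in urls], failures)
```

How the alert reaches someone (SMS, a paging service, email through an outside provider) is left out. The design point is that both the checker and the delivery channel sit outside the systems being watched, and the decision to move services stays with a person.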

We maintain backups in various formats. The first is a local, on-site backup adjacent to the server it's backing up. In the event of a simple hardware failure, it's a very quick matter to move some alternative systems into place and bring them online, certainly far faster than 3 hours. Once everything is back up, resynchronizing all data is fast and efficient.

But suppose the entire data centre is offline for some reason, such as a denial-of-service attack or something equivalent. Our secondary backups are maintained off-site and can be deployed anywhere, but restoring large amounts of data across the internet takes far longer than restoring it within the same shed, so there would be an appreciable delay. That's why our core systems are also replicated in alternative locations, ready to go live should this occur. Services will be restored even while data sync is still running.

We're paying for this capability all the time on your behalf, and as yet, we've never had to use it. But one day it should avert a complete disaster.

Ironically perhaps, the reason we can make these sorts of claims is that we're small. If we were large, the sheer volume of data involved would make it almost impossible to do what we're outlining here. Small is beautiful.

So thank you, Amazon, Microsoft, and Cloudflare. But we'll plough our own furrow for a little bit longer and see how we go.


References:

Amazon (AWS) - 20 October 2025

Microsoft - 29 October 2025

Cloudflare - 18 November 2025