“Manual updating with prior testing is still the most reliable way to avoid issues.”
It’s a statement that I have heard more than once, and on the face of it, it has a ring of truth.
Of course, if you stop and think about it for a minute, it doesn’t make much sense at all.
In fact, automated updating, with multistage testing, backups, and alerting, is still the most reliable way to avoid issues.
How do we know?
Because the same folks telling you to manually update will tell you not to update on a Friday in case it breaks. It’s just as well that bad actors don’t work weekends.
I don’t want to dump too much on humanity, but humans are terrible at processes. We suck at them. Even worse, even when it’s our job to update hundreds of sites, and we are good at it, stuff gets missed. Stuff gets ignored because we assume it will be OK. Stuff gets delayed; humans need to eat, fart, go on holiday. Worse, humans are fragile; they get hit by buses.
Do you know what is good at processes? What is immune to needing to pee and losing its spot in the spreadsheet?
We know we must update, yet it’s very common for agencies, in particular, to advocate for manual updates. The argument is that because an update might fail, a human should be present to prevent it from happening or to take action to fix it. Cynically, I might add that it’s also more profitable for them to be paid to update sites when manual updating is in their maintenance contracts.
Well, I think there is a world where automatic updates and maintenance contracts exist together. Indeed, I think this is the ideal business model for any agency that offers such services.
Typical Update Cycle
If we take a typical process common in many agencies with maintenance contracts:
On a day of the week (or month :/) the company goes through each site and:
- Updates the staging site with all the updates it needs.
- Runs through a set of tests, visits the home page, maybe goes through the checkout page, logs into the admin area, etc.
- If everything is ok, move to the live site and repeat.
- Move on to the next site.
Your process might include backups before you start, and you might have lots of tests or none (it’s ok to admit it, you are not alone).
What are the problems with this approach?
- It’s not feasible to do this multiple times a day.
- Humans don’t always follow processes.
- We have to do this during working hours or have out-of-hours support.
- When things go wrong, it often takes time to fix.
Mistakes are costly. But worse, a vulnerability may remain exposed between patching cycles.
We are racing bad actors, and folks who patch only on a monthly or weekly basis will lose eventually.
Automatic update cycle
Let’s now reimagine this process, and specifically the role a maintenance contractor might play:
- We run a series of automated tests against the live sites. We can use our integration/acceptance tests if we have them, or we can develop them specifically.
- We take a backup of our live site.
- We restore that backup onto our testing server (it’s no longer a staging server but a server purely for testing).
- We run exactly the same tests as before (good news: we just tested the backup, so we can now genuinely call it one).
- We check for updates.
- We run the updates on the test site.
- We run the tests again. If they fail, we generate a failure report and pause at this point. If they pass, we move on.
- We run the updates on the live site.
- We run the tests again. If they pass, all good; if they fail, we generate a report and restore from the backup.
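The steps above can be sketched as a single control flow. This is a minimal sketch, not a real implementation: every helper here (`run_tests`, `backup`, `restore_to_test`, and so on) is a hypothetical callable standing in for whatever tooling you actually use — your test runner, your backup tool, WP-CLI, and so on.

```python
# Sketch of the automated update cycle. Each step is injected as a
# callable so the control flow is clear; in practice each one would
# shell out to your own test runner, backup tool, and update mechanism.

def update_cycle(run_tests, backup, restore_to_test, check_updates,
                 apply_updates, restore_live, report):
    """Run one automated update cycle. Returns a short outcome string."""
    # 1. Run the automated tests against the live site first.
    if not run_tests("live"):
        report("live site failing before any changes were made")
        return "aborted: live tests failed"

    backup()           # 2. Take a backup of the live site.
    restore_to_test()  # 3. Restore that backup onto the testing server.

    # 4. Same tests again on the test server: this also proves the
    #    backup actually restores, so it genuinely is a backup.
    if not run_tests("test"):
        report("backup did not restore cleanly")
        return "aborted: backup unverified"

    # 5. Check for updates; nothing to do if there are none.
    if not check_updates():
        return "no updates available"

    # 6-7. Apply updates on the test site, then re-run the tests.
    apply_updates("test")
    if not run_tests("test"):
        report("updates broke the test site")
        return "paused: updates failed on test"

    # 8-9. Apply updates on the live site and verify.
    apply_updates("live")
    if run_tests("live"):
        return "updated"

    report("updates broke the live site; rolling back")
    restore_live()  # Restore live from the backup we already verified.
    return "rolled back"
```

Note that a failure at any stage before step 8 never touches the live site, which is the whole point: the only way the live site changes is after the exact same updates have passed the exact same tests elsewhere.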
This seems like a lot of work, but once it’s set up, it’s fairly low maintenance. And companies offering maintenance contracts are perfectly placed for it without discouraging automatic updates.
Their expertise means they are best placed to write the tests, and it’s their team that gets the very occasional failure report and manages the system. For the client, this is a seamless process. The testing server can be in the Cloud or on a box in the corner of the office.
Truly scale updates for thousands of clients
This process scales from one site to ten thousand, though at the upper end it will need more than a box in the corner.
People don’t pay for someone to push a button.
They pay someone to keep their sites up-to-date and secure. They don’t really care whether it’s automated or not.
Meanwhile, we care because automation is more reliable, more consistent, doesn’t sleep, and frees up teams to look forward, not tread water.
WordPress maintenance companies should be leading the charge on automatic updates because, out of the box, automatic updates are not perfect. If you run a company that provides this type of support, this is an opportunity for you.
However, even if you just switch them on and forget, you are almost never going to have an issue. I can say this, having enforced automatic updates on thousands of sites. The failure rate was incredibly low, under 0.2%.
So even if you don’t want to go with the above, and you just run automatic updates, the rollback plugin, and have some basic tests, the worst case is that once in a very blue moon, you might need to fix something in the morning.
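For a sense of what “some basic tests” might look like, here is a minimal sketch in Python using only the standard library: fetch a few key pages and check the status code and some expected text. The URLs and text markers are hypothetical examples — swap in your own home, checkout, and login pages.

```python
# Minimal smoke-test sketch: a page passes if it returns HTTP 200 and
# contains a piece of text we expect to see on it.
from urllib.request import urlopen


def page_ok(html, status, must_contain):
    """Check one fetched page against its expected marker text."""
    return status == 200 and must_contain in html


def smoke_test(pages, fetch):
    """Return a list of failing URLs; an empty list means all passed."""
    failures = []
    for url, marker in pages:
        html, status = fetch(url)
        if not page_ok(html, status, marker):
            failures.append(url)
    return failures


def http_fetch(url):
    """Default fetcher: returns (body, status) for a URL."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", "replace"), resp.status


# Hypothetical page list — replace with your own key pages.
PAGES = [
    ("https://example.com/", "Welcome"),
    ("https://example.com/checkout/", "Your basket"),
    ("https://example.com/wp-login.php", "Log In"),
]
```

Run `smoke_test(PAGES, http_fetch)` before and after each update; a non-empty result is your cue to roll back and generate a report rather than wait for a customer to notice.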
Unless it is the most mission-critical part of a company, a couple of hours of downtime is not the end of the world and is significantly better than cleaning up a hacked site. If it is a critical part, then its updates shouldn’t be handled by humans; we’re just not reliable enough.
So please stop pressing the update button. 🙁