Rebuilding trust in Automatic Updates

I have been, and continue to be, a massive champion of Automatic Updates in WordPress. I think it’s worth emphasising how big a difference they make to the security of your site. 

Plugins get hacked because they are out of date!

That is not to say a fully up-to-date plugin can’t be hacked, but an out-of-date plugin with an issue can and will be. 

There is also one group of people who WILL know about vulnerabilities, and that is bad actors. Regardless of how a vulnerability is announced, once it is in the wild, exploits happen in a matter of hours.

We no longer live in a world where a security patch comes out and you can wait until Patch Tuesday to update.

Unless you have a proactive security team, monitoring all plugins and themes across your site, or a particularly on-the-ball dev team, you are going to need to rely on automated updates.

Recently there has been great progress on the automatic update front, with a new user interface for plugins and themes that allows you to opt in to automatic updates for individual plugins. Personally I would have liked to have seen it as opt-out, especially for plugins, but I can see why opt-in was chosen. I have a simple shim plugin to make it opt-out, which I put in mu-plugins; this enables automatic updates by default for plugins.

Coming in WordPress 5.6, hopefully, will be an interface for opting in to major version updates, something we could previously do via a define in wp-config.php. This will effectively be the last step, and in theory a WordPress site could then just automatically update itself without any issues.

So we have won, and I’m out of my newly created job. After all, who needs a WordPress Security Consultant when so many hacks are caused by out-of-date plugins and we can just keep them up to date?

Sadly no, I’m still going to be around.

The problem with automatic updates is humans.

You see, for automatic updates to work, we need to trust them and let them do the job: not second-guess them, not decide we know best and pause them, and not start deciding we will only allow them for the less important things.

I’ve talked at length about the various ways to handle automatic updates, and about many of the reasons people give for why they don’t trust them, along with why they should.

The problem is not with the system, it’s that we need humans to trust the system. Which means we need people to understand risk. 

If we update something 100 times and it fails once, the failure rate is 1%, but humans will often perceive the failure rate as much higher; we oddly remember the failures and never the successes. We also, oddly, magically forget our own failures and our own mistakes. This type of bias means that if you ask people how reliable automatic updates are, they will give a much higher failure rate than the real one.

When I worked at a managed host we saw automatic update failures well under that 1% mark, and even these figures were skewed because several sites had repeatedly higher failure rates than the others. Which leads to the question: why do some sites just seem to have a higher failure rate?

Humans. In nearly all cases the failure rate increased when humans meddled. The host offered an opt-out system, and in virtually all cases the increased rate occurred when one or more plugins had been opted out, so parity across plugins wasn’t there. In these scenarios the human had either deemed a plugin to be risky or run into an unexpected issue.

A very common scenario that caused failures was preventing WooCommerce from updating because it was “mission critical” while leaving the payment gateway or similar, which relied on it, updating. Another fairly common scenario was pegging (leaving a plugin at a specific version) because the new version differed from the current one or didn’t work with another plugin. This scenario, often meant to be temporary, suffers from the same parity drift in versions.
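The WooCommerce scenario above can be spotted mechanically. Here is a minimal sketch, assuming you can export a list of plugins with their auto-update setting and the plugins they rely on. The `Plugin` class, the `find_parity_risks` function and the plugin slugs are all hypothetical, made up for illustration; WordPress does not hand you this data in this shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Plugin:
    """Hypothetical record of one installed plugin."""
    slug: str
    auto_update: bool
    depends_on: list[str] = field(default_factory=list)


def find_parity_risks(plugins: list[Plugin]) -> list[tuple[str, str]]:
    """Return (dependent, dependency) pairs where only the dependent auto-updates."""
    by_slug = {plugin.slug: plugin for plugin in plugins}
    risks = []
    for plugin in plugins:
        if not plugin.auto_update:
            continue  # a held-back plugin cannot drift ahead of what it relies on
        for dep_slug in plugin.depends_on:
            dependency = by_slug.get(dep_slug)
            if dependency is not None and not dependency.auto_update:
                risks.append((plugin.slug, dependency.slug))
    return risks


# Example site: WooCommerce is held back, its gateway is not.
site = [
    Plugin("woocommerce", auto_update=False),
    Plugin("example-payment-gateway", auto_update=True, depends_on=["woocommerce"]),
    Plugin("example-seo", auto_update=True),
]

print(find_parity_risks(site))  # [('example-payment-gateway', 'woocommerce')]
```

Any pair it reports is a place where a human decision has broken parity, which is where the failures I saw tended to come from.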

From time to time there are problems; nothing updates smoothly 100% of the time, every time. The same is true when you update manually. I’m just saying I trust a computer to do basic computing, moving 1s and 0s, better than I trust you to.

Then we have the human error cascade, where a human makes a mistake and, because of automatic updates, the issue propagates. An example of this is the recent issue where a lot of WordPress sites automatically updated to an alpha version of WordPress 5.5. Details of why this happened can be found in this Make post, but the quick version is that a slight snafu while packaging the release, a human error, was to blame.

It wasn’t the automatic updates at fault here but the providers, and no amount of testing would have helped, because even if you had tested on your staging site 10 minutes earlier, when you hit update on live you might have gotten the wrong version. Barring putting WP in your source control and using that version, which brings its own issues and challenges, this would have caught you out.

With all that said, and with me pleading with you that Automatic Updates are safe and better than you doing them, what can we do, and what would I like to see implemented to provide additional confidence?

Delaying Updates

A simple setting defines a period of time to wait before the system updates. Lots of managed hosts have this option. It allows you to set your staging site to update on day 0 and your live site on day 1, providing a 24-hour gap.
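The logic behind such a setting is small. This is a minimal sketch of the day 0 / day 1 arrangement, not how any particular host implements it; the `DELAY_BY_ENVIRONMENT` table and the `is_update_due` function are names I have made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-environment delays: staging updates at once, live waits a day.
DELAY_BY_ENVIRONMENT = {
    "staging": timedelta(0),
    "live": timedelta(days=1),
}


def is_update_due(released_at: datetime, environment: str, now: datetime) -> bool:
    """Return True once the environment's delay since release has elapsed."""
    return now >= released_at + DELAY_BY_ENVIRONMENT[environment]


released = datetime(2020, 8, 1, 9, 0, tzinfo=timezone.utc)
two_hours_later = released + timedelta(hours=2)

print(is_update_due(released, "staging", two_hours_later))  # True
print(is_update_due(released, "live", two_hours_later))     # False
```

Taking away the delay later is then just a matter of setting the live value to zero.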

Personally I don’t like this gap, but it provides peace of mind to some that they have the opportunity to identify issues. In reality, too few people react to issues on staging for this to be useful.

Implementing the feature in core would be high on my wishlist even if I don’t want it used, because if we can convince people to turn on updates with a delay, we can then convince them to take away the delay.

Single Off Switch

This exists as a define, but having a single “turn off all updates” option, and a “turn off all updates for a day” option, within the user interface will give folks confidence that they have a panic button if they need it. People with big red buttons don’t normally press them, but they feel more confident knowing they are there.

Rollback

Currently there is no easy rollback system. While rolling back code is easy, rolling back database migrations is the bane of developers and sysadmins everywhere. So this is not easy, but a long-term goal for the WordPress update system is a way to return to the original state of the plugin.

How to implement it? Well, that is much harder, as at the moment the Settings API doesn’t explicitly register what data is associated with the settings. Until we have some way to tell settings data apart from transactional data (posts/orders/stuff), this will be a near-impossible challenge.

I have in the past built rollback systems that rely on the assumption that “settings” changes are strictly changes to wp_options, which works well until you realise the number of edge cases.

Testing

We need rollback for the true future, which is automated tests run by WordPress itself. Every update would run that plugin’s or theme’s tests along with the site’s tests, and if something fails it rolls back.
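The shape of that loop is simple even if the rollback itself is not. Below is a minimal sketch, assuming you already have some way to apply an update, run the tests and restore the previous state. The `update_with_rollback` function and the three callables it takes are hypothetical; nothing like this exists in WordPress core today, and the in-memory "site" stands in for real files and a real database.

```python
from typing import Callable


def update_with_rollback(
    apply_update: Callable[[], None],
    run_tests: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Apply an update, run the tests, and roll back if they fail or crash."""
    apply_update()
    try:
        passed = run_tests()
    except Exception:
        passed = False  # a crashing test suite counts as a failure
    if not passed:
        rollback()
    return passed


# Stand-in for a site: one plugin and the version it is on.
site = {"example-plugin": "1.0"}
snapshot = dict(site)  # taken before the update so we can restore it


def apply_update() -> None:
    site["example-plugin"] = "1.1"


def failing_tests() -> bool:
    return False  # pretend the new version broke something


def restore_snapshot() -> None:
    site.clear()
    site.update(snapshot)


kept = update_with_rollback(apply_update, failing_tests, restore_snapshot)
print(kept, site["example-plugin"])  # False 1.0
```

The hard part is everything hidden inside the rollback step, which is exactly the database problem described under Rollback.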

Even without rollback and testing the system is so much better than a human doing updates and significantly more reliable. Getting a rollback and testing system into core is a bit of a pipe dream but that doesn’t mean you can’t build systems to do this in your own infrastructure.

Automatic Updates are the present and future

I don’t mind what does the automatic updates: WordPress, Jenkins, or a cron with some WP-CLI commands, as long as it’s not you. Accept it, we suck, let the computers take over.
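For the cron route, this is roughly what I mean. It is a minimal sketch that wraps three standard WP-CLI commands (`wp core update`, `wp plugin update --all` and `wp theme update --all`) in a script you could schedule. The site path is a placeholder, the function names are my own, and by default it only prints the commands rather than running them.

```python
from __future__ import annotations

import shlex
import subprocess


def build_update_commands(wp_path: str) -> list[list[str]]:
    """Build the WP-CLI commands that update core, all plugins and all themes."""
    base = ["wp", f"--path={wp_path}"]
    return [
        base + ["core", "update"],
        base + ["plugin", "update", "--all"],
        base + ["theme", "update", "--all"],
    ]


def run_updates(wp_path: str, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for command in build_update_commands(wp_path):
        if dry_run:
            print(shlex.join(command))
        else:
            # check=True stops the run if WP-CLI reports an error
            subprocess.run(command, check=True)


# Placeholder path: point this at your own WordPress install.
run_updates("/var/www/example")
```

Call `run_updates` with `dry_run=False` from a scheduled job once you are happy with what it prints.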

Even though I have a wishlist for the future, what we have now is a good solution for the majority of use cases. If you think you are a “special use case”, have a good long look and ask: are you really better at moving 1s and 0s than a computer?