You backup, right?
Backups really shouldn’t be in a security series; they should be in a general life series. The problem with backups is that you don’t miss having them until the most critical moments. When disaster strikes, your backups will be there to be restored from and, instead of potentially weeks of downtime, you are back up and running in a few minutes.
Sometimes
The problem is, backups are often neglected, and without a good strategy for managing, testing and deploying backups, they are almost as worthless as not taking backups at all. Many people have reached to restore their backup only to find the zip is, in fact, empty and their backups haven’t been working for months.
So what is a backup?
A backup is simply a copy of files stored somewhere away from your site which, when replaced, will restore your site. Normally you have two parts to a backup for a WordPress site and they may be stored separately. The first is your files, so things in your wp-content/uploads/ folder for example, and the second is a SQL file which is a copy of your database.
This means when you restore, you copy the files and they will overwrite existing files along with the SQL file which you import into the database allowing you to restore the contents within the database.
Who should backup?
Everyone should be taking backups. Even if your host has the best backup system in the world, it is not your backup system; it is theirs. So it is important you keep your own backups for the myriad scenarios where you might not be able to use your host’s.
If you are using a 3rd party service to perform backups on your behalf you should ensure that the eventual location for the backup is in a place you have access to and “own” rather than the third party service. In a similar way to a host, if you don’t have a copy of the data, then the third party has a backup rather than you having a backup.
So who should backup? You!
It’s great others are also doing it, but ultimately, if you want backups, you take backups.
What should we backup?
There are basically two approaches you can take to backing up, everything or selective, and each has pros and cons.
Everything
In this scenario, your backup is a complete copy of all your files and a complete copy of your database.
Pros:
- You haven’t missed anything if you backup everything
- It’s simple and less prone to going wrong, or files being missed
Cons:
- It results in large backup file sizes
- It’s backing up a lot of content we will always have access to
- It potentially introduces new security risks
- Potentially slower to restore content
Selective
In this scenario, you back up only content deemed “custom”, excluding things like the WordPress install itself or plugins available from the WordPress repo. It might even back up only selected tables within the database.
Pros:
- Smaller file sizes
- Easier to find and restore content
Cons:
- More complicated to set up
- Potential for missing critical pieces
If you are not backing up at all, any backup is better than none. Selective backups tend to work well with setups that have some sort of deployment pipeline and control over what is being put on the server. If you use Git for custom code then you will be leaning more towards Selective backups; if you have clients who like to make changes themselves then the full backup is going to be more helpful.
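To make the Selective idea a little more concrete, here is a minimal sketch of picking files by include and exclude prefixes. The function name `select_files` and the example folder names are assumptions for illustration only; what counts as “custom” will differ from site to site.

```python
from pathlib import Path


def select_files(site_root, include, exclude=()):
    """Return sorted relative paths under `include` prefixes, minus `exclude` prefixes."""
    root = Path(site_root)
    selected = set()
    for prefix in include:
        base = root / prefix
        if not base.exists():
            continue  # nothing to back up for this prefix
        candidates = [base] if base.is_file() else [p for p in base.rglob("*") if p.is_file()]
        for path in candidates:
            rel = path.relative_to(root).as_posix()
            # Skip anything that is, or sits underneath, an excluded prefix.
            if any(rel == ex or rel.startswith(ex.rstrip("/") + "/") for ex in exclude):
                continue
            selected.add(rel)
    return sorted(selected)
```

Calling `select_files(site_root, ["wp-content/uploads"], ["wp-content/uploads/mybackup"])` would list uploads while leaving out a (hypothetical) backup folder sitting inside them.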
Tip
If you are on Managed WordPress Hosting, chances are you don’t control the WordPress core files; they might, for example, be in a subdirectory. If you don’t directly manage a file/folder it probably makes sense to avoid backing it up.
Full vs Incremental backups
Once we have selected what we will be backing up, the next step is to work out what type of backups we will be doing. Again, there are broadly two types of backups: Full and Incremental.
Full backups
With Full backups we take an entire copy of all the content we intend to backup and store it all, regardless of date, within our backup.
Pros:
- We have everything
- Each backup is independent of other backups
- Simple to restore
Cons:
- Takes a longer time to backup & restore
- Takes up more space
Incremental backups
While we start with a Full backup, each subsequent backup stores files that have been changed or modified since the previous backup. Incremental backups can work both for files and databases but normally require specialist tooling and setup.
Pros:
- Smaller backups
- Can be very quick to do surgical restores when something went wrong with a single file
Cons:
- Can be complicated to set up
- Full restores can be complicated or require stepping through increments.
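As a very rough illustration of the Incremental idea for files, the sketch below lists everything modified since the previous backup. It is an assumption-laden simplification: the name `changed_since` is made up for this example, it relies on modification times only, and it does not notice deleted files, which dedicated tooling handles properly.

```python
from pathlib import Path


def changed_since(root, last_backup_time):
    """Return relative paths of files modified after `last_backup_time` (Unix seconds)."""
    root = Path(root)
    changed = []
    for path in root.rglob("*"):
        # Only regular files with a newer modification time go into the increment.
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            changed.append(path.relative_to(root).as_posix())
    return sorted(changed)
```

The first run would be a Full backup; each later run passes in the time of the previous backup and stores only what this returns.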
For most people, Full backups will be the normal process and are much simpler to implement as we are talking about a system which is largely going to be left on its own; simple is good. You can also use both systems side by side. A fairly common implementation of Incremental backups is to take a Daily Full backup and then an Hourly Incremental backup. This means for changes during the day you can restore to the Incremental, but for larger issues you can reach for your daily backup.
Picking the right combination of strategies really does depend on the site, so here are some examples of backup strategies as case studies:
Professional Blogger
A fairly typical WordPress site. Large changes, for example new themes and plugin installs, happen on an ad hoc basis and tend to come in clusters. New content, however, is added daily. Content is a mixture of posts in the database and images uploaded to the wp-content/uploads folder.
They don’t have a local development environment and don’t use any deployment workflow, they just add plugins through the WordPress interface.
The most sensible combination of backups is Full backups of everything on a Daily basis. While this leads to the potential of losing today’s posts, we gain simplicity and a simple restore option.
WooCommerce Store
A WooCommerce-powered store working with a developer agency (or you working with a WooCommerce Store), taking at least a couple of orders an hour on average. Changes to code are, on the whole, version controlled.
There is a Git repo of custom code, and they are on Managed WordPress Hosting.
A good combination here is to use both Selective Full backup and Selective Incremental.
So taking a Daily Selective Full backup that includes:
- The database
- wp-content/uploads folder
- Any files not in version control that are not managed by the hosting
This is taken daily and acts as the base for a restore. The restore process would then be:
- Spin up the managed host’s default setup
- Restore our backup
- Deploy our code from our Git repository
In addition, because we have critical orders, restoring only the daily database backup would potentially lose a day’s worth of orders. A Selective Incremental backup, taken hourly and covering just the database and any customer-generated files, closes that gap: the restore process is as before, but once the daily backup is restored you then restore the Incremental backups.
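The order of that restore matters, so here is a small sketch of working out which backups to apply for a given point in time: the latest Full backup, then every Incremental taken after it. The `(kind, taken_at)` tuple layout and the name `restore_chain` are assumptions for this example, not how any particular tool stores its catalogue.

```python
def restore_chain(backups, target_time):
    """Return the Full backup to start from, followed by later Incrementals in order.

    `backups` is a list of (kind, taken_at) tuples, kind being "full" or "incremental".
    """
    fulls = [b for b in backups if b[0] == "full" and b[1] <= target_time]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    base = max(fulls, key=lambda b: b[1])
    # Only increments taken after the chosen Full, and not after the target, apply.
    increments = sorted(
        (b for b in backups
         if b[0] == "incremental" and base[1] < b[1] <= target_time),
        key=lambda b: b[1],
    )
    return [base] + increments
```

With a Daily Full and Hourly Incrementals, this returns at most one Full plus that day’s increments, which keeps the restore short.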
TimNash.co.uk
For this site, I take a daily Selective Full backup.
As it sits on 34SP.com Managed Hosting I don’t need to include the contents of the /wp/ folder, and all my custom code is in Git along with a list of the plugins that come from wp.org. My Selective backup is:
- The database
- wp-content/uploads
- Post-Recieve.sh hook (My deployment hook for the 34SP.com Git deployment system)
My restore process, should anything go wrong, is to spin up a new container, restore the database and wp-content/uploads, and then deploy from Git, which will automatically set up the plugins and Composer.
Taking backups
Backup tools normally work on a push or pull setup. Local tools that run on the server generate the backup and then push them to their remote storage location. These can be as simple as a bash script on a cron job through to complex dedicated tooling. The alternative approach is to have a centralised backup solution that connects to the server and runs the backup before pulling it to its storage. If you are building a backup solution for lots of sites (if you are an agency, for example) having a centralised backup storage solution may make more sense than for individual sites.
Should you use WordPress plugins?
There are a lot of WordPress plugins out there for backups, so should you use them?
Probably not, unless you are in a scenario where you have no other option. There are a few reasons not to use a WordPress plugin.
- Backing up is a pretty resource-intensive task, with lots of files; most web servers put restrictions on memory and execution times. Consequently, WordPress plugins have to come up with workarounds to get round these limits adding far more complexity.
- Being inside WordPress – if WordPress isn’t working for some reason, nor is your backup. This could be a plugin breaking the cron, meaning your site doesn’t get backed up, or not all the content being backed up.
- When the plugins are running they are affecting your site
- There are far better tools out there.
Now there will be times where you simply do not have access to SSH, are unable to run your own tooling and a WordPress plugin might be the only option left.
Likewise a few do offer WP-CLI options that might get around most of the limits mentioned, and finally there are plugins that are actually simply interfaces to solutions that remotely do much of the work.
For this article I’m going to avoid specifying good solutions, however when looking for a backup solution you should be looking for tooling that:
- Gives you control over timing
- Either puts the backup on a remote resource of your choice, or has a means to communicate to a third party to get the content
- Can be run as a one-off
- Has a way to access the files and restore easily
Incremental backup solutions for databases will be very dependent on your database and table storage type. But modern versions of MariaDB support Incremental backups out of the box, as does PerconaDB.
When picking tooling try to keep it as simple as possible. If your backup process has 10-20 separate processes, all of which have to work in a chain, then it might be more complex than it needs to be. Ultimately even Incremental backups can, at their most basic level, be successfully set up with an rsync command and a Mariabackup command.
For this site my backup process is very simple: a Selective Full backup taken by a bash script that:
- Checks diskspace to make sure it has enough space
- Uses wp-cli DB export to export the database
- Generates a zip containing my wp-content/uploads folder and subfolders, the DB backup and my post-recieve.sh file which lives outside of the httpdocs
- Checks the filesize of the zip
- Sends to Backblaze, using the tool rclone
- Zip removed (it’s generated in /tmp/ so, worst case, if this fails it will be removed naturally)
- Log entry made
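The steps above can be sketched roughly as follows. This is not the actual bash script, just an illustrative Python version under stated assumptions: the function name `run_backup`, its parameters, the archive naming and the example remote `b2:my-bucket/site` are all made up for the example. It does assume the `wp` (WP-CLI) and `rclone` commands are installed, and for brevity it leaves out the post-recieve.sh file.

```python
import logging
import shutil
import subprocess
import tempfile
import zipfile
from datetime import datetime
from pathlib import Path


def run_backup(site_root, remote, min_free_bytes, min_zip_bytes, run=subprocess.run):
    """Selective Full backup: DB export plus uploads, zipped and pushed with rclone."""
    site_root = Path(site_root)
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # 1. Check there is enough free disk space before starting.
        if shutil.disk_usage(tmp).free < min_free_bytes:
            raise RuntimeError("not enough free disk space for the backup")
        # 2. Export the database with WP-CLI.
        sql = tmp / "database.sql"
        run(["wp", "db", "export", str(sql), f"--path={site_root}"], check=True)
        # 3. Zip the database export and the uploads folder.
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        archive = tmp / f"backup-{stamp}.zip"
        uploads = site_root / "wp-content" / "uploads"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(sql, "database.sql")
            for path in uploads.rglob("*"):
                if path.is_file():
                    zf.write(path, path.relative_to(site_root).as_posix())
        # 4. Check the zip is at least the size we would expect.
        size = archive.stat().st_size
        if size < min_zip_bytes:
            raise RuntimeError(f"backup is only {size} bytes, refusing to upload")
        # 5. Send it to remote storage, e.g. a Backblaze B2 remote configured in rclone.
        run(["rclone", "copy", str(archive), remote], check=True)
        # 6. The temporary directory, and the zip in it, is removed on exit.
    # 7. Log entry made.
    logging.info("backup of %d bytes sent to %s", size, remote)
    return size
```

The `run` parameter exists only so the external commands can be swapped out when trying the script without a real site; in normal use it stays as `subprocess.run`.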
Storing backups
Once we have taken backups we need to put them somewhere. Somewhere is not the same place as our site.
Why is wp-content/uploads/mybackup/ a terrible place?
Most WordPress plugins default to storing backups in the uploads folder of your WordPress site and, while they do at least try to obfuscate the files with a hash, they are leaving backups in a publicly accessible location. If the files were found, there is a reasonable chance one of them is your wp-config file. When was the last time you changed your DB password?
The second reason would be funny if it weren’t the bane of hosting support teams: quite a few backup plugins will back up their own backups. The result: each backup gets larger and larger. In my day job at 34SP.com, disk space alerts where a client’s usable space reaches 100% make up a reasonable share of server-generated notices. Nearly always this is due to poorly configured, self-eating backup plugins.
The final reason is that if we have a catastrophic issue (the sort we might want a backup for), be it disk failure, the site being held to ransom or accidentally deleting the domain within our hosting admin, then our backups are lost along with the site.
So where should you host your backup?
Cloud Storage is cheap and is often a cost effective location. On premises, local NAS is another option. A dedicated backup server a third. Basically anywhere but that server. If using a NAS or backup server, what’s backing that up?
3-2-1 still relevant?
Many techies and anyone who has worked in IT probably had the 3-2-1 backup rule drilled into them (and promptly ignored by all):
Keep at least three (3) copies of your data, and store two (2) backup copies on different storage media, with one (1) of them located offsite.
The rule was developed less for the web, more for corporate environments and comes from the day where floppy disks left in the sun would melt and hard drives had spinning parts. For the youngsters this sounds almost steampunk, I know.
A good implementation of this rule for website backups is a combination of a NAS/backup server and then a third party service for potential deep storage.
Some third party suppliers will offer a complete 3-2-1 type solution offering both instant accessible backups and long term storage on tape.
At this point, for Europeans, it’s worth taking a moment to think about the data in those backups. Personal data and how you handle it doesn’t change because it’s no longer in your database. You possibly don’t want data sent to Antarctica for deep storage if in 28 days you need to remove it.
Indeed, how long should you keep backups?
Having worked alongside our support team at a hosting company, I am amazed by some requests, “Hi! Would you restore my site to how it was on the 3rd of March 3 years ago…”.
Needless to say, it’s unlikely that anyone working on a daily backup cycle is keeping backups from 1,000 days ago.
While it’s up to individual sites to make calls, you will inevitably find whatever retention policy you pick, the correct policy was current policy +1 day. A general rule of thumb is 28 days of daily backups is a good balance between data retention policies and practicality of running a business. You might want to limit liability and reduce that, or consider long term storage with personal data stripped out.
Storing Backups for TimNash.co.uk
For this site I do take a 3-2-1 approach. When I take a backup it is put on Backblaze B2, a cost-effective cloud storage provider, and that copy is then synced to my local NAS. To keep tabs on things I have a pair of scripts that manage backups: one checks that there has been a backup in the last 24 hours, and the second removes backups older than 28 days as long as there are at least 28 backups in place.
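The logic of those two scripts can be sketched like this. It is an illustration rather than the real scripts: the function names are made up, the backups are represented simply as a list of timestamps, and “at least 28 backups in place” is interpreted here as “the newest 28 are never removed, whatever their age”.

```python
from datetime import timedelta


def backup_is_fresh(backup_times, now, max_age=timedelta(hours=24)):
    """True if the newest backup is no older than `max_age`."""
    return bool(backup_times) and now - max(backup_times) <= max_age


def backups_to_prune(backup_times, now, keep_days=28, min_keep=28):
    """Return backups older than `keep_days`, never touching the newest `min_keep`."""
    cutoff = now - timedelta(days=keep_days)
    newest_first = sorted(backup_times, reverse=True)
    # Protect the newest `min_keep` backups so a stalled backup job
    # cannot lead to every remaining backup ageing out and being deleted.
    candidates = newest_first[min_keep:]
    return sorted(t for t in candidates if t < cutoff)
```

The second guard is the important one: if backups silently stop, the prune script leaves the last 28 alone instead of tidying them away.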
Testing backups
So, you have backups and you are taking them regularly, good job. When did you last test them?
I often ask this question at user groups, and a look of panic comes across people’s faces as realisation dawns: they have never tested their backups. If you don’t test and check your backups you don’t have a backup, you have a prayer.
So what constitutes a backup?
“Can I use the backup to restore functionality as part of my normal deployment?” is probably the best definition. If you don’t have a deployment process then your backup needs to be able to restore everything. If you have a deployment solution, can you run that and your backup to restore the site?
Routinely testing your backups will allow you to know that the backups are good but also allow you to practice the restore process. The ideal scenario is automated testing of backups.
At the very least you should, as part of your backup process, be checking the file size of your backups to make sure it’s at a size you would expect. This simple test checks your backup has something.
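A minimal sketch of that kind of sanity check is below. The function name `check_backup` and the assumption that the archive should contain a file called `database.sql` are mine for the example; adjust both to whatever your backup actually contains. Passing these checks only tells you the zip has something in it, not that a restore will work.

```python
import zipfile
from pathlib import Path


def check_backup(archive_path, min_bytes, required=("database.sql",)):
    """Return a list of problems found; an empty list means the basic checks passed."""
    path = Path(archive_path)
    if not path.is_file():
        return ["archive does not exist"]
    problems = []
    if path.stat().st_size < min_bytes:
        problems.append("archive is smaller than expected")
    if not zipfile.is_zipfile(path):
        problems.append("archive is not a valid zip")
        return problems
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        if not names:
            problems.append("archive is empty")
        # testzip() reads every member and returns the first corrupt name, if any.
        if zf.testzip() is not None:
            problems.append("archive has a corrupt member")
        for name in required:
            if name not in names:
                problems.append(f"missing {name}")
    return problems
```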
Next up is trying to restore a site. If you work with a containerised solution, this could involve spinning up a new container and restoring content into it. If you work in an agency that makes use of automated acceptance tests, then running these tests against this container can act as validation.
Setting up automated testing of backups is time consuming but if you have existing acceptance and integration tests then this will provide the most robust solution for testing.
A more ad hoc solution is to routinely populate your development or staging environment from backups rather than directly from the live site.
For TimNash.co.uk I don’t have a huge range of acceptance tests, however I do have a dev environment, so I simply pull the latest backup from my local NAS then pull the dev branch of my Git repo. Because my backup is a Selective Full one, consisting principally of the database and individual content pages, my dev site always has the latest version of the site.
When working with database changes I have started using the wp-cfm plugin to version those changes so I can re-apply them. If I forget to generate db change sets between sessions, well that’s on me and for a solo development project that’s fine.
Backups, when to use them
Right at the start I suggested backups are more than security and in many ways backups have little to do with security; you should always have backups but there are only a few scenarios where they are potentially helpful.
If a site is compromised, restoring from a backup is rarely a good idea. Site compromises tend to go unnoticed for a period of time and when it is spotted it’s often not the first compromise but one of several backdoors. Too often I have seen sites re-compromised over and over again; each time they restore from the backup from a few days previous, sometimes they even patch the vulnerability that was the original exploit. However a backdoor remains in the backup and the site is more or less instantly re-compromised.
A general rule of thumb: while the data in a backup is of use, restoring from a backup is of little help in a site compromise compared with a full rebuild and proper clean up.
However that doesn’t mean they don’t have any benefits in security. Over the last few years we have seen an ever-so-steady increase in ransomware, with sites being encrypted or simply replaced by ransomware. Backups are vital in this scenario, not because you can simply restore, but because you can restore, collect the data and perform a full site clean up. Having that data backed up means neither you nor anyone else should need to pay a ransom.
Securing Backups
Remember backups are copies of your site. If, for example, you have any form of compliance such as PCI-DSS the data is still within scope. More importantly, they are a very obvious, very juicy target for bad actors. Backup servers, like the entire process, are often left unpatched and outside of most monitoring, so backups can be an easier target than your site.
Things to consider
Make sure your backups are not stored anywhere publicly accessible. This includes making sure things like Amazon S3 buckets are not public.
Consider whether the backups should be encrypted. If you have data that you encrypt at rest in your database, by default it is no longer in that state once you export it.
Consider how you plan on encrypting and decrypting data. Many years ago I came across a well-executed backup solution, where the backups were encrypted and stored remotely. The system was brilliant except the private key was stored on the server and the server alone. When it went down with a drive failure that caused the RAID to totally fail, they went to use the backups only to realise they had no access. Don’t leave the only key on the server.
How to get started
If you take nothing from this then taking any backup is better than no backup, so start taking them.
- Next make sure you are not storing them with your site
- Next consider how you can test them, and make sure you fully know how to restore from them
- Next make sure your backup routine runs regularly and is fully automated
- Finally make sure your backups are not exposed to the outside world
Once you get started then go back through the article for the various strategies and ideas you can start implementing.
Remember it’s good to have backups, but until it’s tested it is not a backup, it’s a prayer.
Want to learn more?
This post is from a series called Back to Basics, here is the complete series so far:
This post was written by me, Tim Nash. I write and talk about WordPress, Security & Performance.