In previous posts we looked at how to enable regular checks for corruption and how to set up alerts for IO subsystem problems when they occur. But, as covered previously, one of the key means for properly dealing with corruption is to detect it early – so that you have more options for correcting it at your disposal that you would if it’s allowed to go unnoticed for long periods of time.
Consequently, in this post we’ll take a peek at how you can use backups as an additional means of detecting corruption.
Bonus Points: Leveraging Backups in the War on Corruption
Simply stated: you need backups. While Mirroring and AlwaysOn do provide cool features that will allow you to recover from some forms of corruption auto-magically, the reality is that backups are still your best line of defense against certain forms of disaster. Consequently, since you need backups, there’s no reason you couldn’t, or shouldn’t, leverage them in the war on corruption – as a way to more quickly discover physical corruption after it happens.
So, given that it’s already assumed that you’ve got regular FULL/DIFFERENTIAL backups plus regular transaction log backups on hand to meet your target RPOs and RTOs (and assuming that you’re regularly testing those backups to make sure that they’re viable), the fact that backups – by their very nature – are coming into direct contact with data that has been modified means that they’re a great, additional, means of detecting corruption that might have occurred or gone unnoticed during write operations.
Enabling CHECKSUMs as part of your Backup Process
When it comes to using backups as a means of detecting corruption, the key thing to note about backups is that they’re designed to copy pages (or blocks of data) from one file location to another – as a means of creating a duplicate copy (or backup) of SQL Server data. Accordingly, whenever a FULL backup kicks off, IF it could be directed to CHECK all pages and make sure they were valid, that would be a great, additional, way to ensure that corruption hadn’t occurred at the page level when data had been written out. Likewise, as DIFFERENTIAL backups ONLY copy or backup data that has been modified since the last FULL backup, DIFFERENTIAL backups would, in turn, be a great way to check and see IF any data had been incorrectly saved in the process of being modified.
And, fortunately, SQL Server provides the ability to perform CHECKSUM operations against data as its being backed up. To do this, you just need to ‘modify’ any existing backups you’re already taking to make use of the WITH CHECKSUM option that is provided by SQL Server.
It’s also possible to use the CHECKSUM option when backing up transaction logs – just as a means of ensuring that data is actually written to transaction log backups correctly as well. And, by combining CHECKSUM operations with the RESTORE VERIFYONLY command, you can then explicitly tell SQL Server to both check each page as it’s being backed up or copied to disk AND telling it to go back through when the operation is complete and make sure that all data was correctly written to disk – a powerful means of helping ensure backup quality.
It’s also worth noting that if you’re using 3rd party backup solutions, it’s also usually possible to specify the WITH CHECKSUM option along with whatever options you’ve chosen in terms of compression or encryption. Though, in my experience too few 3rd party backup solutions provide the option to add the WITH CHECKSUM clause into the mix directly from the GUI. Instead, what I usually find is that you’ll have to go in and modify whatever routines or scripts your 3rd party GUI is creating. (A bit of a bummer, obviously – but totally worth the effort in the vast majority of cases.)
Backups + CHECKSUM ARE Not a Replacement for Regular Corruption Checks
Finally, as great as the WITH CHECKSUM option is with your backups, CHECKSUMs + BACKUPS are NOT a substitute for regularly scheduled checks. Instead, using CHECKSUM’d backups merely serve as an additional, potential, option for early detection. Likewise, IF an error is found during execution of backups, you’ll want to pay attention to HOW SQL Server addresses that by looking at the corresponding options for STOP_ON_ERROR or CONTINUE_AFTER_ERROR to see what makes best sense for you.
Part VIII: The Importance of Backups