On SQL Servers where Availability Groups (or Mirroring) isn’t in play, I typically recommend keeping a combination of on-box backups along with copying said backups off-box as well. Obviously, keeping databases AND backups on the SAME server is the metaphorical equivalent of putting all of your eggs in one basket – and therefore something you should avoid like the plague. But, while it makes perfect sense from a disaster-protection standpoint to make sure you have redundant copies of your backups, I still argue that for ‘smaller’ emergencies and problems, the best place to have your most recent backups is on-box – or on the same server where your databases reside. Consequently, keeping COPIES of all of your backups on-box means that you don’t have to pull data “over the wire” if/when responding to a “minor” disaster or other emergency that requires you to quickly dash-in and restore a production database. Again, though, as beneficial as it is to have “on box” backups, you simply can’t keep your only backups on the same box – hence my recommendation to ‘mirror’ backups off-box as well. (And, another nice “trick” here is that I find that it’s usually very easy to keep 1 – 3 days’ worth of FULL + DIFF + TLOG backups on-box for databases in most environments, but it’s also typically fairly easy to keep backups for as long as a week or more “off-box” where you’re, ideally, storing backups on cheaper, less-performant, disks).
AlwaysOn Availability Groups and Backups
As I called out in my previous post, however, the introduction of SQL Server AlwaysOn Availability Groups changes thins a bit – by introducing the potential for “fragmentation”. Or, in other words, if you don’t take the time to PLAN how your backups can/will be handled, you can end up with your backups “spread out” over multiple hosts. Technically, there’s nothing wrong with that – those backups will still work. Personally, though, I’d much rather avoid the potential of having to “hunt around” and find my T-Log and other backups if/when responding to an emergency that would require restoring from backups (especially since that’ll already mean some additional overhead and time involved in tearing down my Availability Groups – or at least moving my target DB out of an existing Group – and so on).
To put this into better perspective, assume that I’ve set up 2 servers with identical drive configurations, deployed a SQL Server database to them, and then ‘dropped’ that database into an Availability Group – and have just configured backups to target the Primary Replica (because I don’t want or need a scale-out license to run backups from the secondary). Then assume I have a “Tuesday from hell” – where at 2AM my automated backups kick off a FULL backup – stored on SERVER1. Assume further, that T-LOG backups then kick off every 10 minutes – and, as before, are stored on SERVER1 as well. At 8AM the system starts logging lots of traffic and by 9:30AM there’s a weird hiccup somewhere and the Availability Group correctly fails over to SERVER2 – or Server2 becomes the Primary Replica. Further suppose that Transaction log backups keep working as expected every 10 minutes – meaning that I’ve now got a FULL backup on SERVER1 and a bunch of T-Log backups on SERVER1 as well, but I’m now writing T-Log backups to SERVER2 as well. At 10:25AM something else happens and operations are shunted back over to SERVER1. (We’ll assume it was a manual intervention to push operations back on to SERVER1, though this could have just been weird “hiccups” or any of a number of other causes.) From here on out most of my T-LOG backups are on SERVER1 – but I still have a handful on SERVER2.
Now assume that something truly ugly happens around 2PM and I’ve decided I need to restore the database (or use the T-Logs to recover from whatever has happened – more on this in a second). At this point, I’ve now ‘fragmented’ my backups and will have to ‘collect’ them from two different machines, two different folders, and then try to restore them – in order, without screwing things up.
There’s an Easy Fix
Happily it almost takes more effort to describe what kinds of problems you can run into if you don’t take the time to prepare for this potentiality than it does to protect against it. Which is to say that there’s an easy fix that’ll prevent this problem.
To protect against this problem, you simply need to write all of your backups out to a ‘neutral’ location – or a server that isn’t one of you Availability Group Hosts. This can be done by simply writing backups to network shares – so that no matter which of your Availability Group hosts is currently creating backups, they’ll all end up in the same location.
Furthermore, if you want or need to keep COPIES of your backups at this point, then the MIRROR TO clause of the BACKUP command is a fantastic option. (In my experience many DBAs either don’t know about this option or forget about – simply because it’s ‘Enterprise Edition only’, but since Availability Groups require Enterprise Edition, you might as well use this option to get your “money’s worth”.)
Backup Fragmentation and Third Party Log Reader Agents
In this post I’ve painted a picture of some of the hassles you’d have to address if you needed to restore an Availability Group database without taking any account of where your backups are.
The problem though, is that restoring AG databases is a MAJOR pain. That’s not to say you shouldn’t have backups. But restoring AG databases can be terribly tedious and can result in serious down-time – something you’ll want to avoid like crazy given that you’ve already paid top dollar to keep the database in question as ‘up’ and available as possible.
As such, I still maintain that if you have deployed SQL Server Availability Groups, you’re dumb to not pay an additional $1500 or so to get a third party log reader agent – as they can totally ‘save your bacon’ in an emergency.
Of course, if you’ve got all of your T-Log backups in the same, consolidated location, then you’ll also find that using a Log Reader will be that much easier as well. So, in short, a tiny bit of prevention can go a very long way when it comes to preventing “backup fragmentation”.
Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3
Part 11: A Fork in the Road
Part 12: Issues with Enabling and Disabling Batch Jobs
Part 13: Creating and Populating a Batch Job State Table
Part 14: Establishing Synchronization Checks
Part 15: Setting Up Linked Servers
Part 16: Job Synchronization Checkup Logic
Part 17: Creating Jobs to Check on Synchronization
Part 18: Health Checks for Availability Groups
Part 19: Availability Group Database Backups