AlwaysOn Availability Groups and Third Party Log Readers

AlwaysOn Availability Groups (AGs) are a complex technology.

However, one of the biggest benefits that AGs provide are both high availability (or fault-tolerance) and disaster recovery (or the ability to keep data synchronized offsite for smoke and rubble contingencies) in a single package. Stated differently, when DBAs and SysAdmins learn the ins-and-outs of this single technology (which isn’t quite as trivial as it sounds), they’re then able to address high availability (HA) and disaster recovery (DR) concerns from a single interface or set of tooling, thus providing better scalability of management.

There is a big caveat though—which is that AlwaysOn Availability Groups can’t protect against disasters caused by users or rogue applications. Or, in other words, the HA and DR components afforded by AGs are made possible by mirroring data to multiple servers (thereby providing intra-data center protection via HA/fault-tolerance and extra data center protection by mirroring data off-site).

The assumption, though, is that the data being mirrored is always correct. If a software glitch or an end-user, or a DBA, or developer, or whatever happens to accidentally either destroy data or bust it, then that data is still highly available and fault-tolerant, but wrong. In most organizations, that should represent a disaster (who cares how fast and 'up' your data is if it’s wrong).

In situations like this, AlwaysOn Availability Groups won’t be able to help you. (No HA solution really will.) Instead, you’re going to need a backup.

The Problem with Backups and AlwaysOn Availability Groups

But there’s a problem with backups when it comes to AlwaysOn Availability Groups—which is that you can’t restore a backup over the top of a database that’s part of an Availability Group. Instead, you have to either tear down the Availability Group or remove the database from the Group (there are arguments or reasons for why one or either of these options MIGHT make sense) and then, once the database is no longer being mirrored, THEN you can restore over the top of it.

There are two huge issues with such an approach though:

First, and most obvious, is that either tearing down an AG or removing a database from it, restoring it, and then rebuilding everything back as it was before the disaster, is going to be tedious, time consuming, and something that’s potentially prone to errors. It’s also not going to be a win in terms of overall uptime/availability.

Second, and obvious to anyone who’s ever been asked to restore data to the way it was before "that idiot in development destroyed all of our customer data by running an UPDATE statement without a WHERE clause," is that while you can restore data to the point it was at BEFORE a user error destroyed it, you’ll run into a few subsequent problems. Meaning that you’ll be stuck with a choice: either you revert the ENTIRE database back to the way it was before data was destroyed—and lose any subsequent changes/modifications to ALL other tables and to the table in question, OR you’ll have to stand-up a COPY of the database you’re trying to recover, bring it to the point-in-time before data was destroyed, and then MANUALLY create UPDATE … FROM statements that’ll pull in data from the restored copy of your database and overwrite the ‘destroyed’ data.

This is insanely tedious, very error prone, and STILL means you’ll end up losing any changes to the data AFTER the data was destroyed. (Imagine a table used to keep track of physical inventory—where ALL products accidentally, somehow, get set to InventoryCount = 0—if you recover to the levels/values that inventory levels were BEFORE this accidental overwrite took place, you’ve reverted to where things were before the disaster; but have any products been added/removed from inventory while you were fighting with the disaster? If so, then your inventory levels are still inaccurate).

A key benefit of the second approach, however, is that the approach of standing-up a copy of your database next to mirrored/AG database means you won’t have to pull it out of an Availability Group or stop mirroring it. (Note, too, that for both scenarios listed above, I just assume that you have FULL/FULL+DIFF and Transaction Log backups on hand. If you don’t, then you need to be fired.)

Using a Third Party Log Reader Agent

An easier way of recovering from these kinds of user error disasters is, of course, to NOT enable them in the first place (i.e., don’t give users the kind of access to production systems that would even let them run UPDATE or DELETE statements and so on). In a few environments I’ve seen, however, that’s much easier said than done. And, in all environments, there’s still the potential that a software glitch or bug might end up maiming some of your data under certain scenarios. Which, in turn, means you should always have some sort of contingency plan to address these kinds of problems.

And, to that end, I can’t recommend Log Reader Agents enough. In overly-simplified terms, Log Reader Agents take advantage of the fact that the Transaction Log for SQL Server databases keeps tabs on the BEFORE and AFTER values of what data should look like (before and after) every, single, change to your database. As such, by reading the transaction log, Log Reader Agents can help a DBA or SysAdmin find particular operations or changes and then generate scripts that will, effectively, roll out or undo the changes in question—by simply creating UPDATE statements that’ll set data to what it was PRIOR to the accident in question.

Better yet, these same tools can also scan the logs looking for any subsequent changes to the data/records in question, and roll forward any subsequent (non-bad/non-accidental) changes as well. So, in the case of inventory levels all being accidentally set to zero, a Log Reader Agent can both reset those values to what they were PRIOR to an accident AND make sure that individual UPDATE statements that might have incremented or decremented inventory levels of particular products also get re-run or don’t get lost.

Given that AlwaysOn Availability Groups are an Enterprise Edition feature, it therefore stands to reason that the $700 - $1,600 that a Third Party Log Reader agent will cost is, effectively, just a drop in a bucket.

Moreover, given that one of these tools can "save your bacon" in a disaster—while still letting you keep your databases/AGs up and running as you pour over the logs and find/create a fix—and it should go without saying that a Third Party Log Reader Agent should be in the toolbox of every DBA managing AlwaysOn Availability Groups (the cost of not having a solution like this on hand is just too high otherwise).

Third Party Log Reader Agents:

ApexSQL Log - My hands-down favorite. This tool is solely focused on being a log reader agent—and does a fantastic job of it.
Toad for SQL Server – Toad for SQL Server actually bundles a log reader agent—not a bad add-on at all to a product that many DBAs love and use already.

Comments

Plain text