When considering a proactive approach to disaster recovery, it’s only natural to look to high availability solutions as a way to cope with system or data-center outages. However, although high availability solutions are a fantastic complement to any disaster recovery plan, substituting high availability solutions for disaster recovery can be fatal.
Highly Available Bad Data
High availability isn’t a replacement for a fully baked disaster recovery plan because high availability solutions just make data available—even if the data is disastrously bad. For example, consider an inventory-management solution atop a three-node failover cluster. This solution also uses synchronous database mirroring to keep remote copies of data synchronized in another location to protect against data-center failure. With system-level redundancy in the form of clustering, and data-level redundancy provided by mirroring, many organizations mistakenly assume that they've implemented a solution that protects them from disaster.
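For reference, synchronous mirroring of the kind described here is configured with ALTER DATABASE statements along these lines; the database name and endpoint addresses below are placeholders, not part of the example scenario:

```sql
-- Hypothetical setup for synchronous (high-safety) database mirroring.
-- Run on the mirror server first: point it back at the principal.
ALTER DATABASE Inventory SET PARTNER = 'TCP://principal.example.com:5022';

-- Then on the principal server: point it at the mirror.
ALTER DATABASE Inventory SET PARTNER = 'TCP://mirror.example.com:5022';

-- Require synchronous commit so both copies stay in lockstep.
ALTER DATABASE Inventory SET SAFETY FULL;
```

With SAFETY FULL, a transaction doesn't commit on the principal until its log records are hardened on the mirror—which is exactly why the mirror so faithfully reproduces whatever the principal does, good or bad.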
But imagine a scenario in which a software glitch requires a developer or DBA to manually set the inventory level of a certain problematic product down to a quantity of zero. Now, suppose that during this dicey operation, the person executing this command forgets to add a WHERE clause to his or her UPDATE statement—and accidentally sets the inventory level for all products to zero. High availability solutions can’t provide protection in a scenario like that. In fact, they just make the problem worse, because they'll faithfully duplicate the bad data (or operation) out to multiple sites—leaving you with copies of bad data in multiple locations. High availability solutions are also prone to these same duplication problems when it comes to sabotage by disgruntled employees and corruption by hackers.
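The mistake is easy to reproduce. A sketch, with hypothetical table, column, and product values:

```sql
-- Intended: zero out inventory for the one problematic product.
UPDATE dbo.Products
SET QuantityOnHand = 0
WHERE ProductID = 4289;   -- hypothetical product

-- What actually ran: the WHERE clause was omitted,
-- so every row in the table is set to zero...
UPDATE dbo.Products
SET QuantityOnHand = 0;

-- ...and synchronous mirroring dutifully hardens the same
-- change on the remote copy before the transaction commits.
```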
Furthermore, if you've never had the opportunity to recover from a situation like this, it's easy to fall prey to the notion that you can just use the transaction log file to recover back to the point in time before the disaster occurred. However, such a simplistic solution rarely works with highly trafficked systems. In our example, in which inventory levels get set to zero for all products, this problem will quickly manifest itself when someone tries to place an order—because inventory will be unavailable. But assuming that you’re able to recover inventory levels to their pre-glitch levels, what about inventory additions entered into the system by the receiving department while you’re working on a recovery for the "zero-out"?
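The point-in-time restore itself is the easy part—in T-SQL it looks roughly like the following sketch (backup file names and the STOPAT timestamp are placeholders). The hard part is everything that legitimately happened after the chosen point:

```sql
-- Restore the last full backup without recovering the database,
-- then roll the log forward to just before the bad UPDATE.
RESTORE DATABASE Inventory
FROM DISK = N'D:\Backups\Inventory_full.bak'
WITH NORECOVERY;

RESTORE LOG Inventory
FROM DISK = N'D:\Backups\Inventory_log.trn'
WITH STOPAT = '2011-06-14 14:29:00',  -- moment before the "zero-out"
     RECOVERY;
```

Note that every transaction committed after the STOPAT time—including the receiving department's inventory additions—is discarded by this restore and has to be re-entered or salvaged by some other means, such as a third-party log reader.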
High availability systems only make your data available; they don’t control whether the data is correct. Therefore, if you don’t have a well-rehearsed and regularly practiced disaster recovery plan (which can benefit tremendously from a third-party log reader agent) to cope with bad data, you might be left with highly available bad, or busted, data.
High availability solutions impose an additional layer of complexity. This complexity, in turn, requires additional management costs and considerations, and can increase system requirements and performance overhead. More important, increased complexity heightens the risk of problems or disasters simply because there are more moving parts and dependencies in play.
In other words, increasing system or data availability is non-trivial and imposes risk, although the benefits of properly implementing a high availability solution far outweigh the risks. For example, failover clustering provides tremendous benefits in terms of system availability, but it's an expensive and non-trivial solution to implement.
Database mirroring is another non-trivial solution that provides tremendous benefits from an availability standpoint when implemented correctly. But to ensure correct implementation, hardware and network throughput need to be adequately sized to prevent a big SEND queue from degrading performance during peak operating times.
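One way to keep an eye on that send queue is through the mirroring performance counters, for example:

```sql
-- Report the unsent log (send queue) per mirrored database, in KB.
-- A persistently large value during peak hours suggests the network
-- or mirror hardware is undersized for the workload.
SELECT instance_name AS database_name,
       cntr_value    AS log_send_queue_kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Mirroring%'
  AND counter_name = 'Log Send Queue KB';
```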
Consequently, with any high availability solution, make sure to size and plan correctly—and to consider how your disaster recovery plan needs to address any problems that might stem from your high availability solution causing performance problems or distracting you from your primary focus of keeping data safe.
A Chink in the SQL Azure Armor
Microsoft SQL Azure promises some exciting opportunities for organizations hoping either to benefit from decreased management costs or to (eventually) be able to scale without massive out-of-pocket costs. But although SQL Azure comes with a 99.9 percent uptime guarantee and service level agreements (SLAs) that ensure the protection of customer data in the case of SQL Azure hardware and system failures, there’s still a chink in the armor.
Because SQL Azure deliberately decouples the physical implementation details of how it works from the underlying physical storage engine, end users currently have no access to the transaction log file and therefore can't issue BACKUP or RESTORE commands. Microsoft ensures that SQL Azure data is protected from system failures, but there’s no way for SQL Azure subscribers to protect themselves from human errors or software glitches—a perfect example of how a highly available solution can be susceptible to disaster recovery limitations.
When properly implemented, high availability solutions complement your data-protection efforts to facilitate effective disaster recovery plans. Don’t let the considerations outlined here deter you from being proactive and implementing a high availability solution. Just don’t assume that high availability solutions will be problem-free, and don’t mistake high availability solutions as a replacement or substitute for a well-rehearsed, regularly updated, and fully documented disaster recovery plan.