Thanks to Hurricane Sandy and its angry little sister, Winter Storm Athena (since when did we start naming winter storms, too?), there’s been a lot of discussion about disaster recovery. Sadly, I think most companies still aren’t well prepared for disaster recovery. I chalk it up to human nature. We all tend to believe that bad things happen only to the other guy. On a personal level, how many of you have wills? How many of you have left all your account passwords in such a way that your spouse knows how to find them (in case you get hit by satellite debris on your way to work)? This same sense of denial exists on a corporate level for disaster recovery planning. With all that in mind, now is a good time to do a quick review of your Active Directory (AD) disaster recovery plan to see how disaster-ready it is.
AD is the IT pro’s best-known identity store. From a physical viewpoint, AD can stand up to a disaster very well indeed. AD is highly fault tolerant because it’s a distributed application with its identity store replicated across multiple domain controllers (DCs). It’s somewhat more vulnerable to corruption from a logical viewpoint, but physical disasters rarely affect the logical architecture and the data it contains.
You Can’t Wreck It if It Ain’t There
Of course, AD does have its vulnerable points, and if you design your forest poorly there are ways you can screw up its innate fault tolerance. The first and most obvious rule is to have more than one DC in your forest. To larger businesses this is a no-brainer, but it’s not nearly as obvious for small businesses (for example, if you’re running various versions of Small Business Server). If you do have more than one DC, kudos to you, but now we come to the second rule: Are those DCs in different locations? Separating DCs geographically is a well-understood best practice among midsized and large businesses, but a small business probably doesn’t have a second location with a WAN link between the locations. In that case, you would separate them within your office, if possible. What do you do in the case of a disaster like Sandy, in which you have a little time to prepare? Shut down the DC that isn’t the Primary Domain Controller (PDC) emulator or the Relative ID (RID), schema, or infrastructure master, and take it home with you! Sure, it’s not the best security practice, but in this case isn’t it more important to keep the company (if you’ll pardon the expression) afloat?
Let’s assume you’ve been able to check off these first two rules as done. Do you have a multi-domain forest? If so, are all your root domain DCs in the same location? If they are, you lose points on this item because if you have a multi-domain forest and lose all those root DCs due to a building disaster, you’ve lost the forest. A good AD design and a smart financial planning strategy have this principle in common: Diversity is key to getting through a wide variety of uncertain conditions.
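If you keep even a rough inventory of your DCs, the three placement rules above are easy to check mechanically. The following Python sketch is only an illustration: the DomainController record, the check_forest function, and the sample domain and site names are all hypothetical, and the code inspects a hand-built list rather than querying AD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainController:
    # Hypothetical inventory record; fill it in by hand or from your own tooling.
    name: str
    domain: str
    site: str  # physical location of the box


def check_forest(dcs, root_domain):
    """Return a list of warnings for the three placement rules in the text."""
    warnings = []

    # Rule 1: have more than one DC in the forest.
    if len(dcs) < 2:
        warnings.append("forest has fewer than two DCs")

    # Rule 2: DCs should sit in different locations.
    sites = {dc.site for dc in dcs}
    if len(dcs) >= 2 and len(sites) < 2:
        warnings.append("all DCs are in one location")

    # Rule 3: in a multi-domain forest, root domain DCs must not share one location.
    domains = {dc.domain for dc in dcs}
    root_sites = {dc.site for dc in dcs if dc.domain == root_domain}
    if len(domains) > 1 and len(root_sites) < 2:
        warnings.append("all root domain DCs are in one location")

    return warnings


if __name__ == "__main__":
    inventory = [
        DomainController("DC1", "corp.example", "HQ"),
        DomainController("DC2", "corp.example", "HQ"),
        DomainController("DC3", "emea.corp.example", "London"),
    ]
    for warning in check_forest(inventory, "corp.example"):
        print("WARNING:", warning)
```

In this sample inventory the forest passes the first two rules but fails the third, because both root domain DCs sit in the same building.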
There’s Nothing to Restore if You Don’t Back It Up
Diversity is no substitute for backing up your forest, however. A good backup-and-recovery strategy ensures that if you do lose your forest or some part of it, you can rebuild it. Your backup-and-recovery plan shouldn’t stop at the DC recovery level; it needs to provide for the chance that you lose the whole forest. First, make sure you back up two DCs in every domain. In a small business, this isn’t a big deal; you probably have only one domain. Does it matter which DCs you back up? In general, it isn’t crucial. Because we’re focusing on serious disaster recovery, however, there are choices you can make that will speed your forest recovery time.
The forest recovery process, as detailed in TechNet’s “Planning for Active Directory Forest Recovery,” creates a seed forest of one DC for every domain. Because there’s only one DC in each recovered domain, that box must hold all the operations master roles for the domain. Your recovery process will be a bit simpler and faster if the DC you recover already holds these roles, so back up the DC that holds these usually-grouped-together roles.
If your forest has multiple domains, another consideration is for this target backup DC to not be a Global Catalog (GC) server. Why not? Because differences in backup versions between the authoritative DC in each domain and its GC replica in other domains can introduce lingering objects into the recovered forest (see the “Removing the Global Catalog” section of “Appendix A: Forest Recovery Procedures” for details). Best practice is for all DCs to also be GC servers, so how do you reconcile this? Decide based on the size of your domain or forest and the number of DCs in a domain. If you have a multi-domain forest with more than 10,000 users, unhosting a GC will take time that you won’t want to spend during a forest recovery. Also, you’ll probably have enough DCs in a domain that one of them not hosting a GC won’t be a problem. If you have a relatively small number of users in a multi-domain forest, you need the GC role more, and unhosting a small GC from a seed DC doesn’t take as long, so this tip doesn’t apply.
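The selection guidance above can be written down as a small ranking rule. The Python sketch below is one possible reading of it, not an official procedure: the BackupCandidate record and the pick_backup_targets function are hypothetical names, the 10,000-user threshold is taken from the text, and ranking role holders ahead of non-GC servers is my assumption about which preference wins when the two conflict.

```python
from dataclasses import dataclass

LARGE_FOREST_USERS = 10_000  # threshold mentioned in the text


@dataclass(frozen=True)
class BackupCandidate:
    # Hypothetical record describing one DC; not read from AD.
    name: str
    domain: str
    holds_fsmo_roles: bool  # holds the domain's operations master roles
    is_gc: bool             # hosts a Global Catalog


def pick_backup_targets(dcs, user_count, per_domain=2):
    """Pick up to per_domain DCs to back up in each domain."""
    domains = sorted({dc.domain for dc in dcs})
    # Avoiding GCs only pays off in a large multi-domain forest.
    avoid_gc = len(domains) > 1 and user_count > LARGE_FOREST_USERS

    def rank(dc):
        # False sorts before True: role holders first, then (optionally) non-GCs.
        return (not dc.holds_fsmo_roles, avoid_gc and dc.is_gc, dc.name)

    targets = {}
    for domain in domains:
        members = sorted((dc for dc in dcs if dc.domain == domain), key=rank)
        targets[domain] = [dc.name for dc in members[:per_domain]]
    return targets


if __name__ == "__main__":
    forest = [
        BackupCandidate("A1", "a.example", True, False),
        BackupCandidate("A2", "a.example", False, True),
        BackupCandidate("A3", "a.example", False, False),
        BackupCandidate("B1", "b.example", False, True),
        BackupCandidate("B2", "b.example", True, True),
    ]
    print(pick_backup_targets(forest, user_count=25_000))
```

With 25,000 users the sketch prefers A3 over A2 as the second backup in a.example because A2 hosts a GC; with a small user count the GC preference is ignored and only role ownership and name order decide.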
Another choice you can make to hasten a forest recovery is to upgrade your DCs to Windows Server 2012. How does the new OS make this process easier? The high-level process is to build a seed forest of one DC per domain, then build out this recovered forest with additional DCs. As I mentioned in “How Windows Server 2012 Improves Active Directory Disaster Recovery,” Server 2012 can speed up the build-out process tremendously by allowing you to simply clone the seed DCs you’ve created. If you don’t have Server 2012, you can still speed up your forest build-out by quickly creating new member server virtual machines (VMs), but you must promote them as you would any new physical server rather than simply cloning them.
There’s a whole new discussion possible now about how the cloud affects AD backup and recovery. For example, should you host virtual DCs with an Infrastructure as a Service (IaaS) provider? Can Microsoft’s new Windows Azure AD service help you? I’ll save these for a future column.
Important but Not Urgent
Recent weather events on the East Coast should serve as a wake-up call for companies that haven’t taken the time to put together and test a solid AD disaster recovery plan. This activity falls into what the late Stephen Covey of The 7 Habits of Highly Effective People fame calls Quadrant II: Important but Not Urgent. It’s not a coincidence that he says Quadrant II is where your best contributions are made. Place a priority on this important but not urgent task. Otherwise you’ll find yourself in, well, deep water.