DHCP Failover in Windows Server 2012

Systems administrators often have questions about DHCP failover in Windows Server 2012, specifically about how the Auto State Switchover Interval timer and the Maximum Client Lead Time (MCLT) timer work together. At first glance, both timers appear to do the same job—and this idea is reinforced in the Microsoft literature.

According to the Microsoft Official Curriculum for Course 20412C ("Configuring Advanced Windows Server 2012 Services"):

"The administrator configures the MCLT parameter to determine the amount of time a DHCP server should wait when a partner becomes unavailable, before assuming control of the address range."

This sounds straightforward. If you configure the MCLT to 60 minutes, then 1 hour after the server loses contact with its partner, it will assume control of the DHCP scope. However, the Microsoft Official Curriculum for Course 20412C also says this about the Auto State Switchover Interval:

"A communication interrupted state occurs when a server loses contact with its partner. Because the server has no way of knowing what is causing the communication loss, it remains in the communication interrupted state until the administrator manually changes it to a partner down state. The administrator can also enable automatic transition to partner down state by configuring the auto state switchover interval."

In addition, the TechNet DHCP Failover Settings page suggests:

"When operating in PARTNER DOWN state, a server assumes that its failover partner is not operating. The server responds to all DHCP client requests that it receives."

So, if the Auto State Switchover Interval is set and a server loses contact with its partner, after the configured time period it will transition to the Partner Down state. Once in Partner Down state it will respond to all client Discover packets that it sees.

The Auto State Switchover Interval timer and the MCLT timer have been the subject of much debate among delegates in my classroom. At first glance, they seem to be performing a similar, if not the same, function; however, there must be a distinction in what they do—otherwise, why would they both exist?

An Example of DHCP Failover

Let's look at a scenario in which two servers have been configured to provide DHCP failover for a single scope. In our example, which Figure 1 shows, we use two DHCP servers (DHCP1 and DHCP2) that have a single scope configured. The scope is configured in a hot standby relationship with DHCP1 as the active server and DHCP2 as the standby server. The MCLT and the Auto State Switchover Interval have been set to 5 minutes and 30 minutes, respectively.

Figure 1: DHCP Failover Configured for a Single Scope

As you probably know, the default lease duration is 8 days. In our example, if a client leases an address from DHCP1, then DHCP1 will inform DHCP2 about the client and the address that it leased. If DHCP2 takes over running the entire scope, it will know that the address has been leased out and not attempt to lease it to another client.

But what if just as our client leases its address for 8 days, DHCP1 fails and cannot share the new lease with DHCP2? In theory, if DHCP2 takes over the entire scope, it will presume that the address our client is using is still available for lease and that it can assign it to a new client. This is the problem that the MCLT is designed to prevent.

Our MCLT is set to 5 minutes. When DHCP1 is approached by Client 1 to lease it an address, instead of leasing an address for 8 days it leases an address to Client 1 for 5 minutes. DHCP1 then informs DHCP2 that Client 1 has leased an address. DHCP2 can add the details of Client 1 to its DHCP database, but crucially marks the lease as 8 days. After 5 minutes, Client 1 contacts DHCP1 again and receives the same address, but this time with an 8-day lease. Both DHCP1 and DHCP2 now know this address is leased to Client 1 for 8 days.
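The two-step lease described above can be sketched in a few lines of Python. This is only a simplified model of the behavior in our example, not the actual Windows DHCP Server logic; the function name lease_duration and its is_first_lease flag are made up for illustration.

```python
from datetime import timedelta

# Values taken from the example scenario in this article.
MCLT = timedelta(minutes=5)
FULL_LEASE = timedelta(days=8)


def lease_duration(is_first_lease: bool) -> timedelta:
    """Lease length the active server hands out in this simplified model.

    A brand-new client only receives the MCLT; once it renews, and the
    partner has had a chance to learn about it, it receives the full lease.
    """
    return MCLT if is_first_lease else FULL_LEASE


print(lease_duration(True))   # 0:05:00
print(lease_duration(False))  # 8 days, 0:00:00
```

The point of the short first lease is that the client is forced to come back quickly, which bounds how long either server can be unaware of a lease its partner granted.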

MCLT During DHCP1 Failure

Let's take a look at how MCLT works during a failure of DHCP1. DHCP1 is approached by Client 1 to lease it an address. DHCP1 leases an address for 5 minutes (based on the MCLT), but before DHCP1 can share this information with DHCP2, DHCP1 fails. DHCP1 and DHCP2 had been maintaining a persistent TCP connection with each other over port 647, but now that DHCP1 is no longer there, DHCP2 will transition the scope to Communication Interrupted state.

When Client 1 attempts to contact DHCP1 after 5 minutes, it will fail and subsequently send out a general message for any DHCP server to respond. This will be registered by DHCP2, which will recognize that Client 1 has an address that was provided by its failed partner using the MCLT. This will cause DHCP2 to lease Client 1 the same address for the full 8 days and add the details to its database even though it hasn't taken over running the entire scope yet.

Without the MCLT, Client 1 would have been given an address for 8 days. Client 1 wouldn't have tried to renew its lease, and when DHCP2 took over the entire scope, DHCP2 would believe the address was available for lease. This could result in a duplicate IP address being leased out.
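To make the comparison concrete, here is a rough Python sketch of the reasoning above. It assumes, as the example does, that the client contacts a server again when its first lease runs out, and that DHCP2 cannot take over the whole scope any earlier than the 30-minute Auto State Switchover Interval. The function name standby_learns_before_takeover is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def standby_learns_before_takeover(first_lease_minutes: int,
                                   earliest_takeover_minutes: int) -> bool:
    """True if the client must renew (and so reach the standby server)
    before the standby could take over the entire scope."""
    return first_lease_minutes <= earliest_takeover_minutes


# With the MCLT: the first lease is 5 minutes, so Client 1 renews well
# before DHCP2 could take over and DHCP2 records the lease.
print(standby_learns_before_takeover(5, 30))                    # True

# Without the MCLT: the first lease is 8 days, so DHCP2 could take over
# while still believing the address is free.
print(standby_learns_before_takeover(8 * MINUTES_PER_DAY, 30))  # False
```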

Auto State Switchover Interval

Our example DHCP failover relationship has several states that it can be in:

Normal. This is the preferred state, with both members of the relationship servicing clients based on the failover mode they are in.
Communication Interrupted. If a communication issue occurs between two servers configured with DHCP failover, each server will transition to the Communication Interrupted state.
Partner Down. In the Partner Down state, a DHCP server assumes that the failover partner isn't operational and the running partner can take over running the entire scope.

Getting to the Partner Down state is accomplished in one of two ways—either manually by an administrator when he or she realizes that the active server is no longer there, or automatically after the period of time configured by the Auto State Switchover Interval.
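The states and the two routes into Partner Down can be modeled as a small state machine. The following Python sketch is illustrative only: the names FailoverState and next_state and their parameters are assumptions for this example, and recovery back to Normal is deliberately not modeled because it is outside the scope of this discussion.

```python
from enum import Enum
from typing import Optional


class FailoverState(Enum):
    NORMAL = "Normal"
    COMMUNICATION_INTERRUPTED = "Communication Interrupted"
    PARTNER_DOWN = "Partner Down"


def next_state(state: FailoverState,
               partner_reachable: bool,
               minutes_without_contact: int,
               auto_switchover_minutes: Optional[int] = None,
               admin_marked_down: bool = False) -> FailoverState:
    """Return the next state of one server in this simplified model."""
    # Losing contact with the partner moves a server out of Normal.
    if state is FailoverState.NORMAL and not partner_reachable:
        return FailoverState.COMMUNICATION_INTERRUPTED

    if state is FailoverState.COMMUNICATION_INTERRUPTED:
        # Route 1: an administrator manually declares the partner down.
        if admin_marked_down:
            return FailoverState.PARTNER_DOWN
        # Route 2: the Auto State Switchover Interval (if set) has elapsed.
        if (auto_switchover_minutes is not None
                and minutes_without_contact >= auto_switchover_minutes):
            return FailoverState.PARTNER_DOWN

    # Otherwise nothing changes.
    return state


ci = FailoverState.COMMUNICATION_INTERRUPTED
print(next_state(ci, False, 10, auto_switchover_minutes=30).value)
# Communication Interrupted
print(next_state(ci, False, 30, auto_switchover_minutes=30).value)
# Partner Down
```

Note that with no interval configured (auto_switchover_minutes left as None), the model stays in Communication Interrupted indefinitely until an administrator intervenes, which matches the curriculum text quoted earlier.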

The Auto State Switchover Interval's purpose is to automatically transition from the Communication Interrupted state to the Partner Down state. However, that's not the end of the story. If the Auto State Switchover Interval is set to 30 minutes, then after 30 minutes without communication the surviving server will transition to the Partner Down state, but it won't actually take over the entire scope until the MCLT has also run out. In other words, the two timers run one after the other rather than doing the same job. Therefore, using our timers (30 minutes and 5 minutes), it will actually be 35 minutes after going into the Communication Interrupted state before DHCP2 takes over running the entire scope.
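The arithmetic is simple enough to check in Python. This sketch just adds the two timers as described in this article; full_takeover_delay is a hypothetical helper name, not part of any Microsoft tooling.

```python
from datetime import timedelta


def full_takeover_delay(auto_switchover: timedelta,
                        mclt: timedelta) -> timedelta:
    """Time from entering Communication Interrupted until the surviving
    server runs the entire scope: the switchover interval, then the MCLT."""
    return auto_switchover + mclt


# Timers from the example: 30-minute switchover interval, 5-minute MCLT.
delay = full_takeover_delay(timedelta(minutes=30), timedelta(minutes=5))
print(delay)  # 0:35:00
```

Using the 60-minute MCLT mentioned earlier with the same 30-minute interval would give 90 minutes, which shows why the MCLT alone does not tell you when the partner assumes control.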

*************************************************************************************

Mike Brown is a Microsoft Certified Trainer (MCT) and lead Windows Server instructor for Firebrand Training. When not teaching in the classroom or developing courseware, Mike creates a series of technical how-to guides for the Windows Server community.

*************************************************************************************
