When you reboot the servers in a Skype for Business Enterprise pool, you need to know how many of them you can safely reboot at one time. The answer is, “It depends…” Seriously, how many front-end servers a pool should have, and the pros and cons of each number, is another discussion we can have.
The short answer is to reboot only a single server at a time, because you don’t know which servers host which routing groups. There is plenty of discussion around not rebooting too many servers at the same time because of pool quorum, but what about routing group quorum? We are familiar with the Get-CsPoolUpgradeReadinessState cmdlet, which tells us which FE servers we can reboot at a particular time. Typically we reboot a single server, wait for that server to come back online (confirmed by the RTCSRV service running), and only then move on to the next server to reboot or patch.
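The one-at-a-time routine described above could be sketched roughly as follows. This is only an illustration: it assumes the Skype for Business Server Management Shell, uses the example server name FE1.contoso.com, and the 30-second polling interval is an arbitrary choice.

```powershell
# Assumption: run from the Skype for Business Server Management Shell
# on a front-end server in the pool. The server name is an example.
$server = "FE1.contoso.com"

# Check the pool before taking a server down; State should report Ready.
$readiness = Get-CsPoolUpgradeReadinessState
$readiness.State
$readiness | Select-Object -ExpandProperty UpgradeDomains

# ... reboot or patch the server here ...

# Poll until the front-end service (RTCSRV) reports Running again.
# Errors are suppressed because the server is unreachable while it reboots.
do {
    Start-Sleep -Seconds 30   # arbitrary polling interval
    $service = Get-CsWindowsService -ComputerName $server -Name "RTCSRV" -ErrorAction SilentlyContinue
} while ($service.Status -ne "Running")
```

Only once the loop exits would you move on to the next server.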
So what happens when you have 10 FE servers in a pool? That is a considerable amount of time to wait before you can move on to the next front-end server. Let’s say the boxes are physical and that, from the time you reboot a server until its services are back online, we are looking at about 15 minutes per box. If you cannot move on to the next box before the previous one is up, that is 75 minutes for five boxes, or 150 minutes (2.5 hours) for all ten, just rebooting boxes.
So there is another approach, and that is to run the Invoke-CsComputerFailOver cmdlet:
Invoke-CsComputerFailOver -ComputerName "FE1.contoso.com" -WaitTime 0:30:00
This cmdlet gracefully fails over the services from one node to another. More important are the routing groups: the cmdlet fails over the routing groups hosted on that node to other nodes. The routing groups are now taken care of and do not go offline, as they would if you simply rebooted the box.
The Invoke-CsComputerFailOver cmdlet migrates the users and data, and drains the existing conferences and sessions, before the Skype for Business services are stopped and disabled to prevent an accidental restart when the computer is rebooted.
When you run this cmdlet, it can seem to run forever before it completes. That is probably why there is a -WaitTime parameter, which sets how long you are willing to wait before the cmdlet terminates.
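Put together, a single maintenance cycle for one server might look like the sketch below. Invoke-CsComputerFailBack is not covered in this post; it is the companion cmdlet that re-enables and restarts the services after maintenance, and it is included here only to show where it would fit. The server name and the 30-minute wait are the example values used above.

```powershell
# Assumption: Skype for Business Server Management Shell; example server name.
$server = "FE1.contoso.com"

# Drain the server and move its routing groups elsewhere,
# waiting up to 30 minutes for the failover to finish.
Invoke-CsComputerFailOver -ComputerName $server -WaitTime 0:30:00

# ... patch and reboot the server here ...

# Re-enable and start the services that the failover stopped and disabled.
Invoke-CsComputerFailBack -ComputerName $server
```

How long a -WaitTime is reasonable depends on how many conferences and sessions the server has to drain, so treat 0:30:00 as a starting point rather than a recommendation.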
So now you could run Invoke-CsComputerFailOver on 2 of the 10 FE servers and patch two servers at the same time, instead of a single server at a time, without affecting pool quorum or routing group quorum.
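To put rough numbers on the saving, here is a back-of-the-envelope calculation using the 15 minutes per server assumed earlier. It ignores the time the failover itself takes to drain each server, so the real figures will be somewhat higher.

```powershell
# Rough estimate only; 15 minutes per server is the assumption from above.
$servers          = 10
$minutesPerServer = 15
$batchSize        = 2

# One server at a time: 10 * 15 = 150 minutes.
$oneAtATime = $servers * $minutesPerServer

# Two servers at a time: 5 rounds * 15 = 75 minutes.
$rounds     = [math]::Ceiling([double]$servers / $batchSize)
$twoAtATime = $rounds * $minutesPerServer

"One at a time: $oneAtATime minutes"
"Two at a time: $twoAtATime minutes"
```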
How Long to Wait?
There isn’t a reason to wait on a single server at a time when doing reboots in the environment. Keep in mind, though, that this process has to be closely watched and managed, because you are shifting routing groups on multiple servers at a time. If I had a small number of front-end servers in a pool, such as 5 or fewer, I would probably just reboot a single server at a time, because with so few servers each one makes up a larger share of the quorum. But if I had more than 8 front-end servers in a single pool, this approach can definitely save you some time.