Question: We’ve set up database mirroring for one of our critical databases with a witness server to allow automatic failover. Everything has been running fine until we had a power outage on the data center recently. Database mirroring performed the failover but people reported that the application just hung. When we manually failed back, the application worked again. Why didn’t the application failover too?
Answer: This is a reasonably common problem that people encounter when using database mirroring and a production failure like this often happens because no failover testing is conducted after database mirroring is implemented. We became involved after the failed failover.
To avoid taking production downtime, we duplicated the mirroring environment on the client’s test systems. After confirming that the application and database mirroring were working, we powered down the principal server and the application completely hung.
We checked that the mirror server had successfully initiated a failover and was online as the principal server. We also checked that the mirror database was online, could be used locally on the new principal server, and the principal server was accessible from a remote client, like the application uses.
This led us to check the application. We talked to the developer, who confirmed the application was using ADO.NET to connect to SQL Server and was using explicit client redirection, specifying the mirror server name in the ConnectionString property of the SqlConnection. (As an aside, it is always better to use explicit client redirection rather than relying on the mirror server name being cached automatically on the client once a connection has been made – implicit client redirection).
So why wasn’t the application failing over?
We dug deeper into how the application was handling connection failures and discovered that it was not coded to cope with the existing connection failing at all! Basically the application would open a connection to SQL Server when the application initialized and would never attempt to reconnect.
It turns out that although the DBAs had implemented database mirroring, no-one had discussed high availability with the application developer and so the application code was not changed. After re-working the connection layer in the application to cope with connection failures and implement reconnection logic, it worked perfectly after a database mirroring failover.
The moral of the story is that anything connecting to SQL Server has to be architected to cope with connection failures that occur and reconnect so that failover works correctly. In this case, our client would have discovered the problem during testing if they had tried a database mirroring failover before going into production.