Reasons to Avoid SQL Server In-Place Upgrades

Problems and Solutions road signs As a consultant, one of the things I commonly do is help organizations upgrade to newer editions of SQL Server. When doing so, I never recommend in-place upgrades for production servers. Instead, I always recommend standing up brand new servers to temporarily run side-by-side until configuration and security settings have been validated and until it's clear that the new hardware can handle expected workloads. Then, after acceptance testing (i.e., verification from a couple of key end users that everything is working correctly), an explicit transition can be scheduled where the new servers take the place of the old servers.

Related: A Day in the Life of a DBA

In my experience, in-place upgrades are almost never worth it (except, potentially, with the exception of some dev workloads). Here are some reasons why:

Gremlins

Call me nuts, paranoid, or just plain crazy, but I simply don't trust Windows or SQL Server installations. Yes, most of the time, they go completely as expected. But upgrades commonly bump into ugly problems. And if we're talking about production workloads, running into one of the random problems that can crop up when trying to run an upgrade on a production server can be insanely expensive—not just in terms of additional downtime, but also potentially in terms of your credibility.

Related: The Upgrade Dilemmas

Furthermore, even if you want to argue that an in-place upgrade where you reformat the server back to bare metal, spin up a new OS, and then install SQL Server (the new version) over the top of your old hardware (after grabbing db backups, etc.) can work, I'll still argue against such an approach. First of all, I've seen too many strange or weird cases where the installation of either Windows Server or SQL Server just doesn't take correctly. Yes, those instances are increasingly rare. But, when we're talking about a production workload, I almost never find the potential risk to be acceptable. Furthermore, even if you delude yourself into thinking that you can practice or rehearse the upgrade process, you're doing that in a controlled environment and not against the actual hardware or metal where a driver problem or issue might crop up.

Instead, I find that it's much easier and effective to stand up a new replacement host, configure it without any stress or duress (i.e., without worrying about a ticking clock counting down to when you need to be back online or running), and double-check all best practices and security configurations, and the like. Then, you can just copy security information (such as logins, etc.) on to the server and load on copies of your databases (i.e., via backups) to then start pointing application servers at this old/stale data to begin acceptance testing. Then, once you're sure everything is working correctly, you can just schedule a time to take the current production server down and switch over to the new server. And, if something goes wrong during that transition (as long as you haven't started writing data to the new location), then you can always abort your migration and switch back to the old server and try again another time. Whereas, if you try an in-place upgrade and something goes wrong, then, well, you're screwed and are stuck scrambling to get something back up and running.

Artifacts

If we assume that gremlins represent show-stopper or fatal problems that completely thwart upgrades, then what about something even worse? Like, something that fails to upgrade or install correctly as part of your upgrade—but which you don't detect right away. Then, imagine the cost (in terms of pain/manageability) of having to workaround that issue more or less indefinitely. While such artifacts are rare, the costs they impose ensure that they're something you want to avoid like the plague. And while you CAN miss these kinds of issues or problems when you stand-up a brand new server to replace your old server, my experience is that you have much more time and ability to test and interact with this server (and hopefully, therefore, find such problems or issues) as opposed to being in a serious hurry to get core functionality up when doing an in-place upgrade.

Missing out on Hardware Improvements

If you're upgrading to a new version of SQL Server, it's likely been about 2-3 years since your previous host was provisioned. Hardware capabilities have, effectively, doubled in that time. And the size and demands of your workloads have likely increased over that same time—so attempting to upgrade on the same/old hardware can and will work (each new version of SQL Server usually picks up around a 15-20 percent performance boost due to increased efficiencies in the underlying engine)—but just doesn't make as much sense as upgrading to new hardware.

Of course, while that argument makes instant sense to tech heads, it doesn't always make great sense to bean-counters (i.e., if your CIO or manager comes from a background more related to bean counting. or, if funds are sparse, then hardware benefits and improvements alone might not be enough to justify). Still, I'll always argue that disruptions caused by anything listed above also come with costs (which can sometimes be unacceptably costly in terms of negative user interactions or linger maintenance costs), and that new hardware is almost always worth it. Besides, when you take this approach, it's not like you throw away your old production hosts when you're done. The old production hosts can easily be repurposed for other tasks (for other workloads) or can be reused as staging or dev servers, if they're still in decent-enough working order.

Testing

Unless you're just assuming that you can do an in-place upgrade and everything will work—which is just plain dumb—then you're going to need to spin up some sort of additional host or server to act as a testing server that you can use to see how your applications behave when they've been upgraded.

In other words, if you're going to go through this effort anyhow (and you really would be stupid not to), then WHY test this on anything BUT the new hardware where you're going to actually be hosting your databases going forward? In other words, with an in-place upgrade, you're going to have to go 'fake' an upgrade/testing environment, and then, you'll just assume that everything you tested and validated there will work on your current/production hardware after you magically manage to upgrade that hardware without any issues. Whereas, if you perform a side-by-side upgrade, not only are you removing the guess work, but you can also actually put your new hardware under load (in many cases) to fully verify how it will behave.

Downtime

Another great benefit of side-by-side upgrades is that you can decrease downtime. In fact, with enough preparation and staging (along with rehearsals when/where needed), you can actually upgrade in 5 minutes or less. With in-place upgrades, you're forced to wait while a Wizard churns along and upgrades various components, databases, binaries, and the like. But with a side-by-side migration, you can verify that the new server is configured and working as needed, then load copies of recent FULL backups on to the box, restore (but not recover them), and then just keep applying T-Logs through the day (or use DIFFs + T-Log backups if that makes more sense). Then, you've got a warm spare configuration so that when your maintenance window arrives, you can just shut down the current production host, take tail-of-the log backups, and then apply them to your new server and then spin up your applications (with connection strings pointing at the new server) after recovering the database. All in all, very easy to handle, very easy to practice and rehearse, and very easy to pull off with minimal down-time.

Comments

Plain text