AlwaysOn Availability Groups and SQL Server Jobs, Part 26: Using Master Servers for Job Administration

When I initially decided to write about AlwaysOn Availability Groups and SQL Server Jobs, I knew I’d have a bunch of different posts on a variety of different topics. To that end, I initially planned on this series of posts being comprised of 3 main parts:

Introduction. Posts 1 through 6 – where I introduced the ‘problem domain’, defined terms, and provided some high level rationale for why synchronizing jobs and worrying about failover is even a concern.
SQL Server Agent Jobs. Posts 7 through 18 – where I covered some of the problems and limitations – as well as options for work-arounds in dealing with ‘batch’ jobs that can potentially span multiple servers.
Backups. Posts 19 through 25 – where I touched upon some high-level considerations for making sure that backups work as expected and don’t cause any surprises.

I was then planning on posting a wrap-up or conclusion post (or possibly two) and then calling it quits. But along the way, I’ve decided to wrap up this series with a fourth section:

Additional Considerations. Posts 26 through 30(ish) – where I’ll provide some very high-level notes and details on some more ‘advanced’ concerns, strategies, and potential techniques for addressing SQL Server Jobs when AlwaysOn Availability Groups are involved.

And, to kick this section off, I wanted to share a few words or notes about Master Servers.

Multi-Server Administration, AGs, and SQL Server Agent Jobs

A key component of this series of posts has focused on the need to try and synchronize SQL Server Agent jobs across a number of AlwaysOn Availability Group hosts – so that whether we’re talking about backups or batch jobs, you’re sure that your jobs are firing WHEN needed and on or against the preferred/correct host as needed. To address that need, I’ve blogged about a number of options or techniques for ‘synchronizing’ job settings, execution details, and the likes across a number of servers – all by tackling things with scripts, or ‘manually’ as it were.

Interestingly enough, SQL Server actually has a FANTASTIC mechanism in place to handle these ‘synchronization’ needs right out of the box – no custom coding needed. And the way that this is handled is by setting Multi-Server Administration – or the idea that you can take a Single SQL Server Agent and make it a Master Server (MSX) that can ‘drive’ or control the execution of jobs and other SQL Server Agent details out on managed servers called Target Servers (TSX). The concept itself is fairly simple, and there are a couple of great articles that outline how it can be done including the following:

If you’ve been following along with the posts in this series, and if you’ve never heard of Multi Server Administration (of have heard about it, but like most people, forgot about it), you probably just had a light-bulb show up over your head as visions of using MSX/TSX to manage SQL Server Agent Jobs just flashed through your mind.

If so, the good news is that they CAN be used for this exact purpose. But, as you might suspect, there are a few caveats.

Using Master Servers to Manage Jobs when AGs are Involved

On the surface the technique of using a Master Server to manage SQL Server Agent Jobs on your Availability Group hosts is pretty simple. All you need to do is:

Set up an MSX/TSX configuration – where you’ve got your AG hosts set up as Target Servers where you can ensure each of the SQL Server Agent Jobs you want/need to run will end up running across ALL Target servers at the same time – and with the exact same logic/code defined.
Tweak or ‘modify’ your ‘multi-server’ Jobs so that they include logic to detect preferred replicas for backups (when dealing with backups) – as per post #22 – or to include logic to detect the primary replica when dealing with batch jobs – as per post #7.
Sit back, relax, and let things work themselves out – without all of the hassle, overhead, and problems associated with synchronization that I’ve covered in previous posts.

Of course, that’s the high-level overview of how things work. And I certainly don’t want to dismiss MSX\TSX by any stretch of the imagination – because it IS one of the hands-down best options you have out there for managing SQL Server Agent jobs against your AlwaysOn Availability Group hosts/databases.

But there are a few things you WILL want to be aware of if you decide to go this route. These should be fairly obvious concerns or caveats, but I want to call them out anyhow:

Don’t host your Master Server (MSX) on one of your Availability Group hosts – because you’d be compromising fault-tolerance for your jobs/backups. Or, in other words, since the whole purpose of Availability Groups is to account for the fact that an individual SQL Server instance (i.e., an AG host) can and WILL go down, putting the equivalent of the ‘Master Controller’ SQL Server Agent that’s coordinating execution of all of your jobs on one of your AG hosts is a guarantee that you’ll eventually lose your MSX at some point – and jobs/alerting/etc. will suffer as well.
If execution of your jobs and backups is important (and it most likely is), then you’re going to want to CLUSTER your MSX or Master Server. Or, in other words, if you stand up a new ‘controller’ (i.e., Master Server) to coordinate the execution of your jobs across multiple, fault-tolerant, SQL Server instances that are hosting your Availability Group databases, you’ve now introduced a new single-point-of-failure for a potentially mission-critical set of operations: backups and jobs execution. As such, you’re going to need to make your MSX fault-tolerant – something you can fairly easily do by having the SQL Server Agent acting as your MSX end up being hosted as part of an AlwaysOn Failover Cluster Instance – so that it’s adequately fault-tolerant as well.
Synchronization. MSX/TSX configurations may be pretty awesome once you’ve got them working correctly, but they can sometimes get out of sync.
SSIS Packages. While MSX/TSX functionality is actually set up to tackle SSIS packages (in the form of SQL Server Maintenance Plans) out of the box, I suspect you’ll potentially run into issues with those ‘over time’ – as well as any other SSIS jobs/packages you may have as well. Sadly, I can’t quantify any of this or point to any links on it – so I might be wrong here – but I have vague recollections of running into problems with these in the past (years ago) – so I figured I’d list SSIS packages as a POTENTIAL concern.

Otherwise, if you’ve got the servers (hardware and SQL Server Licenses) needed to make Multi-Server Administration work – it is going to be one of your best options for getting tackling management and execution of SQL Server Agent Jobs when AlwaysOn Availability Groups are thrown into the mix.

Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3
Part 11: A Fork in the Road
Part 12: Issues with Enabling and Disabling Batch Jobs
Part 13: Creating and Populating a Batch Job State Table
Part 14: Establishing Synchronization Checks
Part 15: Setting Up Linked Servers
Part 16: Job Synchronization Checkup Logic
Part 17: Creating Jobs to Check on Synchronization
Part 18: Health Checks for Availability Groups
Part 19: Availability Group Database Backups
Part 20: Avoiding Backup Fragmentation
Part 21: Assigning Backup Preferences
Part 22: Executing Backups
Part 23: Maintenance Plan Backups and SSIS Packages
Part 24: Synchronizing SSIS Packages
Part 25: Final Remarks on Backups

Comments

Plain text