AlwaysOn Availability Groups and SQL Server Jobs, Part 14: Establishing Synchronization Checks

When compared to AlwaysOn Failover Cluster Instances (FCIs) or clustered deployments of SQL Server, one of the main problems with AlwaysOn Availability Groups is that a single SQL Server database (or group of databases) is made to straddle multiple hosts—meaning that any server-level dependencies found in that database need to be mirrored across all servers that will host the database. Happily, the notion of Contained Databases (while still, arguably, not nearly as mature as most people would like them to be) helps address some of these issues and concerns—like Logins and users—but, sadly, falls short with other concerns such as what to do when we need to target databases from SQL Server Agent Jobs for one reason or another.

Consequently, once you throw AlwaysOn Availability Groups into the mix, you’ll need to make sure that any jobs you’ve defined (which target your Availability Group databases) will need to by synchronized across all servers so that details about execution time, job ownership, alerting, and, of course, the actual operations to be performed, are all perfectly synchronized across all servers in the topology.

Which, in turn, poses two questions:

What Mechanism do you use to keep these job details synchronized?
How can you detect or tell when jobs are NOT synchronized?

Options for Managing Job Synchronization

Technically speaking, there are a couple of different options for keeping SQL Server Agent Jobs synchronized across servers. One of the more immediate options that might come to mind would be to use SQL Server Integration Services (SSIS) and the Transfer Jobs Tasks to keep jobs synchronized.

Only, the SSIS Transfer Jobs Task was designed to transfer jobs—not keep them synchronized. Arguably, it even provides some challenges even when it comes to merely transferring (or moving) jobs from one instance to another—given that there are limitations as to how the Enabled/Disabled statuses of transferred jobs should be handled (options for managing these states are pretty lame, as we can only define whether All jobs transferred should be enabled/disabled on the target/destination server).

Accordingly, a better technique for actually synchronizing SQL Server Agent Jobs between servers is, strangely enough, to script them at one server and run them at other servers. Technically speaking, once you’ve synchronized all jobs, it shouldn’t be too frequent that large numbers of them will fall out of synchronization with each other. Granted, you will occasionally modify some jobs—and that’ll entail a bit of effort. But it’s nothing too terrible:

Make any needed changes to your job on a single server.
Once changes are made, script the new job definition.
You’ll then have to connect to all other servers and DELETE the old definition of your job.
And then run the scripted version of your job to ‘replace’ the old definition.

Sadly, you won’t be able to use the Script Job as > DROP and CREATE . . . because that logic drops Jobs by their Job ID (and that won’t usually be the same across all of your servers). Likewise, if you want to retain job history, then dropping and scripting over-the-top of your old jobs isn’t a good idea—and you’ll need to manually make all of the changes to ALL of your jobs.

Otherwise, I find that scripting is hard. In fact, two tricks that help with scripting quite a bit are as follows:

Script Multiple SQL Server Agent Jobs at Once. If you need to script multiple jobs at the same time, you can easily do so in SQL Server Management Studio (SSMS) by using the Object Explorer Details window. Simply select the SQL Server Agent > Jobs node on the server you wish to script from, and then press F7 to open the Object Explorer Details window. From there, you can SHIFT+SELECT and/or CTRL+SELECT multiple SQL Server Agent Jobs, then right click and script all of them as needed—and as shown in the screenshot below:
Change Connections to Avoid Having to Copy/Paste. Of course, if you generate a script, you still then need to copy/paste/execute it against all of the servers where you’ll need to run it. That’s not the hardest thing in the universe, but you’ll have to establish new connections to Servers (if you haven’t already done so), and you’ll have to spin up a New Query to paste and execute the code. A simpler way is to simply right-click on the script you’ve just created and then select the Connection > Change Connection option to change the connection of the script you’ve just generated—to target one of the servers where you need to run this code. Then, run the code against that targeted server, and rinse and repeat, as needed until you’re done.

Options for Determining Job Non-Convergence

Otherwise, once you’ve tackled the means of synchronizing jobs when you find them out of sync with each other, then the only question is how you go about detecting such a problem. And, sadly, that means you’ll need to end up letting the servers somehow talk to each other. Or, you’ll need to set up some sort of third party process or application that can connect to all the servers hosting your Availability Group and query Jobs details to see if there are any problems.

Personally, I think that Linked Servers are the way to go in this case—as you’re already ensuring that the servers hosting you Availability Group(s) are talking to each other anyhow. As such, in my next post, we’ll look at some of the security concerns we’ll need to address to let code running on, say, the primary server reach out to other servers in your topology and query details about Jobs. Then, in subsequent posts we’ll look at the actual synchronization checkup logic I’ve used as well.

Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3
Part 11: A Fork in the Road
Part 12: Issues with Enabling and Disabling Batch Jobs
Part 13: Creating and Populating a Batch Job State Table

Comments

Plain text