AlwaysOn Availability Groups and SQL Server Jobs, Part 11: A Fork in the Road

In previous posts (especially Dynamic Detection Gotcha 3 and High-Level Options and Rationale for Handling Batch Jobs), I’ve asserted that while it’s not very hard to determine whether a SQL Server Agent job targeting an AlwaysOn Availability Group database should run or not, it can be a bit harder to get that job to run or not given some of the constraints of how SQL Server and the SQL Server Agent Work.

That’s still true, there are some definite gotchas, which I’ve outlined in the three posts preceding this one:

Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3

Cue the Sad Trombone Sound

However, in a situation that still has me scratching my head (and doubting my sanity), I somehow managed to miss one possibility that would have made many of these complications easier to work around. More succinctly, the following code or approach simply won’t work:

And I’ve outlined why that’s the case in AlwaysOn Availability Groups and SQL Server Jobs, Part 9: Dynamic Detection Gotcha #2. What I somehow missed, though, is that the following code will 100 percent work, and without any problems:

Cue Egg on the Face

And that’s a huge issue because:

I swear I thought that the code above (in the second screen shot) SHOULD have worked and I can distinctly remember (and even have some rough notes to help corroborate this confusion in OneNote) trying this 2x in different environments and being surprised when it didn’t work. Obviously, I was doing something insanely stupid and totally managed to somehow miss the boat in diagnosing what was going on because
I know the code in the second screenshot above should always work. And that’s because execution of stored procedures is deferred until execution time of the code in question (unlike the inclusion a USE statement, which is evaluated when your code is parsed, as outlined in AlwaysOn Availability Groups and SQL Server Jobs, Part 9: Dynamic Detection Gotcha #2).

Sadly, somehow botching the above means that:

Missing this approach as an option/work-around complicates everything, as the option to simply use an if/else around the call into a sproc (in the target database) is quite a bit easier than what I was doing; and
Missing this makes me look like an idiot. (And, to be fair, I don’t claim to be the smartest guy around, but a screw up like this one inside a long/ongoing series of blog posts just isn’t my idea of a good time).

Nonetheless, I’m very thankful that reader Dale Hirt called this out in a comment on post #9. (In a sort of positive and upbeat way of looking at this whole affair, I guess I can say that the only thing worse than being an idiot in a long series of blog posts would be to end up being an idiot with a long series of blog posts that NO ONE read. So, while I really wish I wouldn’t have flubbed this one, I’m still happier to have been called out on this because it at least means people are reading what I’m writing.)

Still, though, this new development complicates things a bit. Long term, it leaves us with two primary options going forward: Dynamic Detection at Job Execution time OR the option of dynamically enabling/disabling jobs. What follows is a quick overview of the pros and cons of each approach.

Dynamic Detection at Run Time (The Easier Approach)

Dynamically detecting whether to execute at run times means waiting until a SQL Server Agent’s Job Schedule is triggered and the job is run to figure out whether or not code should be run (as opposed to pre-emptively enabling or disabling jobs, as outlined below in the next section).

Overview/Requirements

At a very high level, here’s what needs to be done to use the approach of dynamically detecting (at run time) whether or not a job or its Job Steps should be run:

Make sure jobs run in master (or another, non-AG enabled) database. (See Dynamic Detection Gotcha #1 for more details.)
Pre-check before execution using if/else logic. (See Detecting Primary Replica Ownership for sample logic or details.)
Encapsulate Batch Job logic in a sproc that lives in the target DB.
Make sure the job ends up being synchronized across all servers participating in the AlwaysOn Availability Group Topology.

Pros

Overall, this approach comes with, primarily, nothing but benefits.

Simplicity. Using If/Else logic to "wrap" whether or not logic should be executed or not is pretty easy to pull off. Furthermore, when you relegate execution into the target database using a stored procedure, then you don’t run into any of the negatives outlined my ‘Gotcha #3’ post (see post #10 in this series for more details).
Sprocs as Logic Containers. I’m a big fan of wrapping Batch Job logic up in a sproc and just running an Exec statement in my SQL Server Agent Job Steps, instead of trying to put all of the logic and code inside the Command [text?] window in a Job Step. I’ve outlined more about that in Stored Procedures: Still Relevant. Stated differently: the win with If/Else + sproc execution is that it’s pretty straightforward.

Cons

There are, however, one or two VERY minor cons:

Synchronization. Not really a con, per se, as you have to keep job details synchronized across all servers no matter WHAT approach you take. But, I thought I’d list this here just for the sake of thoroughness.
SSIS Packages. This is the primary negative of this approach (i.e., using Dynamic Detection within job steps themselves) and is due to the fact that it’s not too easy to inject If/Else logic into SSIS Package Execution. In fact, there IS no way to really inject the option to do an if/else check within the ‘execution instructions’ of a SQL Server Agent Job that’s set up to run or execute a Job Step of type SQL Server Integration Services Package, as per the screenshot below:
Maintenance Plans. Maintenance Plans are a whole topic in and of themselves, so I’ll touch upon them in a future post. In addition to being crap-tastic in a number of ways though, the big problem they suffer from (in terms of this discussion), is that they’re ultimately just SSIS packages. However, there is a pretty easy way to interject IF/ELSE logic into them (that’s typically not as easy with normal SSIS packages).

Dynamically Enabling or Disabling Jobs (The Harder Approach)

Opposite of trying to determine if logic should be run each time a job is executed, is the notion of simply disabling or enabling entire jobs wholesale depending upon whether or not the server they’re living on is the host of the Primary Replica being targeted.

Overview / Requirements

To use this approach, you need to address the following concerns:

You’ll need logic to periodically check on and toggle enabled/disabled status on servers. Underlying logic for this is based upon the notion of using UDFs to determine whether or not the host in question owns the primary replica being targeted, but from there things devolve into the need to enable/disable Jobs.
In order to better determine if jobs should/should not be enabled, there needs to be a way to associate them with a target database (or a target Availability Group). See Defining Batch Jobs, for more information on differentiating between "server level" and "batch jobs," as you can’t simply assume that all jobs on a server that’s not hosting the primary instance should be disabled.
Furthermore, logic responsible for enabling/disabling jobs needs to account for the possibility that just because an Availability Group database may failover to a new server (where, say, 4 Batch Jobs target one of the databases in question), not all jobs should, necessarily, be automatically enabled JUST because the database they’re targeting is now, all of a sudden, the active Primary. Or, in other words: just because a job targets a given database doesn’t mean it should always be enabled.
As with the other option outlined above, synchronization of Job details across all servers participating in an AlwaysOn Availability Group is essential.

Pros

In addition to the pros outlined in Dynamic Detection Gotcha #3 (i.e., avoiding the Gotchas that I outlined there, though many of them can be avoided using the approach of dynamic detection at run time as well), this approach provides a few benefits.

SSIS and Maintenance Plans. Opposite of the other approach outlined in this post, disabling and enabling entire jobs can be an easier way of dealing with SSIS Jobs, especially when you have large numbers of them. Similar benefits would apply to Maintenance Plan jobs (though I don’t see why you should EVER need more than just a few of these on ANY given server since they really should only be trusted for backups if you don’t have a better third party solution on hand).
Easier to Manage for Complex Jobs. Complex jobs with lots of job steps (something I wouldn’t typically recommend), require the injection of if/else logic against all code and all job-steps. In some cases, if these jobs are frequently modified and updated, failure to wrap a block of code in an if/else block might be a problem. Consequently, simply enabling the entire job or not based upon whether it’s on the host with the primary replica or not might be an easier approach.

Cons

Sadly, there are a number of significant cons to using this approach, all of them stemming from the additional complexity this approach entails.

Complexity. Building jobs that enable/disable jobs and a mechanism to track the enabled/disabled state of various jobs isn’t exactly rocket surgery, but it’s not as easy as simply wrapping some if/else logic around the code you want to execute within simple jobs.

Going Forward

Amazingly enough, when I sat down and outlined the roughly 30 blog posts, I intended to write as part of this series, I had (as I alluded to above), more or less convinced myself that enabling/disabling jobs at run time was the easier option—and have/had implemented this approach in a couple of different deployments all with success. Consequently, future posts would have been focused on taking what is now the harder approach. Figuring that that does, and will have merit for other folks reading these posts, I’m primarily going to keep going with that approach (i.e., outlining in detail how to tackle the prospect of enabling/disabling entire jobs).

However, I will also take some time to post or outline the easier approach as well. Consequently, my expectation is that most folks reading this series will LIKELY end up using the easier approach in many deployments (I think it has greater applicability and should be easier to implement and manage), but will be able to fall back to the more complex or harder approach in situations where SSIS tasks or complex jobs would otherwise PREVENT the easier approach from being useful.

Meanwhile, I’ll just be over here kicking myself for having missed something so blatantly obvious for such a long time. Sigh.

Comments

Plain text