Note: These three gotcha blog posts are somewhat optional in the sense that if you're reading along and are just interested in a solution, these posts won't help too much. Instead, they're designed to show WHY trying to dynamically detect whether to execute a job or not (based upon whether the server it's running on hosts the primary replica or not) is problematic.
In my last post, Dynamic Detection Gotcha #1, I outlined why trying to run the code in the screenshot below would fail when running on a non-Primary (and non-ReadOnly) AlwaysOn Availability Replica:
And the reason why this would fail (despite the fact that there is test logic in place to try and detect whether or not the Replica being targeted is the Primary or not), is because the MyAGDatabase on a secondary server is, effectively, in the Restoring state and therefore, can't be accessed at all.
Logically speaking then, it seems like all we'd need for the code above to work correctly would be to make it a bit more dynamic—and maybe do something like start in the master (or some other) database and then use a USE statement to switch into the MyAGDatabase once detection has been determined, as follows:
Logically, the code in the screenshot above seems like it should work; if the server where the code is running is hosting the Primary Replica, then we'll branch into the logic that will switch us into the MyAGDatabase, and run our code—or, if we're not on the Primary server, then the code will just exit gracefully and not do anything. Logically, everything should work—which is why this is Gotcha #2.
Using USE is Harder Than You Might Think
Switching database contexts (i.e., executing a USE dbNameHere; command) is pretty trivial. Sadly, however, the presence of a USE statement is always validated when queries or batches are PARSED—meaning that the presence of a single USE statement in 1, 10, 200, 2000, etc., lines of code will always be evaluated as being a VALID operation BEFORE even a single line of code is actually run. This, coupled with the fact that the MyAGDatabase is inaccessible, means that just adding a USE statement into your detection logic will never work.
This is simple to demonstrate by trying to USE a database if, and only if, it exists by trying the following code or logic:
-- FakeDB does NOT exist: IF EXISTS (SELECT NULL FROM sys.databases WHERE name = 'FakeDB') BEGIN USE FakeDB; -- this will ALWAYS throw an error PRINT 'Using FakeDB'; END ELSE PRINT 'Not Using FakeDB - it doesn''t exist.';
As the comments in the code above indicate, the FakeDB doesn’t exist. But even though we're wrapping our USE statement in some logic that will prevent SQL Server from ever branching into the USE statement shown above, that won't prevent SQL Server from:
a) parsing your batch;
b) detecting that there's a USE FakeDB statement;
c) verifying that FakeDB does not exist—and that a call to use it is therefore illegal; and
d) throwing an error
And, there's simply NO getting around this. None of the following options will let you get around this:
Using IF/ELSE statements. As above, an IF/ELSE will always throw an error when it detects the USE
Using GOTO statements. Trying to trick your way out of this problem won't work either—GOTO statements are a bit . . .cumbersome, but still run afoul of a USE
Using a RETURN to short-circuit. Trying to set up simple IF NOT PRIMARY logic that would execute a RETURN prior to your USE
; statement won't work either. Again, this code is parsed/evaluated BEFORE it is actually executed. (Plus, you can't use a RETURN statement in the command/code window for a SQL Server Agent job step.)
Options and Work-Arounds
Given that a USE statement is parsed before code is actually run, there are a few options that could be used to work around this particular issue when it comes to detecting whether or not code should be run (based on whether or not the target database is the Primary replica or not). Here they are, in no particular order:
- Use fully dynamic SQL. Even mentioning this option makes me cringe. But, the idea is that you could wrap ALL of your code and then execute it using EXECUTE or sp_executesql—after you've determined if the code should be run at all. Technically, this IS possible. But because of the potential for security issues, AND because this could quickly get INSANELY hard to manage as jobs and operations became more and more complex, I won't actually consider this as a truly viable option.
- Using Sprocs for Redirection. This is a bit wonky too—and is, in my mind, just a single step down or only slightly less ugly than using fully dynamic SQL. But, the idea is that you COULD create sprocs in, say, the master database (you'd then have to keep this code synchronized across all servers in your AG topology), and then your SQL Server Agent Job steps/logic could: a) detect whether the host in question is the Primary or Not and then either exit gracefully; or b) keep processing code/operations by trying to run something like EXEC master.dbo.DoJobLogic @Params, @Etc, where; c) the body of that dbo.DoJobLogic sproc would just redirect with something like: "DELETE FROM MyAGDatabase.dbo.SomeTable WHERE blahBlahBlah = @Params" and so on. Again, I wouldn't recommend this, but it is possible. Right out of the gate you'd likely run into issues even trying to INSERT/UPDATE/DELETE against tables in a different database (though you could use SYNONMYMS here—one of the few places I think they actually even come close to making sense), and you’d likely find that maintenance (especially through all of the indirection) would be a pain. Accordingly, I wouldn't recommend this option at all—and would strongly recommend AGAINST it (just as with dynamic SQL).
- Job Step Injection Checks. This one is viable, but comes with its own baggage (as we'll see in the next post). The idea, though, is that if you've got a SQL Server Agent Job that's got just one Job Step (for example), that you'd go in and ADD a new Job Step BEFORE your existing Job step, and have this new/injected Job Step check to see if the target DB (for the second Job Step) is acting as the Primary replica on the current server or not, and then either continuing on to Job Step 2 (if we're on the Primary) or simply short-circuiting and exiting if we're not on the primary. Overall, the process is fairly convoluted, not something I'd want to manage, and comes with some additional ugliness above and beyond the obvious problems you'd think of in a case like this. We'll look into this further in my next blog post.
- Give Up and use a Different Approach. Instead of trying to cram a square peg in a round hole (i.e., trying to get around the problem with the USE statement), we could just go ahead and look at a different approach. Or, in other words, instead of trying to dynamically detect whether or not code should be executed at RUN time, we could go with the other option that I presented a few posts back and simply synchronize jobs across all servers and then actively enable or disable jobs based upon whether or not they're being hosted on the server with the Primary replica(s), or not. That might sound like a bit of work—and it is—but it's nowhere near as ugly or problematic as any of the "dynamically detect whether to execute at run time" options available.
Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1