In this series of posts I’ve called out some of the concerns related to SQL Server AlwaysOn Availability Groups and their interaction with SQL Server Agent jobs – both in the form of Batch Jobs (see post #3) and backups. Many (or even most) of the posts in this series so far have been geared at outlining the pitfalls you’ll need to address when managing Jobs once Availability Groups are thrown into the mix – along with the rationale behind some of the choices I’ve made while working toward solutions to those problems.
Along the way, though, I’ve primarily been focusing on ‘simple’ Availability Groups – or solutions involving just two replicas (a Primary and a non-readable Secondary). Which, in turn, is why the code I outlined in previous posts – whether for detecting job differences (i.e., synchronization) across hosting nodes or for determining which server jobs should run on – has centered around the notion of a PARTNER linked server (see posts #14 and #15). The problem, of course, is that while many Availability Groups will be built for ‘simple’ fault-tolerance or redundancy, I’m also certain there are plenty of AGs ‘out there’ that will be much more complex – or, shall we say, more advanced. As such, I wanted to touch upon some concerns you’ll need to address when running AGs and SQL Server Agent Jobs in topologies with more than two replicas.
Jobs with Scale-Out and Location-Spanning Availability Groups
For situations where you’ll have more than 2 nodes – for whatever reason – some of the concerns or topics I’ve addressed already aren’t ‘good enough’ on their own. In no particular order, here are some additional things you’ll want to consider.
- Backups and multiple Secondaries. This one’s the easiest – because it’s already, effectively, covered. Just set your preferences for which replica you’d prefer to execute backups on, and you’re done. (See posts 20 – 22.)
- Disaster Recovery Scenarios. If you’ve created remote-availability solutions or ‘stretch’ deployments where, say, you’ve got 2 FCIs in one data center and have then ‘linked’ that cluster to a cluster in a secondary data center (via AlwaysOn ‘replication’), you’ve done so to create as optimal a fault-tolerant solution as possible. In most cases you’ll have tried to mirror settings and configuration details in one data center as closely as possible with those in the other. Yet, sadly, it’s still all too common for management to try and ‘dumb down’ one of those data centers (and its hardware capabilities) as a cost-savings device. (This rarely works out as expected – so feel free to rail against it whenever you see it being attempted, unless there are VERY clear-cut SLAs that can defend you against the backlash that will inevitably follow.) If you’re in one of these unenviable positions, just realize that your workloads should NOT be the same from one location to the next. Some jobs may need to be turned off in the secondary location, or may not have access to some of the ‘infrastructure’ and endpoints that were accessible in the primary data center – so you’ll have to plan accordingly, and take all of this into consideration when deciding how to keep jobs synchronized and/or available to be activated/deactivated as needed.
- Licensing. Hopefully this one goes without saying, but I’m always surprised at how commonly organizations miss it – so, here goes: Just remember that if you’re running Software Assurance, you’re effectively getting a ‘buy one, get one (of equal or lesser value) free’ kind of deal – where Microsoft sells you a SQL Server Enterprise Edition license for a ‘primary’ server and lets you spin up a SECONDARY server (on equal or lesser hardware) to be used SOLELY for failover purposes. (Sneaking reports off of that server, or letting developers spin up test databases on it, renders this ‘failover license’ moot and requires a production license instead.) As such, whenever organizations start talking about scale-out scenarios (i.e., setting up read-only Secondary replicas), it’s important to call out that while those options ARE exciting and awesome, they come at the cost of an additional license. (That license, in turn, comes with its own ‘buy one, get one free’ failover benefit – when purchased with Software Assurance. As of SQL Server 2014 and above, SA is required for this benefit; before that, it wasn’t.)
- Synchronization and Job Activation. This is the real point I wanted to get at in this post. The sample code and example solutions in posts up to this point have focused primarily on ‘2 node’ Availability Groups – and have relied heavily upon an abstraction where two servers each refer to the other as ‘PARTNER’ (via linked servers), making it easy for them to check on each other and compare details about active jobs and job definitions (e.g., is a Job on ServerA defined the exact same way as on ServerB?). The problem, of course, is that if/when you stand up ServerC and/or ServerD, this construct/trick no longer works. Instead, you’ll have to look into something else to keep job definitions synchronized across your servers. Handling failover – or detecting which server is ACTIVE and should handle the execution of your jobs – happily remains the same, as the code outlined in post #7 still works as expected. That said, with read-only Secondaries (where you’ve paid for extra licenses to offload processing from the primary), you MAY want options that let you target some (read-only, of course) SQL Server Agent jobs (for reporting or whatever) against one of those Secondaries. If that’s the case, you’ll need some other way to detect a ‘preferred’ server after a failover shuffles everything around. One obvious option is to ‘key’ job execution off of the same logic that defines your preferred replica for backups – in many cases, if you’ve opted to off-load backup processing onto a Secondary, it stands to reason that running read-only SQL Server Agent jobs there would make sense too.
Otherwise, if/when more than 2 servers are involved, you’re likely going to have to set up something that checks all servers by name (instead of using the PARTNER linked-server alias) – and then you’ll also have to tackle the issue of determining where that code should run.
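To make the idea of ‘keying’ job execution off of backup preferences a bit more concrete, here’s a minimal sketch of what a ‘guard’ first job step might look like – assuming a database named Reporting that’s a member of the Availability Group (the database name is just a placeholder):

```sql
-- Hypothetical first step of a read-only Agent job. It bails out
-- unless the current replica is the preferred backup replica for
-- the [Reporting] database (name assumed for illustration):
IF sys.fn_hadr_backup_is_preferred_replica(N'Reporting') <> 1
BEGIN
    -- Not the preferred replica; log (informational severity) and exit.
    RAISERROR('Not the preferred replica - exiting.', 10, 1);
    RETURN;
END;

-- Otherwise, this replica 'owns' the work - run the actual
-- (read-only) workload in subsequent statements or job steps...
```

Because the same guard runs on every replica, the job can be deployed (and enabled) everywhere, and only the preferred replica actually does the work – mirroring how backup jobs are typically handled.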
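As a rough sketch of what a by-name synchronization check might look like (the server/linked-server names below are purely hypothetical), you could diff job definitions in msdb between the local server and each named replica:

```sql
-- Sketch: compare job definitions against each replica by name,
-- rather than via a single PARTNER linked-server alias.
-- Assumes linked servers named SERVERB, SERVERC, etc. already exist.
DECLARE @replicas TABLE (server_name sysname NOT NULL);
INSERT INTO @replicas (server_name)
VALUES (N'SERVERB'), (N'SERVERC');

-- Example check against ONE named replica; in practice you'd loop
-- over @replicas (via a cursor or dynamic SQL) and flag differences.
-- Jobs present/modified locally but not matching SERVERB:
SELECT [name], [enabled], date_modified
FROM msdb.dbo.sysjobs
EXCEPT
SELECT [name], [enabled], date_modified
FROM [SERVERB].msdb.dbo.sysjobs;
```

Any rows returned indicate jobs that exist (or have been modified) on one server but not the other – the same kind of check the PARTNER-based logic performed, just repeated per named server. The open question, as noted above, is where this check should itself run.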
In my next post, we’ll take a bit of an additional look into this last point – and some ideas and options you can use to help make tackling jobs failover and synchronization logic more flexible and, under the right cultures or circumstances, easier to manage as well.
Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3
Part 11: A Fork in the Road
Part 12: Issues with Enabling and Disabling Batch Jobs
Part 13: Creating and Populating a Batch Job State Table
Part 14: Establishing Synchronization Checks
Part 15: Setting Up Linked Servers
Part 16: Job Synchronization Checkup Logic
Part 17: Creating Jobs to Check on Synchronization
Part 18: Health Checks for Availability Groups
Part 19: Availability Group Database Backups
Part 20: Avoiding Backup Fragmentation
Part 21: Assigning Backup Preferences
Part 22: Executing Backups
Part 23: Maintenance Plan Backups and SSIS Packages
Part 24: Synchronizing SSIS Packages
Part 25: Final Remarks on Backups
Part 26: Using Master Servers for Administration