AlwaysOn Availability Groups and SQL Server Jobs, Part 13: Creating and Populating a Batch Job State Table

In this post, we’ll take a more detailed look at some simple mechanisms that can be used to track the state of SQL Server Agent Jobs being used as Batch Jobs (see Post 3: defining Batch Jobs for more details) against SQL Server AlwaysOn Availability Group Databases.

Why a State Table?

Again, the reason for a state table is fairly straight forward. Once we’ve identified which Jobs belong to a given Availability Group, then it initially seems like those Jobs should simply be disabled on all non-Primary Replicas and enabled on only the Primary Replica. Only, the problem with that initial assumption is that this approach won’t work if SysAdmins have temporarily (or longer) Disabled a particular job.

For example, assume we’ve got a process that runs nightly, and truncates audit data that is more than 13 months old—in keeping with audit/retention policies. Then, assume that there’s been some semi-recent discussion about needing to keep that data on hand for potentially 25 months—but nothing has been finalized. In such a case, rather than making the change (before being positive it’s needed) and rather than letting data be truncated, an admin simply makes the decision to Disable the job in question and re-address the problem in a few weeks.

Then, assume there’s a failover. If we use the simple logic of assuming that all Batch Jobs targeted against our Availability Group database should simply be disabled on the non-Primary Server(s) then, that’s fine. But, if we use that same logic and assume that all jobs against this same database should be enabled on the server now hosting the Primary Replica, then we’ll see this job turn on, and we’ll start deleting data due to a failover. Hence, the reason or need for a state table or something that can and will keep track of which Jobs are enabled or disabled.

Implementation

Ultimately, I’ve chosen to make this state table as simple as possible in my own deployments. In my mind it, doesn’t need so many of the things that we as database professionals tend to think we need to throw at everything (such as mountains of meta-data covering the widest possible assortment of details on when changes were last modified, who owns a particular resource, and the likes). Instead, the approach I’ve taken is to simply store the name of the Job in question and a simple Boolean flag indicating whether it’s enabled or not.

To that end, the following code is what I use to create this table:

USE [<TargetDBName, sysname, YourAGDbNameHere>]
GO

CREATE TABLE [dbo].[<AGName, sysname, NameOfYourAgHere>_JobEnabledStates](
	[JobName] [sysname] NOT NULL,
	[Enabled] [bit] NOT NULL
) ON [PRIMARY];
GO

And note that I’m using T-SQL Template syntax to provide place holders for two ‘variables’

TargetDbName. Or the name of the Database where this table will be living. Note, of course, that the database has to be an Availability Group Database and, likewise, should be a member of the Availability Group which your Batch Jobs are targeting.
AGName. Or the name of your Availability Group. So, for example, if I had 2 databases, SSV2 and SSV4 that were logically linked and deployed into the same Availability Group called SSV, I’d specify SSV4 for the TargetDbName value and SSV for the AGName value.

The idea is that this state information now becomes part of an Availability Group database and therefore, shunts from server to server as part of Failover, and is always, therefore, found on the Primary Replica in a read-able (and write-able) state.

Once the table is created, though, you’ll need to populate it with data. To do that, you could run a bunch of INSERT statements against it manually. But to take some of the tedium out of that affair, I’ve created a simple sproc (that I deploy to the master database of all servers participating in my Availability Groups) that looks as follows:

USE master;
GO

IF OBJECT_ID('dbo.dba_DocumentJobEnabledStatuses','P') IS NOT NULL
	DROP PROC dbo.dba_DocumentJobEnabledStatuses
GO

CREATE PROC dbo.dba_DocumentJobEnabledStatuses
	@AGName				sysname, -- name of the AG to failover. 
	@JobsTableDBName	sysname -- name of the db with the jobs-enabled-details table
AS
	SET NOCOUNT ON; 

	-- if we're not on the primary server, then bail (reporting an error 
	--	- since this sproc should be executed manually:
	IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN 
		RAISERROR('Server does not host primary Replicas for specified %s.', 16, 1, @AGName);
		RETURN -1;
	END

	-- start by truncating the data in the target table:
	DECLARE @sql nvarchar(MAX) = N'TRUNCATE TABLE ' + @JobsTableDBName 
		+ '.dbo.[' + @AGName + '_JobEnabledStates];'
	EXEC(@sql);

	-- create a statement to INSERT records into the table: 
	SET @sql = 'INSERT INTO '  + @JobsTableDBName + '.dbo.[' 
		+ @AGName + '_JobEnabledStates] (JobName, [Enabled]) 
	SELECT 
		j.name [JobName], 
		j.[enabled] [Enabled]
	FROM 
		msdb.dbo.sysjobs j 
		INNER JOIN msdb.dbo.syscategories c ON c.category_id = j.category_id
	WHERE
		c.name = @AGName;' 

	-- execute the statement: 
	EXEC sp_executesql @sql, N'@AGName sysname', @AGName; 

	RETURN 0;
GO

This sproc, in turn, takes in two variables—which logically map over to the same template parameters used to create the table above. Or, in other words, if you simply call this sproc as follows:

EXEC master.dbo.dba_DocumentJobEnabledStatuses 'SSV', 'SSV4';

Then it’ll go in, truncate all records currently in the table, and then simply ‘document’ the Enabled/Disabled status of all SQL Server Agent Jobs that have been set to the custom ‘SSV’ SQL Server Agent Job Category (mentioned in my last post).

And so, the idea here is that once you create this table, all you have to do is run the sproc above and it’ll document job states for you—making it super easy to document things whenever you make changes or not. (So, in the example above, when an admin came in and temporarily disabled the job to truncate audit data, all they’d have to do would be to disable the job, then go out to the Primary Replica and run the sproc above to document the updated state of all jobs on the server that target the Availability Group in question.)

The Need for Synchronization

Of course, just because the sproc outlined above makes it fairly easy to document Job Enabled states, that doesn’t mean admins will always remember to run this update each time they make changes. Which, in turn, brings us to one of the biggest and most important problems you’ll need to address when tackling any kind of SQL Server Agent job when Availability Groups are involved: making sure that job details (and state info) stays synchronized across all servers (and against state-table data) over time. As such, we’ll take a look at this concern next in Establishing Synchronization Checks.

Part 1: Introduction
Part 2: Putting AlwaysOn into Context
Part 3: Defining Batch Jobs
Part 4: Synchronizing Server-Level Details
Part 5: Setting Up Failover Alerts
Part 6: High-Level Options and Rationale for Handling Batch Jobs
Part 7: Detecting Primary Replica Ownership
Part 8: Dynamic Detection Gotcha #1
Part 9: Dynamic Detection Gotcha #2
Part 10: Dynamic Detection Gotcha #3
Part 11: A Fork in the Road
Part 12: Issues with Enabling and Disabling Batch Jobs

Comments

Plain text