AlwaysOn Availability Groups and SQL Server Jobs, Part 17: Creating Jobs to Check on Synchronization

In my previous post, I provided a high-level outline of the core logic (and rationale behind that logic) that would be needed to set up regular synchronization checks on SQL Server Agent Jobs for servers where AlwaysOn Availability Groups have been deployed. In this post, I’ll walk through the steps--and the code--needed to setup those checks.

Step 1: Start with Linked Servers

In order for the logic outlined in this post to work, you’ll need to set up Linked Server definitions between all of the servers hosting your AlwaysOn Availability Groups. In Post #15 I outlined some key concerns around Setting Up Linked Servers, which you’ll have to address before any of the code or logic below will work.

Step 2: Define UDF(s) to Determine Primary Replica Host

During checkups it will be important to know which server (or instance) is currently hosting the Primary Replica(s) for a given Availability Group.

As per post #11, A Fork in the Road, if you’re using the option/technique where you’ll be enabling/disabling jobs depending upon whether they’re on the same server as the Primary Replica(s), this logic will be indispensable. On the other hand, if you’re just using if/else logic within your job steps, you’ll still want to know which server is hosting the Primary Replica(s) in your target Availability Group when running synchronization checks so that you can simplify checks and run them only on/from the primary, as outlined below.

Technically speaking, there are two ways to check for "primacy"of your Availability Groups: by database name (for example, “Is the suchAndSuch database hosted on this server currently the Primary?”) or by the name of the Availability Group (“Is the blahBlah Availability Group currently hosting Primary Replicas on this server?”). Scripts for both techniques are listed below.

USE master;
GO

IF OBJECT_ID('dbo.fn_hadr_database_is_primary','FN') IS NOT NULL
	DROP FUNCTION dbo.fn_hadr_database_is_primary;
GO

CREATE FUNCTION dbo.fn_hadr_database_is_primary (@DBName sysname)
RETURNS bit
AS
	BEGIN
		DECLARE @description sysname;
		
		SELECT 
			@description = 	hars.role_desc
		FROM 
			sys.databases d
			INNER JOIN sys.dm_hadr_availability_replica_states hars ON d.replica_id = hars.replica_id
		WHERE 
			database_id = DB_ID(@DBName);
	
		IF @description = 'PRIMARY'
			RETURN 1;
	
		RETURN 0;
	END;
GO

USE master;
GO

IF OBJECT_ID('dbo.fn_hadr_group_is_primary','FN') IS NOT NULL
	DROP FUNCTION dbo.fn_hadr_group_is_primary;
GO

CREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname)
RETURNS bit 
AS
	BEGIN 
		
		DECLARE @PrimaryReplica sysname; 

		SELECT @PrimaryReplica = hags.primary_replica 
		FROM 
			sys.dm_hadr_availability_group_states hags
			INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id
		WHERE
			ag.name = @AGName;

		IF UPPER(@PrimaryReplica) =  UPPER(@@SERVERNAME)
			RETURN 1; -- primary

		RETURN 0; -- not primary
		
	END; 
GO

The naming "convention" around both of these UDFs is patterned after that of sys.fn_hadr_backup_is_preferred_replica. As with that UDF, if you specify the name of the @DBName or @AGName parameters above incorrectly, you’ll get a FALSE result instead of an error (which is by design).

Step 3: Deploy a Sproc to Check Sever-Level Jobs

With Linked Servers in place, it’s now time to check up on Server-Level Jobs--or SQL Server Agent Jobs that operate at the Server Level or which target "server level" details (such as checking available disk space, cycling the error logs or truncating msdb history). This logic will also (likely) be used to checkup on backup routines.

The idea behind this code is pretty simple: It just requests details about SQL Server Agent Jobs on one server, grabs details about Jobs on another server, and then compares relevant bits of information to see if there are any differences. Or, at least, that’s what it’s doing at a high level.

USE master;
GO


IF OBJECT_ID('dbo.dba_SyncCheckupForInstanceLevelJobs','P') IS NOT NULL
	DROP PROC dbo.dba_SyncCheckupForInstanceLevelJobs
GO

CREATE PROC dbo.dba_SyncCheckupForInstanceLevelJobs
	@MailProfileName	sysname = NULL,	-- e.g., 'General'
	@OperatorName		sysname = NULL,	-- e.g., 'Alerts', 
	@PrimaryAGAfinity	sysname = NULL, -- see comments above. 
	@IgnoredJobs		nvarchar(MAX) = '',
	@ConsoleOnly		bit	= 0		-- output only to console - don't email alerts (for debugging/manual execution, etc.)
AS 
	SET NOCOUNT ON;

	IF @PrimaryAGAfinity IS NOT NULL BEGIN 
		IF (SELECT dbo.fn_hadr_group_is_primary(@PrimaryAGAfinity)) = 0 BEGIN 
			PRINT 'Server is Not Primary.'
			RETURN 0;
		END
	END

	-- if we're not manually running this, make sure the server is the primary:
	IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters:
		IF ISNULL(@MailProfileName, '') = '' BEGIN
			RAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4);
			RETURN -2;
		END

		IF ISNULL(@OperatorName, '') = '' BEGIN
			RAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4);
			RETURN -3;
		END
	END;

	CREATE TABLE #IgnoredJobs (
		job_name sysname
	);

	-- single-line / in-line 'split' function:
	DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT ');

	INSERT INTO #IgnoredJobs 
	EXEC(@DeserializedJobs);
	
	DECLARE @LocalServerName sysname = @@SERVERNAME;
	DECLARE @RemoteServerName sysname; 
	SET @RemoteServerName = (SELECT TOP 1 name FROM PARTNER.master.sys.servers WHERE server_id = 0);

	----------------------------------------------------------------------------
	-- Start by checking all jobs that aren't 'mapped' to categories representing their HA name... 
	DECLARE @AvailabilityGroupDatabases TABLE ( 
		name sysname NOT NULL 
	); 

	INSERT INTO @AvailabilityGroupDatabases
	SELECT name FROM sys.availability_groups

	UNION 
	SELECT name FROM PARTNER.master.sys.availability_groups;
	
	CREATE TABLE #LocalJobs (
		job_id uniqueidentifier, 
		name sysname, 
		[enabled] tinyint, 
		[description] nvarchar(512), 
		start_step_id int, 
		owner_sid varbinary(85),
		notify_level_email int, 
		notify_email_operator_id int
	);

	CREATE TABLE #RemoteJobs (
		job_id uniqueidentifier, 
		name sysname, 
		[enabled] tinyint, 
		[description] nvarchar(512), 
		start_step_id int, 
		owner_sid varbinary(85),
		notify_level_email int, 
		notify_email_operator_id int
	);

	INSERT INTO #LocalJobs
	SELECT 
		sj.job_id, 
		sj.name, 
		sj.[enabled], 
		sj.[description], 
		sj.start_step_id,
		sj.owner_sid, 
		sj.notify_level_email, 
		sj.notify_email_operator_id
	FROM 
		msdb.dbo.sysjobs sj
		INNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
	WHERE
		-- categoryName of the Job in question != an AG Group Name
		UPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases);

	INSERT INTO #RemoteJobs
	SELECT 
		sj.job_id, 
		sj.name, 
		sj.[enabled], 
		sj.[description], 
		sj.start_step_id,
		sj.owner_sid, 
		sj.notify_level_email, 
		sj.notify_email_operator_id
	FROM 
		PARTNER.msdb.dbo.sysjobs sj
		INNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
	WHERE
		UPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases);

	CREATE TABLE #Divergence (
		name sysname, 
		diff_type sysname
	);

	INSERT INTO #Divergence
	SELECT 
		name,
		'Job Exists on ' + @LocalServerName + ' Only.'
	FROM 
		#LocalJobs 
	WHERE
		name NOT IN (SELECT name FROM #RemoteJobs);

	INSERT INTO #Divergence
	SELECT 
		name, 
		'Job Exists on ' + @RemoteServerName + ' Only.'
	FROM 
		#RemoteJobs
	WHERE
		name NOT IN (SELECT name FROM #LocalJobs);

	INSERT INTO #Divergence
	SELECT 
		lj.name, 
		'Job-Level Differences (enabled, start-step, notification, etc)'
	FROM 
		#LocalJobs lj
		INNER JOIN #RemoteJobs rj ON rj.name = lj.name
	WHERE
		lj.[enabled] != rj.[enabled]
		OR lj.start_step_id != rj.start_step_id
		OR lj.notify_email_operator_id != rj.notify_email_operator_id
		OR lj.notify_level_email != rj.notify_level_email;

	CREATE TABLE #LocalJobSteps (
		step_id int, 
		[checksum] int
	);

	CREATE TABLE #RemoteJobSteps (
		step_id int, 
		[checksum] int
	);

	CREATE TABLE #LocalJobSchedules (
		schedule_name sysname, 
		[checksum] int
	);

	CREATE TABLE #RemoteJobSchedules (
		schedule_name sysname, 
		[checksum] int
	);

	-- Now, (matching job by job) go through and check schedules and job steps: 
	DECLARE checker CURSOR FAST_FORWARD FOR
	SELECT 
		lj.job_id local_job_id, 
		rj.job_id remote_job_id, 
		lj.name 
	FROM 
		#LocalJobs lj
		INNER JOIN #RemoteJobs rj ON lj.name = rj.name;

	DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname;
	DECLARE @LocalCount int, @RemoteCount int;

	OPEN checker;
	FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;

	WHILE @@FETCH_STATUS = 0 BEGIN 
	
		-- check jobsteps first:
		DELETE FROM #LocalJobSteps;
		DELETE FROM #RemoteJobSteps;

		INSERT INTO #LocalJobSteps
		SELECT 
		step_id, 
		BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
		FROM msdb.dbo.sysjobsteps
		WHERE job_id = @LocalJobId;

		INSERT INTO #RemoteJobSteps
		SELECT 
		step_id, 
		BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
		FROM PARTNER.msdb.dbo.sysjobsteps
		WHERE job_id = @RemoteJobId;

		SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps;
		SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps;

		IF @LocalCount != @RemoteCount
			INSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers');
		ELSE BEGIN 
			INSERT INTO #Divergence
			SELECT 
				@JobName, 
				'Different Job Step Details between Servers'
			FROM 
				#LocalJobSteps ljs 
				INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_id
			WHERE	
				ljs.[checksum] != rjs.[checksum];
		END;

		-- Now Check Schedules:
		DELETE FROM #LocalJobSchedules;
		DELETE FROM #RemoteJobSchedules;

		INSERT INTO #LocalJobSchedules
		SELECT 
			ss.name,
			BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
			ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
			ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
		FROM 
			msdb.dbo.sysjobschedules sjs
			INNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
		WHERE
			sjs.job_id = @LocalJobId;


		INSERT INTO #RemoteJobSchedules
		SELECT 
			ss.name,
			BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
			ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
			ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
		FROM 
			PARTNER.msdb.dbo.sysjobschedules sjs
			INNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
		WHERE
			sjs.job_id = @RemoteJobId;

		SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules;
		SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules;

		IF @LocalCount != @RemoteCount
			INSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers');
		ELSE BEGIN 
			INSERT INTO #Divergence
			SELECT
				@JobName, 
				'Different Schedule Details between Servers.'
			FROM 
				#LocalJobSchedules ljs
				INNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_name
			WHERE 
				ljs.[checksum] != rjs.[checksum];

		END;

		FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;
	END;

	CLOSE checker;
	DEALLOCATE checker;


	IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN 

		DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems';
		DECLARE @crlf char(2) = CHAR(13) + CHAR(10);
		DECLARE @tab char(1) = CHAR(9);
		DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: '
		+ @crlf;

		SELECT 
			@message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlf
		FROM 
			#Divergence
		WHERE
			name NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '')
		ORDER BY 
			NAME;

		SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.'
			+ @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.'

		IF @ConsoleOnly = 1 BEGIN 
			-- just Print out details:
			PRINT 'SUBJECT: ' + @subject;
			PRINT 'BODY: ' + @crlf + @message;

		  END
		ELSE BEGIN
			-- send a message:
			EXEC msdb..sp_notify_operator 
				@profile_name = @MailProfileName, 
				@name = @OperatorName, 
				@subject = @subject,
				@body = @message;
		END;
	END;

	DROP TABLE #LocalJobs;
	DROP TABLE #RemoteJobs;
	DROP TABLE #Divergence;
	DROP TABLE #LocalJobSteps;
	DROP TABLE #RemoteJobSteps;
	DROP TABLE #LocalJobSchedules;
	DROP TABLE #RemoteJobSchedules;
	DROP TABLE #IgnoredJobs;

	RETURN 0;

Behind the scenes there’s a bit more going on. For example, there’s an @PrimaryAGAfinity parameter, which is used to specify "affinity" or correlation with/against a specified Availability Group being hosted on your servers. The idea is that if you specify this value (for example, “MyProductionDbs”) and SQL Server a.) detects that AG is defined on the host where this code is run and b.) that AG is currently not hosting Primary Replicas, then this checkup logic will exit because it will assume that we’ll be doing the checkups on the Server hosting the Primary Replica(s).

Likewise, there are parameters for a MailProfileName (or the name of the Mail Profile to use if there are synchronization issues detected), as well as the name of the Operator to notify.

Finally, there’s also an @IngoredJobs parameter, where you can comma-delimit a list of Job names that you wish to ignore (or not have checked for synchronization), and an @ConsoleOnly flag, which you can use for running/testing this sproc within SSMS--where it won’t bother sending an email alert as it would when scheduled.

Step 4: Deploy a Sproc to Check on AlwaysOn Availability Group Jobs

Another key thing that the sproc above does, though, is IGNORE any SQL Server Agent Job where the name of the Job Category assigned to that Job matches the name of an Availability Group on the host--in keeping with the convention outlined in Post #12, Issues with Enabling and Disabling Batch Jobs. As such, we’ll need additional, follow-up, logic that checks on AG-Level Jobs and their synchronization state.

Ultimately, this logic is pretty similar to the logic above, in that we’ll grab job details from one server and compare them against the details on another server. Only, in this case, we’ll only be grabbing details for jobs that "belong" to a specific SQL Server Agent Job Category (that is, to a given Availability Group), and we’ll, optionally, be looking to see if we need to "toggle" the Enabled/Disabled status for these jobs--based upon where they’re currently hosted and whether they’re on the same host as the Primary Replica.

Code is as follows:

USE master;
GO

IF OBJECT_ID('dbo.dba_SyncCheckupForAGLevelJobs','P') IS NOT NULL
	DROP PROC dbo.dba_SyncCheckupForAGLevelJobs;
GO

CREATE PROC dbo.dba_SyncCheckupForAGLevelJobs
	@AGName				sysname,				-- Required
	@JobsTableDBName	sysname,				-- Required (tells us which DB the list of enabled/disabled jobs is in).
	@MailProfileName	sysname = NULL,			-- optional - only needed if @ConsoleOnly = 0
	@OperatorName		sysname = NULL,			-- as per above
	@IgnoredJobs		nvarchar(MAX) = NULL,	-- jobs to ignore (in the category with name of MirroredDB)
	@ConsoleOnly		bit = 0 -- output details to console (1), or send alerts/output via email (0). 
AS	
	SET	NOCOUNT ON;

	IF ISNULL(@AGName, '') = '' BEGIN
		RAISERROR('Procedure expects parameter ''@AGName'' which was not provided.', 16, 4);
		RETURN -1;
	END

	IF ISNULL(@JobsTableDBName, '') = '' BEGIN 
		RAISERROR('Procedure expects parameter ''@JobsTableDBName'' which was not provided.', 16, 4);
		RETURN -2;
	END 

	IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters:

		IF ISNULL(@MailProfileName, '') = '' BEGIN
			RAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4);
			RETURN -5;
		END

		IF ISNULL(@OperatorName, '') = '' BEGIN
			RAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4);
			RETURN -6;
		END


	END;

	-- Start by running the Job that syncs/updates Job Enabled/Disabled Status first: 
	EXEC master.dbo.dba_EnsureAGLevelJobEnabledStatuses @AGName, @JobsTableDBName;

	-- Remaining checks/reporting only needs to be run from/against the Primary - so bail if that's not this server. 
	IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN
		PRINT 'Server is Not Primary';
		RETURN 0; 
	END

	CREATE TABLE #IgnoredJobs (
		job_name sysname
	);

	-- single-line / in-line 'split' function:
	DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT ');

	INSERT INTO #IgnoredJobs 
	EXEC(@DeserializedJobs);

	CREATE TABLE #LocalJobs (
		job_id uniqueidentifier, 
		name sysname, 
		[enabled] tinyint, 
		[description] nvarchar(512), 
		start_step_id int, 
		owner_sid varbinary(85),
		notify_level_email int, 
		notify_email_operator_id int
	);

	CREATE TABLE #RemoteJobs (
		job_id uniqueidentifier, 
		name sysname, 
		[enabled] tinyint, 
		[description] nvarchar(512), 
		start_step_id int, 
		owner_sid varbinary(85),
		notify_level_email int, 
		notify_email_operator_id int
	);

	INSERT INTO #LocalJobs
	SELECT 
		sj.job_id, 
		sj.name, 
		sj.[enabled], 
		sj.[description], 
		sj.start_step_id,
		sj.owner_sid, 
		sj.notify_level_email, 
		sj.notify_email_operator_id
	FROM 
		msdb.dbo.sysjobs sj
		INNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
	WHERE
		UPPER(sc.name) = UPPER(@AGName);

	INSERT INTO #RemoteJobs
	SELECT 
		sj.job_id, 
		sj.name, 
		sj.[enabled], 
		sj.[description], 
		sj.start_step_id,
		sj.owner_sid, 
		sj.notify_level_email, 
		sj.notify_email_operator_id
	FROM 
		PARTNER.msdb.dbo.sysjobs sj
		INNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_id
	WHERE
		UPPER(sc.name) = UPPER(@AGName);

	-- Now start comparing differences: 
	CREATE TABLE #Divergence (
		name sysname, 
		diff_type nvarchar(300)
	);

	INSERT INTO #Divergence
	SELECT 
		name,
		'Exists on Primary Server Only'
	FROM 
		#LocalJobs 
	WHERE
		name NOT IN (SELECT name FROM #RemoteJobs);

	INSERT INTO #Divergence
	SELECT 
		name, 
		'Exists on Secondary Server Only'
	FROM 
		#RemoteJobs
	WHERE
		name NOT IN (SELECT name FROM #LocalJobs);

	-- check meta-data properties:
	INSERT INTO #Divergence
	SELECT 
		lj.name, 
		'Job-Level Differences (owner, start-step, notification, etc)'
	FROM 
		#LocalJobs lj
		INNER JOIN #RemoteJobs rj ON rj.name = lj.name
	WHERE
		lj.owner_sid != rj.owner_sid
		OR lj.start_step_id != rj.start_step_id
		OR lj.notify_email_operator_id != rj.notify_email_operator_id
		OR lj.notify_level_email != rj.notify_level_email;


	-- For Job Enabled Status we have two checks: 
	--		A: No Jobs that have a Job.CategoryName = @AGName should be enabled on the SECONDARY server. 
	--		B: we can't infer that on the PRIMARY server. So, instead, if Job.CategoryName = @AGName and
	--			enabled/disabled != Enabled bit-flag from the <AGNAME>_JobEnabledStates table, then we need
	--			to raise an error because a job that should be enabled is NOT. (Or vice versa.)
	
	-- A) Check for jobs on the remote server:
	INSERT INTO #Divergence
	SELECT 
		name, 
		'Job Enabled on SECONDARY SERVER (Jobs should only be enabled on PRIMARY).'
	FROM 
		#RemoteJobs
	WHERE
		enabled = 1; 

	-- B) Check for jobs on the local server: 
	DECLARE @Jobs TABLE ( 
		JobName sysname NOT NULL, 
		[Enabled] bit NOT NULL 
	)

	DECLARE @sql nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];'

	INSERT INTO @Jobs
	EXEC(@sql); 

	-- Report on Jobs that are not documented: 
	INSERT INTO #Divergence
	SELECT
		lj.name, 
		'Job Enabled/Disabled Status is not currently defined. Please check job Enabled/Disabled status then run "EXEC master.dbo.dba_DocumentJobEnabledStatuses ''' + @AGName + ''';" to document Job Enabled/Disabled state.'
	FROM 
		#LocalJobs lj
	WHERE
		lj.name NOT IN (SELECT JobName FROM @Jobs);

	-- Report on Jobs that are Enabled but shouldn't be:
	INSERT INTO #Divergence
	SELECT 
		lj.name, 
		'Job is ENABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be DISABLED. (Update Table or Job Status to correct this issue.)'
	FROM 
		#LocalJobs lj 
		LEFT OUTER JOIN @Jobs j ON lj.name = j.JobName
	WHERE 
		ISNULL(j.[Enabled],0) = 0 AND lj.[enabled] = 1; -- job is listed/defaulted to disabled, but is currently enabled

	-- Report on Jobs that are Disabled but shouldn't be:
	INSERT INTO #Divergence
	SELECT 
		lj.name, 
		'Job is DISABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be _ENABLED_ (Running). (Update Table or Job Status to correct this issue.)'
	FROM 
		#LocalJobs lj 
		LEFT OUTER JOIN @Jobs j ON lj.name = j.JobName
	WHERE 
		ISNULL(j.[Enabled], 0) = 1 AND lj.[enabled] = 0;  -- job is listed/defaulted to enabled, but is currently disabled


	-- Now, (matching job by job) go through and check schedules and job steps: 
	CREATE TABLE #LocalJobSteps (
		step_id int, 
		[checksum] int
	);

	CREATE TABLE #RemoteJobSteps (
		step_id int, 
		[checksum] int
	);

	CREATE TABLE #LocalJobSchedules (
		schedule_name sysname, 
		[checksum] int
	);

	CREATE TABLE #RemoteJobSchedules (
		schedule_name sysname, 
		[checksum] int
	);

	DECLARE checker CURSOR FAST_FORWARD FOR
	SELECT 
		lj.job_id local_job_id, 
		rj.job_id remote_job_id, 
		lj.name 
	FROM 
		#LocalJobs lj
		INNER JOIN #RemoteJobs rj ON lj.name = rj.name;

	DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname;
	DECLARE @LocalCount int, @RemoteCount int;

	OPEN checker;
	FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;

	WHILE @@FETCH_STATUS = 0 BEGIN 
	
		-- check jobsteps first:
		DELETE FROM #LocalJobSteps;
		DELETE FROM #RemoteJobSteps;

		INSERT INTO #LocalJobSteps
		SELECT 
		step_id, 
		BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
		FROM msdb.dbo.sysjobsteps
		WHERE job_id = @LocalJobId;

		INSERT INTO #RemoteJobSteps
		SELECT 
		step_id, 
		BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail]
		FROM PARTNER.msdb.dbo.sysjobsteps
		WHERE job_id = @RemoteJobId;

		SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps;
		SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps;

		IF @LocalCount != @RemoteCount
			INSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers');
		ELSE BEGIN 
			INSERT INTO #Divergence
			SELECT 
				@JobName, 
				'Different Job Step Details between Servers'
			FROM 
				#LocalJobSteps ljs 
				INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_id
			WHERE	
				ljs.[checksum] != rjs.[checksum];
		END;

		-- Now Check Schedules:
		DELETE FROM #LocalJobSchedules;
		DELETE FROM #RemoteJobSchedules;

		INSERT INTO #LocalJobSchedules
		SELECT 
			ss.name,
			BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
			ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
			ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
		FROM 
			msdb.dbo.sysjobschedules sjs
			INNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
		WHERE
			sjs.job_id = @LocalJobId;


		INSERT INTO #RemoteJobSchedules
		SELECT 
			ss.name,
			BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type +
			ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor +
			ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details]
		FROM 
			PARTNER.msdb.dbo.sysjobschedules sjs
			INNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id
		WHERE
			sjs.job_id = @RemoteJobId;

		SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules;
		SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules;

		IF @LocalCount != @RemoteCount
			INSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers');
		ELSE BEGIN 
			INSERT INTO #Divergence
			SELECT
				@JobName, 
				'Different Schedule Details between servers.'
			FROM 
				#LocalJobSchedules ljs
				INNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_name
			WHERE 
				ljs.[checksum] != rjs.[checksum];

		END;

		FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName;
	END;

	CLOSE checker;
	DEALLOCATE checker;

	IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN 

		DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems';
		DECLARE @crlf char(2) = CHAR(13) + CHAR(10);
		DECLARE @tab char(1) = CHAR(9);
		DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: '
		+ @crlf;

		SELECT 
			@message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlf
		FROM 
			#Divergence
		WHERE
			name NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '')
		ORDER BY 
			NAME;

		SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.'
			+ @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.'

		IF @ConsoleOnly = 1 BEGIN 
			-- just Print out details:
			PRINT 'SUBJECT: ' + @subject;
			PRINT 'BODY: ' + @crlf + @message;

		  END
		ELSE BEGIN
			-- send a message:
			EXEC msdb..sp_notify_operator 
				@profile_name = @MailProfileName, 
				@name = @OperatorName, 
				@subject = @subject,
				@body = @message;
		END;
	END;

	DROP TABLE #LocalJobs;
	DROP TABLE #RemoteJobs;
	DROP TABLE #Divergence;
	DROP TABLE #LocalJobSteps;
	DROP TABLE #RemoteJobSteps;
	DROP TABLE #LocalJobSchedules;
	DROP TABLE #RemoteJobSchedules;
	DROP TABLE #IgnoredJobs;

	RETURN 0;
GO

If you decided to implement the Enable/Disable approach to managing ‘Batch Jobs’ (as outlined in Post #12) the code above will work as-is and will try to Enable/Disable jobs on your servers as needed--by making a call to the following sproc (which makes use of the Jobs State Table discussed and outlined in Post #13):

USE master;
GO

IF OBJECT_ID('dbo.dba_EnsureAGLevelJobEnabledStatuses','P') IS NOT NULL
	DROP PROC dbo.dba_EnsureAGLevelJobEnabledStatuses
GO

CREATE PROC dbo.dba_EnsureAGLevelJobEnabledStatuses
	@AGName				sysname, -- name of the AG to failover. 
	@JobsTableDBName	sysname -- name of the db with the jobs-enabled-details table
AS
	SET NOCOUNT ON; 

	IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN
		
		-- If we're not on the PRIMARY server, then just make sure all jobs with categoryName = @AGName are disabled
		DECLARE disabler CURSOR FAST_FORWARD FOR
		SELECT 
			j.job_id,
			j.name 
		FROM 
			msdb.dbo.sysjobs  j
			INNER JOIN msdb.dbo.syscategories c ON c.category_id = j.category_id
		WHERE
			j.[enabled] = 1 AND
			c.name = @AGName;

		DECLARE @JobId uniqueidentifier;
		DECLARE @JobName sysname; 

		OPEN disabler; 
		FETCH NEXT FROM disabler INTO @JobId, @JobName;
		WHILE @@FETCH_STATUS = 0 BEGIN 

			EXEC msdb.dbo.sp_update_job
				@job_id = @JobId, 
				@enabled = 0;

			PRINT 'Disabled Job: [' + @JobName + '] ON Non-Primary Server.';

			FETCH NEXT FROM disabler INTO @JobId, @JobName;
		END;

		CLOSE disabler; 
		DEALLOCATE disabler;

	  END ;

	
	ELSE BEGIN 
		
		-- otherwise, if we ARE on the primary, then we should ENABLE jobs that need to be enabled. 
		DECLARE @Error sysname;

		DECLARE @Jobs TABLE ( 
			JobName sysname NOT NULL, 
			[Enabled] bit NOT NULL 
		);

		DECLARE @sql nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];'

		INSERT INTO @Jobs
		EXEC(@sql); 

		DECLARE enabler CURSOR FAST_FORWARD FOR 
		SELECT 
			JobName 
		FROM 
			@Jobs 
		WHERE 
			[Enabled] = 1; 

		OPEN enabler; 
		FETCH NEXT FROM enabler INTO @JobName; 

		WHILE @@FETCH_STATUS = 0 BEGIN 
		
			SELECT @JobId = job_id FROM msdb.dbo.sysjobs WHERE name = @JobName;

			IF @JobId IS NULL BEGIN 
				SET @Error = 'Unable to Locate (and set ENABLED) on Job: ' + @JobName + '.';
				THROW 50001, @Error, 1;
				RETURN -1; -- we're done. (should we send out an email alert? probably)
			END
			
			EXEC msdb.dbo.sp_update_job 
				@job_id = @JobId, 
				@enabled = 1; -- enable the job here... 

			FETCH NEXT FROM enabler INTO @JobName; 

		END; 

		CLOSE enabler; 
		DEALLOCATE enabler;

	END;

	RETURN 0; 
GO

However, if you choose to go with the IF/ELSE logic embedded in your Batch Jobs, then you’ll want to disable (that is, comment-out) the code in the ‘parent’ sproc that attempts these operations. Likewise, it should be called out that all of the code in this post is for a scenario where you’ve just got TWO servers in your Availability Group (hence the notion of calls to a ‘Partner’ Linked Server, as per post #15). If you’ve got more than two servers in your topology, the basic logic outlined in these posts will work--but you’ll need to modify it accordingly (which we’ll look at a tiny bit in a future post).

Step 5: Create a SQL Server Agent Job to Execute Regular Checkups

With code in place, you’re now ready to set up a Job that’ll execute regular synchronization checks. Basic logic for setting up this job is that you’ll want to set up a recurring schedule, then add a single job step to run ‘Server Level’ checks, and then add in an additional Job Step for each AlwaysOn Availability Group that you want to check up on. So, if you’ve got one Availability Group on your servers, you’d end up with a Job with two Job Steps; whereas if you have three AGs, you’ll end up with a Job with four Job Steps (one for Server-Level checkups and a distinct job step, or call to dba_SyncCheckupForAGLevelJobs for each of your Availability Groups).

In my next post, we’ll take a more concrete look at a sample/example Synchronization Job--and discuss some options and best practices for scheduling and review how these checks will "behave" in the "wild."

Comments

Plain text