In my previous post, I provided a high-level outline of the core logic (and rationale behind that logic) that would be needed to set up regular synchronization checks on SQL Server Agent Jobs for servers where AlwaysOn Availability Groups have been deployed. In this post, I’ll walk through the steps--and the code--needed to setup those checks.
Step 1: Start with Linked Servers
In order for the logic outlined in this post to work, you’ll need to set up Linked Server definitions between all of the servers hosting your AlwaysOn Availability Groups. In Post #15 I outlined some key concerns around Setting Up Linked Servers, which you’ll have to address before any of the code or logic below will work.
Step 2: Define UDF(s) to Determine Primary Replica Host
During checkups it will be important to know which server (or instance) is currently hosting the Primary Replica(s) for a given Availability Group.
As per post #11, A Fork in the Road, if you’re using the option/technique where you’ll be enabling/disabling jobs depending upon whether they’re on the same server as the Primary Replica(s), this logic will be indispensable. On the other hand, if you’re just using if/else logic within your job steps, you’ll still want to know which server is hosting the Primary Replica(s) in your target Availability Group when running synchronization checks so that you can simplify checks and run them only on/from the primary, as outlined below.
Technically speaking, there are two ways to check for "primacy"of your Availability Groups: by database name (for example, “Is the suchAndSuch database hosted on this server currently the Primary?”) or by the name of the Availability Group (“Is the blahBlah Availability Group currently hosting Primary Replicas on this server?”). Scripts for both techniques are listed below.
USE master; GO IF OBJECT_ID('dbo.fn_hadr_database_is_primary','FN') IS NOT NULL DROP FUNCTION dbo.fn_hadr_database_is_primary; GO CREATE FUNCTION dbo.fn_hadr_database_is_primary (@DBName sysname) RETURNS bit AS BEGIN DECLARE @description sysname; SELECT @description = hars.role_desc FROM sys.databases d INNER JOIN sys.dm_hadr_availability_replica_states hars ON d.replica_id = hars.replica_id WHERE database_id = DB_ID(@DBName); IF @description = 'PRIMARY' RETURN 1; RETURN 0; END; GO USE master; GO IF OBJECT_ID('dbo.fn_hadr_group_is_primary','FN') IS NOT NULL DROP FUNCTION dbo.fn_hadr_group_is_primary; GO CREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname) RETURNS bit AS BEGIN DECLARE @PrimaryReplica sysname; SELECT @PrimaryReplica = hags.primary_replica FROM sys.dm_hadr_availability_group_states hags INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id WHERE ag.name = @AGName; IF UPPER(@PrimaryReplica) = UPPER(@@SERVERNAME) RETURN 1; -- primary RETURN 0; -- not primary END; GO
The naming "convention" around both of these UDFs is patterned after that of sys.fn_hadr_backup_is_preferred_replica. As with that UDF, if you specify the name of the @DBName or @AGName parameters above incorrectly, you’ll get a FALSE result instead of an error (which is by design).
Step 3: Deploy a Sproc to Check Sever-Level Jobs
With Linked Servers in place, it’s now time to check up on Server-Level Jobs--or SQL Server Agent Jobs that operate at the Server Level or which target "server level" details (such as checking available disk space, cycling the error logs or truncating msdb history). This logic will also (likely) be used to checkup on backup routines.
The idea behind this code is pretty simple: It just requests details about SQL Server Agent Jobs on one server, grabs details about Jobs on another server, and then compares relevant bits of information to see if there are any differences. Or, at least, that’s what it’s doing at a high level.
USE master; GO IF OBJECT_ID('dbo.dba_SyncCheckupForInstanceLevelJobs','P') IS NOT NULL DROP PROC dbo.dba_SyncCheckupForInstanceLevelJobs GO CREATE PROC dbo.dba_SyncCheckupForInstanceLevelJobs @MailProfileName sysname = NULL, -- e.g., 'General' @OperatorName sysname = NULL, -- e.g., 'Alerts', @PrimaryAGAfinity sysname = NULL, -- see comments above. @IgnoredJobs nvarchar(MAX) = '', @ConsoleOnly bit = 0 -- output only to console - don't email alerts (for debugging/manual execution, etc.) AS SET NOCOUNT ON; IF @PrimaryAGAfinity IS NOT NULL BEGIN IF (SELECT dbo.fn_hadr_group_is_primary(@PrimaryAGAfinity)) = 0 BEGIN PRINT 'Server is Not Primary.' RETURN 0; END END -- if we're not manually running this, make sure the server is the primary: IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters: IF ISNULL(@MailProfileName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4); RETURN -2; END IF ISNULL(@OperatorName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4); RETURN -3; END END; CREATE TABLE #IgnoredJobs ( job_name sysname ); -- single-line / in-line 'split' function: DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT '); INSERT INTO #IgnoredJobs EXEC(@DeserializedJobs); DECLARE @LocalServerName sysname = @@SERVERNAME; DECLARE @RemoteServerName sysname; SET @RemoteServerName = (SELECT TOP 1 name FROM PARTNER.master.sys.servers WHERE server_id = 0); ---------------------------------------------------------------------------- -- Start by checking all jobs that aren't 'mapped' to categories representing their HA name... DECLARE @AvailabilityGroupDatabases TABLE ( name sysname NOT NULL ); INSERT INTO @AvailabilityGroupDatabases SELECT name FROM sys.availability_groups UNION SELECT name FROM PARTNER.master.sys.availability_groups; CREATE TABLE #LocalJobs ( job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85), notify_level_email int, notify_email_operator_id int ); CREATE TABLE #RemoteJobs ( job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85), notify_level_email int, notify_email_operator_id int ); INSERT INTO #LocalJobs SELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id, sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_id FROM msdb.dbo.sysjobs sj INNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_id WHERE -- categoryName of the Job in question != an AG Group Name UPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases); INSERT INTO #RemoteJobs SELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id, sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_id FROM PARTNER.msdb.dbo.sysjobs sj INNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_id WHERE UPPER(sc.name) NOT IN (SELECT UPPER(name) FROM @AvailabilityGroupDatabases); CREATE TABLE #Divergence ( name sysname, diff_type sysname ); INSERT INTO #Divergence SELECT name, 'Job Exists on ' + @LocalServerName + ' Only.' FROM #LocalJobs WHERE name NOT IN (SELECT name FROM #RemoteJobs); INSERT INTO #Divergence SELECT name, 'Job Exists on ' + @RemoteServerName + ' Only.' FROM #RemoteJobs WHERE name NOT IN (SELECT name FROM #LocalJobs); INSERT INTO #Divergence SELECT lj.name, 'Job-Level Differences (enabled, start-step, notification, etc)' FROM #LocalJobs lj INNER JOIN #RemoteJobs rj ON rj.name = lj.name WHERE lj.[enabled] != rj.[enabled] OR lj.start_step_id != rj.start_step_id OR lj.notify_email_operator_id != rj.notify_email_operator_id OR lj.notify_level_email != rj.notify_level_email; CREATE TABLE #LocalJobSteps ( step_id int, [checksum] int ); CREATE TABLE #RemoteJobSteps ( step_id int, [checksum] int ); CREATE TABLE #LocalJobSchedules ( schedule_name sysname, [checksum] int ); CREATE TABLE #RemoteJobSchedules ( schedule_name sysname, [checksum] int ); -- Now, (matching job by job) go through and check schedules and job steps: DECLARE checker CURSOR FAST_FORWARD FOR SELECT lj.job_id local_job_id, rj.job_id remote_job_id, lj.name FROM #LocalJobs lj INNER JOIN #RemoteJobs rj ON lj.name = rj.name; DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname; DECLARE @LocalCount int, @RemoteCount int; OPEN checker; FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName; WHILE @@FETCH_STATUS = 0 BEGIN -- check jobsteps first: DELETE FROM #LocalJobSteps; DELETE FROM #RemoteJobSteps; INSERT INTO #LocalJobSteps SELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail] FROM msdb.dbo.sysjobsteps WHERE job_id = @LocalJobId; INSERT INTO #RemoteJobSteps SELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail] FROM PARTNER.msdb.dbo.sysjobsteps WHERE job_id = @RemoteJobId; SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps; SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps; IF @LocalCount != @RemoteCount INSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers'); ELSE BEGIN INSERT INTO #Divergence SELECT @JobName, 'Different Job Step Details between Servers' FROM #LocalJobSteps ljs INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_id WHERE ljs.[checksum] != rjs.[checksum]; END; -- Now Check Schedules: DELETE FROM #LocalJobSchedules; DELETE FROM #RemoteJobSchedules; INSERT INTO #LocalJobSchedules SELECT ss.name, BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type + ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor + ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details] FROM msdb.dbo.sysjobschedules sjs INNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id WHERE sjs.job_id = @LocalJobId; INSERT INTO #RemoteJobSchedules SELECT ss.name, BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type + ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor + ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details] FROM PARTNER.msdb.dbo.sysjobschedules sjs INNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id WHERE sjs.job_id = @RemoteJobId; SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules; SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules; IF @LocalCount != @RemoteCount INSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers'); ELSE BEGIN INSERT INTO #Divergence SELECT @JobName, 'Different Schedule Details between Servers.' FROM #LocalJobSchedules ljs INNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_name WHERE ljs.[checksum] != rjs.[checksum]; END; FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName; END; CLOSE checker; DEALLOCATE checker; IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems'; DECLARE @crlf char(2) = CHAR(13) + CHAR(10); DECLARE @tab char(1) = CHAR(9); DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: ' + @crlf; SELECT @message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlf FROM #Divergence WHERE name NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '') ORDER BY NAME; SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.' + @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.' IF @ConsoleOnly = 1 BEGIN -- just Print out details: PRINT 'SUBJECT: ' + @subject; PRINT 'BODY: ' + @crlf + @message; END ELSE BEGIN -- send a message: EXEC msdb..sp_notify_operator @profile_name = @MailProfileName, @name = @OperatorName, @subject = @subject, @body = @message; END; END; DROP TABLE #LocalJobs; DROP TABLE #RemoteJobs; DROP TABLE #Divergence; DROP TABLE #LocalJobSteps; DROP TABLE #RemoteJobSteps; DROP TABLE #LocalJobSchedules; DROP TABLE #RemoteJobSchedules; DROP TABLE #IgnoredJobs; RETURN 0;
Behind the scenes there’s a bit more going on. For example, there’s an @PrimaryAGAfinity parameter, which is used to specify "affinity" or correlation with/against a specified Availability Group being hosted on your servers. The idea is that if you specify this value (for example, “MyProductionDbs”) and SQL Server a.) detects that AG is defined on the host where this code is run and b.) that AG is currently not hosting Primary Replicas, then this checkup logic will exit because it will assume that we’ll be doing the checkups on the Server hosting the Primary Replica(s).
Likewise, there are parameters for a MailProfileName (or the name of the Mail Profile to use if there are synchronization issues detected), as well as the name of the Operator to notify.
Finally, there’s also an @IngoredJobs parameter, where you can comma-delimit a list of Job names that you wish to ignore (or not have checked for synchronization), and an @ConsoleOnly flag, which you can use for running/testing this sproc within SSMS--where it won’t bother sending an email alert as it would when scheduled.
Step 4: Deploy a Sproc to Check on AlwaysOn Availability Group Jobs
Another key thing that the sproc above does, though, is IGNORE any SQL Server Agent Job where the name of the Job Category assigned to that Job matches the name of an Availability Group on the host--in keeping with the convention outlined in Post #12, Issues with Enabling and Disabling Batch Jobs. As such, we’ll need additional, follow-up, logic that checks on AG-Level Jobs and their synchronization state.
Ultimately, this logic is pretty similar to the logic above, in that we’ll grab job details from one server and compare them against the details on another server. Only, in this case, we’ll only be grabbing details for jobs that "belong" to a specific SQL Server Agent Job Category (that is, to a given Availability Group), and we’ll, optionally, be looking to see if we need to "toggle" the Enabled/Disabled status for these jobs--based upon where they’re currently hosted and whether they’re on the same host as the Primary Replica.
Code is as follows:
USE master; GO IF OBJECT_ID('dbo.dba_SyncCheckupForAGLevelJobs','P') IS NOT NULL DROP PROC dbo.dba_SyncCheckupForAGLevelJobs; GO CREATE PROC dbo.dba_SyncCheckupForAGLevelJobs @AGName sysname, -- Required @JobsTableDBName sysname, -- Required (tells us which DB the list of enabled/disabled jobs is in). @MailProfileName sysname = NULL, -- optional - only needed if @ConsoleOnly = 0 @OperatorName sysname = NULL, -- as per above @IgnoredJobs nvarchar(MAX) = NULL, -- jobs to ignore (in the category with name of MirroredDB) @ConsoleOnly bit = 0 -- output details to console (1), or send alerts/output via email (0). AS SET NOCOUNT ON; IF ISNULL(@AGName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@AGName'' which was not provided.', 16, 4); RETURN -1; END IF ISNULL(@JobsTableDBName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@JobsTableDBName'' which was not provided.', 16, 4); RETURN -2; END IF @ConsoleOnly = 0 BEGIN -- if we're not running a 'manual' execution - make sure we have all parameters: IF ISNULL(@MailProfileName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@MailProfileName'' which was not provided.', 16, 4); RETURN -5; END IF ISNULL(@OperatorName, '') = '' BEGIN RAISERROR('Procedure expects parameter ''@OperatorName'' which was not provided.', 16, 4); RETURN -6; END END; -- Start by running the Job that syncs/updates Job Enabled/Disabled Status first: EXEC master.dbo.dba_EnsureAGLevelJobEnabledStatuses @AGName, @JobsTableDBName; -- Remaining checks/reporting only needs to be run from/against the Primary - so bail if that's not this server. IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN PRINT 'Server is Not Primary'; RETURN 0; END CREATE TABLE #IgnoredJobs ( job_name sysname ); -- single-line / in-line 'split' function: DECLARE @DeserializedJobs nvarchar(MAX) = N'SELECT ' + REPLACE(REPLACE(REPLACE(N'''{0}''','{0}',@IgnoredJobs), ',', ''','''), ',', ' UNION SELECT '); INSERT INTO #IgnoredJobs EXEC(@DeserializedJobs); CREATE TABLE #LocalJobs ( job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85), notify_level_email int, notify_email_operator_id int ); CREATE TABLE #RemoteJobs ( job_id uniqueidentifier, name sysname, [enabled] tinyint, [description] nvarchar(512), start_step_id int, owner_sid varbinary(85), notify_level_email int, notify_email_operator_id int ); INSERT INTO #LocalJobs SELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id, sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_id FROM msdb.dbo.sysjobs sj INNER JOIN msdb.dbo.syscategories sc ON sj.category_id = sc.category_id WHERE UPPER(sc.name) = UPPER(@AGName); INSERT INTO #RemoteJobs SELECT sj.job_id, sj.name, sj.[enabled], sj.[description], sj.start_step_id, sj.owner_sid, sj.notify_level_email, sj.notify_email_operator_id FROM PARTNER.msdb.dbo.sysjobs sj INNER JOIN PARTNER.msdb.dbo.syscategories sc ON sj.category_id = sc.category_id WHERE UPPER(sc.name) = UPPER(@AGName); -- Now start comparing differences: CREATE TABLE #Divergence ( name sysname, diff_type nvarchar(300) ); INSERT INTO #Divergence SELECT name, 'Exists on Primary Server Only' FROM #LocalJobs WHERE name NOT IN (SELECT name FROM #RemoteJobs); INSERT INTO #Divergence SELECT name, 'Exists on Secondary Server Only' FROM #RemoteJobs WHERE name NOT IN (SELECT name FROM #LocalJobs); -- check meta-data properties: INSERT INTO #Divergence SELECT lj.name, 'Job-Level Differences (owner, start-step, notification, etc)' FROM #LocalJobs lj INNER JOIN #RemoteJobs rj ON rj.name = lj.name WHERE lj.owner_sid != rj.owner_sid OR lj.start_step_id != rj.start_step_id OR lj.notify_email_operator_id != rj.notify_email_operator_id OR lj.notify_level_email != rj.notify_level_email; -- For Job Enabled Status we have two checks: -- A: No Jobs that have a Job.CategoryName = @AGName should be enabled on the SECONDARY server. -- B: we can't infer that on the PRIMARY server. So, instead, if Job.CategoryName = @AGName and -- enabled/disabled != Enabled bit-flag from the <AGNAME>_JobEnabledStates table, then we need -- to raise an error because a job that should be enabled is NOT. (Or vice versa.) -- A) Check for jobs on the remote server: INSERT INTO #Divergence SELECT name, 'Job Enabled on SECONDARY SERVER (Jobs should only be enabled on PRIMARY).' FROM #RemoteJobs WHERE enabled = 1; -- B) Check for jobs on the local server: DECLARE @Jobs TABLE ( JobName sysname NOT NULL, [Enabled] bit NOT NULL ) DECLARE @sql nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];' INSERT INTO @Jobs EXEC(@sql); -- Report on Jobs that are not documented: INSERT INTO #Divergence SELECT lj.name, 'Job Enabled/Disabled Status is not currently defined. Please check job Enabled/Disabled status then run "EXEC master.dbo.dba_DocumentJobEnabledStatuses ''' + @AGName + ''';" to document Job Enabled/Disabled state.' FROM #LocalJobs lj WHERE lj.name NOT IN (SELECT JobName FROM @Jobs); -- Report on Jobs that are Enabled but shouldn't be: INSERT INTO #Divergence SELECT lj.name, 'Job is ENABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be DISABLED. (Update Table or Job Status to correct this issue.)' FROM #LocalJobs lj LEFT OUTER JOIN @Jobs j ON lj.name = j.JobName WHERE ISNULL(j.[Enabled],0) = 0 AND lj.[enabled] = 1; -- job is listed/defaulted to disabled, but is currently enabled -- Report on Jobs that are Disabled but shouldn't be: INSERT INTO #Divergence SELECT lj.name, 'Job is DISABLED on Primary Replica Server but ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates] specifies that Job should be _ENABLED_ (Running). (Update Table or Job Status to correct this issue.)' FROM #LocalJobs lj LEFT OUTER JOIN @Jobs j ON lj.name = j.JobName WHERE ISNULL(j.[Enabled], 0) = 1 AND lj.[enabled] = 0; -- job is listed/defaulted to enabled, but is currently disabled -- Now, (matching job by job) go through and check schedules and job steps: CREATE TABLE #LocalJobSteps ( step_id int, [checksum] int ); CREATE TABLE #RemoteJobSteps ( step_id int, [checksum] int ); CREATE TABLE #LocalJobSchedules ( schedule_name sysname, [checksum] int ); CREATE TABLE #RemoteJobSchedules ( schedule_name sysname, [checksum] int ); DECLARE checker CURSOR FAST_FORWARD FOR SELECT lj.job_id local_job_id, rj.job_id remote_job_id, lj.name FROM #LocalJobs lj INNER JOIN #RemoteJobs rj ON lj.name = rj.name; DECLARE @LocalJobId uniqueidentifier, @RemoteJobId uniqueidentifier, @JobName sysname; DECLARE @LocalCount int, @RemoteCount int; OPEN checker; FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName; WHILE @@FETCH_STATUS = 0 BEGIN -- check jobsteps first: DELETE FROM #LocalJobSteps; DELETE FROM #RemoteJobSteps; INSERT INTO #LocalJobSteps SELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail] FROM msdb.dbo.sysjobsteps WHERE job_id = @LocalJobId; INSERT INTO #RemoteJobSteps SELECT step_id, BINARY_CHECKSUM(step_name + subsystem + command + STR(on_success_action) + STR(on_fail_action) + database_name) [detail] FROM PARTNER.msdb.dbo.sysjobsteps WHERE job_id = @RemoteJobId; SELECT @LocalCount = COUNT(*) FROM #LocalJobSteps; SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSteps; IF @LocalCount != @RemoteCount INSERT INTO #Divergence VALUES (@JobName, 'Different Job Step Counts between Servers'); ELSE BEGIN INSERT INTO #Divergence SELECT @JobName, 'Different Job Step Details between Servers' FROM #LocalJobSteps ljs INNER JOIN #RemoteJobSteps rjs ON rjs.step_id = ljs.step_id WHERE ljs.[checksum] != rjs.[checksum]; END; -- Now Check Schedules: DELETE FROM #LocalJobSchedules; DELETE FROM #RemoteJobSchedules; INSERT INTO #LocalJobSchedules SELECT ss.name, BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type + ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor + ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details] FROM msdb.dbo.sysjobschedules sjs INNER JOIN msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id WHERE sjs.job_id = @LocalJobId; INSERT INTO #RemoteJobSchedules SELECT ss.name, BINARY_CHECKSUM(ss.[enabled] + ss.freq_type + ss.freq_interval + ss.freq_subday_type + ss.freq_subday_interval + ss.freq_relative_interval + ss.freq_recurrence_factor + ss.active_start_date + ss.active_end_date + ss.active_start_date + ss.active_end_time) [details] FROM PARTNER.msdb.dbo.sysjobschedules sjs INNER JOIN PARTNER.msdb.dbo.sysschedules ss ON ss.schedule_id = sjs.schedule_id WHERE sjs.job_id = @RemoteJobId; SELECT @LocalCount = COUNT(*) FROM #LocalJobSchedules; SELECT @RemoteCount = COUNT(*) FROM #RemoteJobSchedules; IF @LocalCount != @RemoteCount INSERT INTO #Divergence VALUES (@JobName, 'Different Job Schedule Counts between Servers'); ELSE BEGIN INSERT INTO #Divergence SELECT @JobName, 'Different Schedule Details between servers.' FROM #LocalJobSchedules ljs INNER JOIN #RemoteJobSchedules rjs ON rjs.schedule_name = ljs.schedule_name WHERE ljs.[checksum] != rjs.[checksum]; END; FETCH NEXT FROM checker INTO @LocalJobId, @RemoteJobId, @JobName; END; CLOSE checker; DEALLOCATE checker; IF(SELECT COUNT(*) FROM #Divergence WHERE name NOT IN(SELECT job_name FROM #IgnoredJobs)) > 0 BEGIN DECLARE @subject sysname = 'SQL Server Agent Job Synchronization Problems'; DECLARE @crlf char(2) = CHAR(13) + CHAR(10); DECLARE @tab char(1) = CHAR(9); DECLARE @message nvarchar(MAX) = 'Problems detected with the following SQL Server Agent Jobs: ' + @crlf; SELECT @message = @message + @tab + name + ': ' + @crlf + @tab + @tab + diff_type + @crlf FROM #Divergence WHERE name NOT IN (SELECT job_name FROM #IgnoredJobs WHERE job_name != '') ORDER BY NAME; SELECT @message += @crlf + @crlf + 'Jobs can be synchronized by scripting them on the Primary and running scripts on the Seconary.' + @crlf + @tab + 'To Script Multiple Jobs at once, SSMS > SQL Server Agent Jobs > F7 -> then shift/ctrl + click to select multiple jobs simultaneously.' IF @ConsoleOnly = 1 BEGIN -- just Print out details: PRINT 'SUBJECT: ' + @subject; PRINT 'BODY: ' + @crlf + @message; END ELSE BEGIN -- send a message: EXEC msdb..sp_notify_operator @profile_name = @MailProfileName, @name = @OperatorName, @subject = @subject, @body = @message; END; END; DROP TABLE #LocalJobs; DROP TABLE #RemoteJobs; DROP TABLE #Divergence; DROP TABLE #LocalJobSteps; DROP TABLE #RemoteJobSteps; DROP TABLE #LocalJobSchedules; DROP TABLE #RemoteJobSchedules; DROP TABLE #IgnoredJobs; RETURN 0; GO
If you decided to implement the Enable/Disable approach to managing ‘Batch Jobs’ (as outlined in Post #12) the code above will work as-is and will try to Enable/Disable jobs on your servers as needed--by making a call to the following sproc (which makes use of the Jobs State Table discussed and outlined in Post #13):
USE master; GO IF OBJECT_ID('dbo.dba_EnsureAGLevelJobEnabledStatuses','P') IS NOT NULL DROP PROC dbo.dba_EnsureAGLevelJobEnabledStatuses GO CREATE PROC dbo.dba_EnsureAGLevelJobEnabledStatuses @AGName sysname, -- name of the AG to failover. @JobsTableDBName sysname -- name of the db with the jobs-enabled-details table AS SET NOCOUNT ON; IF(SELECT master.dbo.fn_hadr_group_is_primary(@AGName)) = 0 BEGIN -- If we're not on the PRIMARY server, then just make sure all jobs with categoryName = @AGName are disabled DECLARE disabler CURSOR FAST_FORWARD FOR SELECT j.job_id, j.name FROM msdb.dbo.sysjobs j INNER JOIN msdb.dbo.syscategories c ON c.category_id = j.category_id WHERE j.[enabled] = 1 AND c.name = @AGName; DECLARE @JobId uniqueidentifier; DECLARE @JobName sysname; OPEN disabler; FETCH NEXT FROM disabler INTO @JobId, @JobName; WHILE @@FETCH_STATUS = 0 BEGIN EXEC msdb.dbo.sp_update_job @job_id = @JobId, @enabled = 0; PRINT 'Disabled Job: [' + @JobName + '] ON Non-Primary Server.'; FETCH NEXT FROM disabler INTO @JobId, @JobName; END; CLOSE disabler; DEALLOCATE disabler; END ; ELSE BEGIN -- otherwise, if we ARE on the primary, then we should ENABLE jobs that need to be enabled. DECLARE @Error sysname; DECLARE @Jobs TABLE ( JobName sysname NOT NULL, [Enabled] bit NOT NULL ); DECLARE @sql nvarchar(MAX) = N'SELECT JobName, [Enabled] FROM ' + @JobsTableDBName + '.dbo.[' + @AGName + '_JobEnabledStates];' INSERT INTO @Jobs EXEC(@sql); DECLARE enabler CURSOR FAST_FORWARD FOR SELECT JobName FROM @Jobs WHERE [Enabled] = 1; OPEN enabler; FETCH NEXT FROM enabler INTO @JobName; WHILE @@FETCH_STATUS = 0 BEGIN SELECT @JobId = job_id FROM msdb.dbo.sysjobs WHERE name = @JobName; IF @JobId IS NULL BEGIN SET @Error = 'Unable to Locate (and set ENABLED) on Job: ' + @JobName + '.'; THROW 50001, @Error, 1; RETURN -1; -- we're done. (should we send out an email alert? probably) END EXEC msdb.dbo.sp_update_job @job_id = @JobId, @enabled = 1; -- enable the job here... FETCH NEXT FROM enabler INTO @JobName; END; CLOSE enabler; DEALLOCATE enabler; END; RETURN 0; GO
However, if you choose to go with the IF/ELSE logic embedded in your Batch Jobs, then you’ll want to disable (that is, comment-out) the code in the ‘parent’ sproc that attempts these operations. Likewise, it should be called out that all of the code in this post is for a scenario where you’ve just got TWO servers in your Availability Group (hence the notion of calls to a ‘Partner’ Linked Server, as per post #15). If you’ve got more than two servers in your topology, the basic logic outlined in these posts will work--but you’ll need to modify it accordingly (which we’ll look at a tiny bit in a future post).
Step 5: Create a SQL Server Agent Job to Execute Regular Checkups
With code in place, you’re now ready to set up a Job that’ll execute regular synchronization checks. Basic logic for setting up this job is that you’ll want to set up a recurring schedule, then add a single job step to run ‘Server Level’ checks, and then add in an additional Job Step for each AlwaysOn Availability Group that you want to check up on. So, if you’ve got one Availability Group on your servers, you’d end up with a Job with two Job Steps; whereas if you have three AGs, you’ll end up with a Job with four Job Steps (one for Server-Level checkups and a distinct job step, or call to dba_SyncCheckupForAGLevelJobs for each of your Availability Groups).
In my next post, we’ll take a more concrete look at a sample/example Synchronization Job--and discuss some options and best practices for scheduling and review how these checks will "behave" in the "wild."