Plugging the Gaps in SQL Server Job Tracking

SQL Server jobs are useful for automating and scheduling the execution of database programs. The SQL Server Agent service is responsible for tracking job schedules and launching SQL Server jobs on time. Yet sometimes jobs might not start according to schedule. Compounding this problem is that SQL Server doesn't report dates and timestamps for scheduled jobs that didn’t occur, so it's hard to find this information. Another problem occurs when a SQL Server job has multiple schedules that reference dates and times that are the same or close to one another. This situation could result in a job being called multiple times within seconds, which could cause execution errors.

To solve these problems, I’ve written two T-SQL scripts, JobScript_FindMissedJobs.sql and JobScript_ FindConflictingJobSchedules.sql, which find missed jobs and identify conflicting schedules. (To download these scripts, click the Download the Code link at the top of this page.) Let’s look at the job-scheduling problems more closely and explore how the scripts work.

Job-Scheduling Issues

One obvious cause of a job’s failing to start is when SQL Server Agent isn’t running. A more interesting example can occur with long-running jobs. Say that a job is scheduled to run at datetime1, then again at datetime2. If the job starts running at datetime1 and doesn’t finish executing by datetime2, the job won’t fire at datetime2. After datetime2, the job will be called again at the next scheduled date and time. So, a key problem with job scheduling is not being able to find dates and timestamps of missed jobs (i.e., scheduled jobs that were skipped). SQL Server doesn’t report this information; nonetheless, it’s crucial data for developers and DBAs to know to ensure that database programs indeed run on time.

Another important problem is conflicting schedules. If one SQL Server job has two or more schedules, it’s possible that different schedules reference the same (or approximately the same) dates and times. When this happens, a job will be called multiple times within a couple seconds, which could result in logical and functional execution errors. For example, assume that the first step of a job creates a table called msdb.dbo.utbJobExecution. If the first step is invoked twice simultaneously, one of the two running instances will result in an error because the table already exists in the database. Consequently, users must be able to tell whether scheduling overlaps exist and prevent such conflicts.

How SQL Server Stores Job Information

The properties and attributes of all SQL Server jobs on a server are stored in the msdb database. The dbo .sysjobs table is available in both SQL Server 2005 and SQL Server 2000 (although the table schema has changed in 2005) and contains one row per job. The sysjobs table holds the following information: job name, a UNIQUEIDENTIFIER column called job_id, which stores the ID of each job; an owner_sid column of type varbinary(85), which captures the ID of the login owning the job (as it appears in the master.dbo.syslogins table); the enabled bit flag column, which indicates whether a job is enabled or disabled; and the date_created column, which holds the job-creation date.

SQL Server supports many scheduling options. In SQL Server 2000, scheduling data is stored in the dbo .sysjobschedules table. Job schedules in SQL Server 2005 are split between two tables, dbo.sysjobschedules and dbo.sysschedules. The dates when jobs are invoked (denoted by run dates) are determined by the freq_type column (in sysschedules in SQL 2005 and sysjobschedules in SQL Server 2000). You can configure jobs to run once at a given date and time, every few days, every several weeks on certain days of the week, or monthly. Table 1 summarizes these options. In the span of each run-date day, the runtimes determine the timestamps for invoked jobs. A job can either fire once per day, per schedule at a user-provided time or run repeatedly every few minutes or hours, as Web Table 1 shows.

The Problem… and Its Solution

To find missed jobs or conflicting schedules within a given time period, you must first be able to tell when the jobs are supposed to run, which isn’t an easy task. After obtaining the scheduled run dates and runtimes, it’s a simple matter to use the job-execution history stored in the dbo.sysjobhistory table to find scheduling conflicts or correlate the calculated dates and times, then find all the missed jobs.

To locate missed jobs, you can use the JobScript_ FindMissedJobs.sql script. JobScript_FindMissedJobs .sql accepts two parameters: a start date time and an end date time parameter. When this script is executed, it finds all dates and times in which jobs were supposed to run in the given time window, then compares this information with the job-execution history. If any jobs didn’t run on time, the script records and plots all the missed jobs. To identify conflicting schedules, you can run the JobScript_FindConflictingJobSchedules.sql script, which also accepts start date-time and end datetime parameters and reports all conflicting schedules in the provided time range.

These scripts let users enter the start date time and end date time parameters for the time window to be examined as the variables @StartDateTimeToCalculate and @EndDateTimeToCalculate, respectively, as you can see at callout A in the sample script in Listing 1. (The code in Listing 1 is an example containing lines of code from both the JobScript_FindMissedJobs .sql and JobScript_FindConflictingJobSchedules.sql scripts; it’s meant to illustrate the scripting techniques I discuss in the article.) Then, the scripts take the following actions to extract the run dates and runtimes for all SQL Server jobs. First, a snapshot of all job and scheduling information is saved into a temporary table called #JobScheduleInfo. Using the user-provided time range, each script calculates an effective time window for every SQL Server job. The script then loops through all jobs and schedules to determine the run dates in the effective time window, as I explain in more detail shortly. The scripts store the calculated dates in a temporary table named #JobScheduledDates. Next, for each run date, the script finds the runtimes and stores all dates and times in the temporary table #JobScheduledDatesAndTimes. The scripts then obtain the missed jobs and conflicting schedules data according to the information in the #JobScheduled- DatesAndTimes table and return this information to the user.

Narrowing the Time Window

Let’s take a closer look at how the scripts work. Assume that the user enters 1/1/1900 for the start date and 1/1/2500 for the end date. Clearly, there’s no need to reconstruct the job runtimes for the entire time range because a job couldn’t have run before SQL Server was conceived or in the future. To optimize performance, the scripts first calculate an effective time window that looks only at relevant dates for each job. When the job is initiated, the job’s creator enters the active_start_date and active_start_time information columns (in the sysschedules table in SQL Server 2005 and the sysjobschedules table in SQL Server 2000). A job clearly cannot run before the date and time specified in these columns. Moreover, the date_created column in the sysjobs table stores the datetime value of the job-creation date, and it’s obvious that jobs can’t run before they were created.

For the end date, the active_end_ date and active_end_time columns (in sysschedules and sysjobschedules and in SQL Server 2005 and SQL Server 2000, respectively) store the expiration time after which jobs cease to run. The scripts take these facts into consideration and calculate the effective minimum and maximum datetimes in the columns min_datetime_to_ consider and max_datetime_to_consider in the #JobScheduleInfo table. The effective time window is therefore the gap between these two datetime columns, as callout B in Listing 1 shows.

Calculating Run Dates and Times

After the scripts calculate the effective time window, they’ll proceed to calculate the run dates for all jobs on the server. Jobs scheduled to run once will fire one time at the date and time specified in the active_start_date and active_start_time columns. For all other scheduled times (i.e., daily, weekly, or monthly jobs), finding the run dates is more complicated.

To demonstrate this fact, consider the following example. A weekly job is created on March 1, 2007 (Thursday), at 3:00 p.m. and is scheduled to run every three weeks on Sunday and Thursday, once a day at 11:00 a.m. After you’ve created the job, several questions immediately come to mind. Since the job was created at 3:00 p.m. and is supposed to run at 11:00 a.m., will SQL Server Agent consider March 1 as a run day (in which case the next run date is three weeks from today), or will it fire the job next Thursday? Will the Agent pick the next Sunday as a run day, or does it consider the last Sunday as a run day? Fortunately, there are relatively easy ways to reconstruct the run dates.

The sysjobschedules table has two columns that contain the next date and time when a job will run: next_run_date and next_run_time. These columns are updated (for all jobs) when SQL Server Agent is started, or periodically every 20 minutes, as explained in SQL Server 2005 Books Online. In other words, the values stored in next_run_date and next_run_time can be stale or might not be available at all (e.g., from the time that a job is created until SQL Server Agent updates the next-run values for the new job). However, for purposes of finding the job run dates, these issues are irrelevant; the next_run_date (even if stale) gives you a date when you know that the job is supposed to run. When the job_run_date isn’t available, it means that the job was created in the last few minutes, in which case you aren’t concerned about missed schedules.

When the scripts are run, the next_run_date is recorded from sysjobschedules for each job and schedule. To get the most up-to-date values for the next_run_date, the scripts also call the (undocumented) extended stored procedure master..xp_sqlagent_enum_ jobs, which queries the SQL Server Agent to return the latest data. If the returned next-run values are later than those in sysjobschedules, the script overwrites the previously recorded values.

Once the scripts know that a job is scheduled to run at a certain date, they can extrapolate all the run dates for each schedule. Starting at the date in next_run_date, the script adds the appropriate frequency interval (i.e., days, weeks, months) and records all run dates in the #JobScheduledDates table, until the datetime value in max_datetime_to_consider is reached. Next, the script returns and determines all run dates between next_run_date and min_datetime_to_consider.

After obtaining the run dates, you need to calculate the times when jobs run in each run date. To do so, you start with the active_start_time: If a job is scheduled to run once a day, the active_start_time is the runtime. If jobs run every several (say x) minutes or hours, you add x minutes (or hours) to the active_start_time until the end of the day is reached.

Finding Missed Jobs and Conflicting Schedules

The run dates and times are stored in the #JobScheduledDatesAndTimes table. To find missed jobs, the JobScript_FindMissedJobs.sql script compares the content of this table with the values in sysjobhistory to check whether the first step was invoked on (or a few seconds after) the scheduled date and time. You can now easily obtain conflicting schedules by comparing the dates and times in #JobScheduledDatesAndTimes. If two run dates and times are only several seconds apart (the number of seconds is configurable and is stored in the variable @TimeGapInSeconds), then the schedules indeed conflict.

Special Situations

Obtaining the run dates can still be challenging for some schedules, and the scripts address such cases. One such example is when, every few months, a job is scheduled to run on the third or last weekend day. The code to determine whether a day is a weekend or a business day depends on the @@DATEFIRST settings; Listing 2 shows an example of this code. The code in Web Listing 1 returns the nth day of the month (where n is the freq_relative_interval value in Table 1 that corresponds to jobs with freq_type = 32 and the freq_interval is either 9 or 10).

Quick Job Tracking

You can also query SQL Server Agent scheduling information by using Windows Management Instrumentation (WMI) or another scripting language besides T-SQL. The main advantage of my scripts, though, is that they can return missed jobs and conflicting schedules quickly, using simple T-SQL code, even when many SQL Server jobs are on the server. However, the scripts have a couple potential limitations. If you run the scripts against a large time window (e.g., several years), they might take a while to run. Additionally, the scripts don’t consider scheduling changes. If a schedule changes at some point in time, the information reported for the dates prior to the change might be incorrect. If you keep in mind these minor caveats, the scripts should help you improve your ability to keep a closer watch on SQL Server job scheduling.

Comments

Plain text