There’s simply no way to overstate the importance of regular Transaction Log Backups. Not only do they help protect from disaster, but regular execution of T-Log backups on Full (and Bulk-Logged) Recovery databases helps keep thing “fit and trim”. Most of the time, setting up a Notification for when T-Log Backup Jobs fail is enough to let you know when something goes wrong.
However, I have seen a couple of edge cases where this simply wasn’t enough. Typically this has been in multi-tenant systems and/or in systems with a high volume of transactions where any of the following things can or might have happened:
- The UNC path or destination that the MIRROR TO clause (an Enterprise Edition only feature that lets you save backups to two locations (i.e., locally and to an off-box secondary location)) is pointing to is having issues and has slowed things down enough that while backups haven’t exactly failed, they’re no longer running on a regular schedule and it might therefore have been an hour or more since some databases have been adequately backed up.
- Some third party SQL Server Backup Solutions have a GREAT ‘retry’ feature that lets you specify that if a backup fails for some reason, it should be retried N times with X amount of seconds to wait between each try. This is a great feature – unless you’ve run into problems, have too long of a wait between retries, and have GOBs of retries going on against multiple databases – at which point your backup job won’t actually ‘fail’ but can end up being seemingly ‘stalled’ for hours at a time. (e.g., if I’ve got a job set up to backup T-Logs against 20 databases every 10 minutes, and there’s some sort of problem executing the backups ‘retry’ attempts against a single database might end up taking 20 minutes – then the job will move on to the next database and so on – until you’ve spent hours with retry logic slowly sabotaging your backup chain before it’ll ever raise an error or alert)
In cases like this, and where it’s even more essential that T-Log backups are happening regularly, I’ve created a simple script that can be used to checkup on both the job that runs your backups (to make sure it has successfully completed within the last N minutes) and to checkup on the last backup dates of your FULL recovery databases.
The one obvious limitation with this job is that it runs via the SQL Server Agent – which means that if the SQL Server Agent (which is handling your backups) somehow crashes, then this ‘watchdog’ won’t be able to run either. (That said, in years of working with SQL Server I’ve never seen the SQL Server Agent seriously crash or unable to run – well, except when some enterprising soul explicitly shut it down. So if any of this is a concern, you might want to try and run this script via some OTHER scheduling mechanism than simply dropping it into a new SQL Server Agent Job and setting up a regular schedule for it to run under.)
Otherwise, the script is pretty simple. Just specify how far back you’d like your checks to run, and the name of the Job that’s handling your T-Log backups. (If you’ve got multiple jobs you could easily tweak this to account for an IN() clause.) Then, specify the name of an operator and the Database Mail Profile to use when sending alerts and this script will do the rest - once you schedule it for regular execution.
SET NOCOUNT ON; -- Variables: DECLARE @NumberOfPastMinutesToCheck int = 30; DECLARE @LogBackupJobName sysname = N'Customer Databases - TLOG'; -- Alerting details: DECLARE @OperatorName sysname = 'Alerts'; DECLARE @DbMailProfileName sysname = 'General'; ------------------------------------------------------------------------------ -- Logic/Implementation: DECLARE @JobID uniqueidentifier; SELECT @JobID = job_id FROM msdb..sysjobs WHERE name = @LogBackupJobName; DECLARE @Subject nvarchar(200); DECLARE @Body nvarchar(200); -- Start with review of Backup Job - check to make sure it has -- run (successfully) within the last @NumberOfPastMinutesToCheck DECLARE @ago datetime = DATEADD(n, - @NumberOfPastMinutesToCheck, GETDATE()); DECLARE @DateAsInt int; DECLARE @TimeAsInt int; SELECT @DateAsInt = CAST(CONVERT(varchar(20), @ago, 112) AS int); SELECT @TimeAsInt = CAST(REPLACE(CONVERT(varchar(20), @ago, 108),':','') AS int); DECLARE @states TABLE ( run_date int, run_time int, run_status int ); INSERT INTO @states SELECT run_date, run_time, run_status FROM msdb..sysjobhistory WHERE run_date >= @DateAsInt AND run_time >= @TimeAsInt AND job_id = @JobId AND step_id = 0 -- job outcome (only) AND run_status = 1; -- succeeded (only) DECLARE @successCounts int; SELECT @successCounts = COUNT(*) FROM @states; IF @successCounts < 1 BEGIN SET @Subject = '[SQL Server Backups] Transaction Log Backup Failure' SET @Body = 'The Transaction Backup Job (with ID ' + CAST(@JobId AS varchar(36)) + ' has not successfully completed in the last ' + CAST(@NumberOfPastMinutesToCheck AS varchar(20)) + ' minutes.'; EXEC msdb..sp_notify_operator @profile_name = @DbMailProfileName, @name = @OperatorName, @subject = @Subject, @body = @Body; END ------------------------------------------------------------------ -- Now check to see if any databases haven't -- been backed up in last @NumberOfPastMinutesToCheck: DECLARE @databases TABLE ( [name] sysname ); INSERT INTO @databases SELECT name FROM master.sys.databases WHERE recovery_model_desc = 'FULL' AND name NOT IN ('model') AND name NOT IN (SELECT DISTINCT database_name FROM msdb.dbo.backupset WHERE [type] = 'L' AND backup_finish_date > @ago); DECLARE @failedDbs nvarchar(2000) = ''; SELECT @failedDbs = @failedDbs + name + ',' FROM @databases ORDER BY name; IF @failedDbs != '' BEGIN SET @Subject = '[SQL Server Backups] Transaction Log Backup Failure'; SET @Body = 'The following databases have not been backed up within the last ' + CAST(@NumberOfPastMinutesToCheck AS varchar(20)) + ' minutes: ' + CHAR(13) + CHAR(10) + CHAR(9) + @failedDbs + '.'; EXEC msdb..sp_notify_operator @profile_name = @DbMailProfileName, @name = @OperatorName, @subject = @Subject, @body = @Body; END GO