So far, my previous two posts have largely been a review of best practices for making backups redundant – with only a single mention of RESTORE VERIFYONLY’s LOADHISTORY argument as a rationale for enabling luke-warm failover servers from those backups.
And, again, just for the record: there ARE better High Availability solutions out there for SQL Server (such as Mirroring, Log Shipping, SQL Server 2012’s AlwaysOn happiness, clustering, and – in some cases – replication). As such, the point of these posts is really just to cover less-expensive failover options for organizations that don’t need ‘full-blown’ failover, and to describe some additional ways to make ‘full-blown’ HA solutions more redundant and capable of recovery – as you never know when you’ll need some sort of Remote-Availability or smoke-and-rubble contingency. As I’ve said over and over again, high availability isn’t the same as disaster recovery.
So, with all of that said, the big problem at this point is that just having copies of your backups in secondary/remote locations is NOT a disaster recovery plan. Yes, having copies of your backups off-site is better than nothing – but merely having copies of your backups off-site isn’t going to help much in an emergency – especially if we’re talking about multiple databases of any decent size. And unless you’re regularly testing these secondary/luke-warm backups, you not only have no idea whether they’re viable or not, but you don’t accurately know whether they’re capable of meeting RPOs and RTOs.
To that end, you need what I like to call a ‘mass recovery’ script.
The Benefits of a Mass Recovery Script
In my mind there are two primary benefits of a 'mass recovery' script. First and foremost, my experience is that most DBAs aren’t as familiar with all of the ins and outs of restoring databases in an emergency as they need to be. Or, as I like to tell many of my clients: “The LAST thing you want to be doing in a disaster scenario (with your boss standing over your shoulder) is figuring out how to properly restore a number of databases while management is worried about when people, systems, and expensive activities can get back online.”
Second, even if you’re a ninja at restoring databases from scratch (where you’ll have to use T-SQL against raw files – unless you’ve done something with RESTORE VERIFYONLY’s LOADHISTORY functionality), you’ll still typically need to wire up the restoration of a decent number of log-file backups – which can quickly become tedious and subject to error. Especially if you have multiple databases that go down. (For example, assume that you’re making FULL backups every Sunday night, and then DIFFERENTIAL backups every week-night at 3AM, and your database goes down at 10AM. If you’ve got any type of ‘real-world’ RPOs, you’re likely going to need log-file backups every 15 minutes (or less) in order to ensure that completed operations are captured. That, in turn, means that you’ll be looking at needing to apply roughly 7 hours’ worth of log file backups, or ~28 log file backups per database, in an emergency scenario.)
So, in terms of this second point, I like to point out that manually restoring databases, one file at a time from the GUI, is simply an option that does not scale. Unless you’ve gone ahead and imported backup file meta-data into your luke-warm standby server (using the LOADHISTORY argument of RESTORE VERIFYONLY) and can accurately count on that data being up-to-date, you’re going to be wrestling with lots of files in most luke-warm disaster recovery scenarios. As such, a mass-recovery script can very easily automate the process of bringing numerous databases (or even a single database) back online in a disaster scenario.
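If you do want to go the LOADHISTORY route, the mechanics are simple enough – verifying a backup file with this argument loads its meta-data into msdb’s backup history tables on the standby server, so that the GUI and msdb-driven tooling can then ‘see’ the backup. A minimal sketch follows (the path and file name shown are purely illustrative):

```sql
-- Verify the backup file AND load its header/meta-data into msdb's
-- backup-history tables on the luke-warm server (path is a placeholder):
RESTORE VERIFYONLY
FROM DISK = N'D:\SQLBackups\MyDatabase\MyDatabase_FULL.bak'
WITH LOADHISTORY;
```

Note that this has to be repeated for every backup file copied on-box – which is exactly why it’s usually wired into whatever job copies the files over.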
Better yet, once you’ve got a mass recovery script up and running, you can drop in on your luke-warm failover servers periodically (say, once a week – depending upon the sensitivity and importance of your data) and run this script to test your ability to meet both Recovery Time Objectives and RPOs.
Creating a Mass Recovery Script
When it comes to building a mass-recovery script, there are sadly a large number of factors and concerns to pay attention to – to the point where there really isn’t going to be a ‘one size fits all’ approach to executing luke-warm failovers that will work everywhere and for every situation. For example, I keep referring to this encapsulation of logic needed to restore databases as a 'mass' recovery script – meaning that, in my experience, it’s pretty typical to need to bring multiple databases online – instead of just a single database. But, in some environments, there might only be a single database. In other environments there might be multiple, different, databases, while in other environments there might be multiple, similar, or even related databases that either have different business priorities or which might imply some sort of hierarchical set of dependencies. In other words, there’s simply a large amount of complexity that has to be adequately addressed when it comes to restoring off-box or luke-warm backups as part of a disaster recovery contingency plan.
To that end, I’ve created two lists – of obvious and not-so-obvious things you’ll want to account for when creating recovery scripts for luke-warm backup servers.
Obvious Concerns For Luke-Warm Backup Scripts
- Paths. In an ideal world you’d have identical drive and folder/pathing configurations on luke-warm backup servers as you would on primary production hosts. Only, that typically ends up being cost-prohibitive in most cases – and difficult to keep in sync. Consequently, whatever scripts you’re using to recover databases on a luke-warm host are typically going to need to account for potentially different locations for data and log files.
- FULL vs DIFF vs Log File Backups. Another obvious thing that your mass-recovery script will have to deal with is the fact that all ‘backups’ are not created equal – meaning that any script you’ll create to quickly ‘whip through’ the process of restoring databases from backups will have to ‘know’ how to work with FULL, DIFFERENTIAL, and LOG FILE backups as needed in order to restore the latest version of a database from backup files on hand. And, of course, this means that whatever script you’ll end up creating needs to be 100% coupled to media choices, pathing, and naming conventions used in the creation of backups.
- Third Party Backups. It should also go without saying that if you’re using third-party backups (for compression and/or encryption), you’ll need to account for these variables as well when creating a mass-recovery script. That said, in most cases, third party backup solutions are gobs easier to use against ‘piles’ of backup files than SQL Server Management Studio’s crappy Recovery GUI that ONLY allows you to select a single .trn file at a time (for example) – to the point where if you only have a few databases, you MAY actually find that using the GUI that ships with your third-party backup solution is good enough for recovery purposes. Of course, you won’t know this without testing. That, and while SQL Server’s licensing allows you to deploy full-blown instances of SQL Server at no cost for DR purposes, many third party vendors do NOT allow the same kind of deployment options – meaning that you may have to pay for full or partial licenses of third-party backup software on luke-warm backup servers – so make sure to check into these details as needed.
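To make the pathing concern above concrete: when drive layouts differ between hosts, a restore on the luke-warm server will typically need RESTORE ... WITH MOVE to relocate the data and log files. A rough sketch (database name, logical file names, and paths are all placeholders – use RESTORE FILELISTONLY against the backup to find the real logical names):

```sql
-- Relocate the physical files to wherever the luke-warm server actually
-- keeps its data/logs, instead of the paths baked into the backup:
RESTORE DATABASE [MyDatabase]
FROM DISK = N'D:\SQLBackups\MyDatabase\MyDatabase_FULL.bak'
WITH
    MOVE N'MyDatabase_Data' TO N'D:\SQLData\MyDatabase_data.mdf',
    MOVE N'MyDatabase_Log'  TO N'D:\SQLData\MyDatabase_log.ldf',
    NORECOVERY;  -- leave the db restoring so DIFF/LOG backups can be applied
```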
Not-So-Obvious Concerns for Luke-Warm Backup Scripts
- Permissions. While you as a DBA may be able to see that all of your SQL Server backups are comfortably nestled in, say, your F:\SQLBackups\ folder, that doesn’t mean that SQL Server will be able to see those backups on a luke-warm failover server. (On the primary server where the backups are made, SQL Server will have rights to see and interact with the files it creates – but that will NOT be the same case on a failover/backup server where these files have merely been copied.) Consequently, you’ll need to make sure that the Service Account that SQL Server is running under for your luke-warm failover server has access to the copied backups that have been moved on-box from an off-box source.
- Logins. Recovering data after a disaster is your primary concern. But even if you’ve been able to recover all data within specified RPOs, that doesn’t do any good if end-users and applications can’t reach that data. Accordingly, you’ll need to make sure that you’ve got some sort of mechanism in place that allows applications to point at a new host. Similarly, you’ll need to make sure that you’ve properly defined, and mapped, logins on the failover server so that end-users and applications can log in and access data correctly. For more information on these concepts, I recommend that you watch Copying and Moving SQL Server Logins – where I describe many of the nuances associated with these needs (along with providing links to scripts you can use to script/migrate logins and to correct issues with ‘orphaned users’).
- The Tail End of the Log. In a typical, on-box, disaster recovery scenario, the first operation in any DR plan should be to back up the tail end of the transaction log (or the active portion of the transaction log that hasn’t been backed up yet) – as a means of protecting transactions that have already completed but which haven’t been backed up yet. Only, when it comes to off-box/redundant backups of the type I’ve described in this and my two previous posts, it goes without saying that you’ll almost never have access to this valuable resource. If, however, you’re setting up some sort of secondary/off-box contingency and think you may have access to the tail end of the log, you’ll want to account for that as needed – as any opportunity to leverage the tail end of the log is commonly going to be worth its weight in gold, given the amount of work it will prevent in ‘re-entering’ transactions and operations that would otherwise be lost.
- Errors. Nobody likes dealing with errors – and in a perfect world there wouldn’t be any. But we don’t live in a perfect world, and (frankly) since we’re already talking about mechanisms for dealing with the ugliness of system-wide failures that have forced you on to secondary/luke-warm hosts for potentially mission-critical data, it goes without saying that errors are a fact of life. More importantly, the last thing you want to have happen in a set of recovery scripts is to have an error go un-noticed, un-attended, or un-addressed – especially if that means you end up losing precious time. Accordingly, good mass recovery scripts account for the fact that errors can and will occur, and then attempt to minimize the potential impact of those errors. (In my experience, the occurrence of errors can be mitigated through regular testing, but that won’t completely eliminate the possibility of errors cropping up. As such, I typically take the approach of just ensuring that those errors are called out and NOT compounded by attempting to apply more backups or operations against databases that have encountered a recovery error. In this way I don’t waste precious time by having automation routines continue to work with something that can no longer be automated, and I also help avoid the mental stress and overhead that comes when you’re bombarded with gobs of errors being thrown as opposed to a single error being encountered and called out.)
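As a point of reference on the tail-of-the-log concern above: if you do find that you can still reach the primary server’s log file in an emergency, the tail can usually be captured with something along these lines (database name and path are placeholders for illustration):

```sql
-- Back up the tail of the log WITHOUT truncating it, and leave the
-- (damaged) database in a restoring state so that the tail-log backup
-- can be applied as the final step of recovery:
BACKUP LOG [MyDatabase]
TO DISK = N'D:\SQLBackups\MyDatabase\MyDatabase_TAIL.trn'
WITH NO_TRUNCATE, NORECOVERY;
```

The resulting .trn file is then just one more log backup to apply (last) before recovering the database.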
Example Mass Recovery Script
As a more ‘hands-on’ approach to outlining some of the specifics that go into creating and using a mass-recovery script, I’m including a very ROUGH copy of a script I put together during the holidays. It’s similar to some scripts I’ve written for clients – but focuses on the use of ‘native’ SQL Server backups (instead of being designed to work against large numbers of databases that are backed up with third-party backup solutions).
/* ----------------------------------------------------------------------------------------
   INSTRUCTIONS
   A) Don't run this in production. Run it on a luke-warm backup server.
   B) Check out the NOTES/WARNINGS/CONSIDERATIONS section below. This script WORKS, but it's
      very limited and should NEVER be used without testing and validation prior to use with
      production/emergency data.
   C) Press CTRL+T to kick output into 'text' mode. It runs fine in grid mode (CTRL+D), but
      is optimized for execution in text mode.
   D) Set necessary parameters down in the PARAMETERS section. Make sure that your SQL
      Server service account has necessary perms against the folders specified.
   E) To test it out, try running it with @debug = 1 and @executeCommands = 0 first, to see
      if the syntax/pathing/etc. all look correct.
   F) Once you've validated that this basic script works and can perform a 'mass' recovery
      operation against your data, revisit the NOTES/WARNINGS/CONSIDERATIONS section and
      FINISH this script for your environment. In other words, this script is a 'skeleton'
      designed to give you an idea of KEY operations - not a complete DR/failover solution.
   ----------------------------------------------------------------------------------------
   NOTES/WARNINGS/CONSIDERATIONS:
   - DO NOT RUN THIS ON YOUR PRODUCTION SERVER. IT WILL OVERWRITE YOUR DATABASES.
   - This script REQUIRES xp_cmdshell. If you don't know how to turn that on, read up on
     the SECURITY implications of turning that on in your environment - they're HUGE.
   - This script ASSUMES that FULL, DIFF, and LOG backups contain FULL, DIFF, and LOG in
     the respective file names (somewhere).
   - This script works - but should NEVER be used in your environment against production
     data that you care about.
   - This script is ONLY designed as a rough outline of KEY operations to tackle during a
     mass-recovery operation on a LUKE-WARM standby server (or a server where SQL Server
     is already installed but where no DBs are running).
   - This script WILL replace DBs if they already exist on the target. If they don't exist
     on the target host, it will re-map their files to the directories specified.
   - HOWEVER, this script ONLY accounts for a primary .mdf and a single .ldf - if you need
     more than this, you'll need to address that yourself.
   - This script does NOT account for users/apps being logged into dbs that are being
     restored.
   - This script does NOT account for data at the tail end of your log on a PRODUCTION
     server. It's designed for a smoke-and-rubble/failover/remote-availability contingency.
   - This script does NOT account for permissions/logins. If you need help with those, see
     the following video:
     http://www.sqlservervideos.com/video/copying-and-moving-sql-server-logins/
   - Error handling in this script is ONLY stubbed in and has NOT been tested.
   ----------------------------------------------------------------------------------------
   PARAMETERS:
   @debug - Specifies whether or not the commands to be executed will be 'echoed' out to
      the 'messages' tab during execution of this routine. If @debug is set to 1 and
      @executeCommands is set to 0, you'll get a verbose output/summary of all commands
      that WOULD have been executed had this script been run normally - but NONE of the
      actual commands will be run or executed.
   @executeCommands - Specifies whether or not the actual RESTORE and other commands
      should actually be run or executed.
   @dbsToRestore - A comma-delimited list of databases to be restored. Databases are
      restored in the order specified in this list. So, databases that should be restored
      sooner than others should be listed first/prioritized as needed.
   @backupsRoot - This is the path to the base backup folder for the server in question.
      This script EXPECTS that the base/root folder for backups will have a sub-folder in
      it for every database you wish/need to restore - and that each sub-folder will
      contain FULL, [DIFFERENTIAL], and LOG file backups as needed.
   @dbsRoot - This is the root path to the location where end-user databases are stored.
   @recoverDatabases - Specifies whether or not databases should be recovered. In most
      disaster recovery scenarios this should be true/1 - unless there's a need to add
      more transaction log backups for some reason. However, if trying to set up log
      shipping or something else, then set this to 0 - so that other log files can be
      added/etc.
*/

----------------------------------------------------------------------------------
-- PARAMETERS:
-- Set the following values BEFORE commencing execution.
-- And don't forget to press CTRL+T to kick query/execution output into text mode.
----------------------------------------------------------------------------------
DECLARE @debug bit = 0             -- echo/output commands to execute
DECLARE @executeCommands bit = 1   -- execute actual SQL
DECLARE @recoverDatabases bit = 1  -- recover dbs... or leave them capable of accepting more log files/etc.

-- Define which databases (in order of precedence/priority) to restore:
DECLARE @dbsToRestore nvarchar(2000) = N'aapiDenver,aapiCHLA,aapiDev'
DECLARE @backupsRoot sysname = N'D:\SQLBackups\'
DECLARE @dbsRoot sysname = N'D:\SQLData\'

SET NOCOUNT ON
----------------------------------------------------------------------------------
-- END PARAMETERS
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
-- Begin Script Logic/Functionality
SET @dbsToRestore = REPLACE(@dbsToRestore, ' ', '')

DECLARE @targetDatabase sysname
DECLARE @Errors TABLE (
    dbname sysname,
    phase varchar(20),
    ErrorCommand nvarchar(max)
)

-- Start processing: begin by turning @dbsToRestore into a table:
DECLARE @dbs TABLE (
    id int IDENTITY(1,1) NOT NULL,
    dbname sysname NULL
)

DECLARE @index int = 1
DECLARE @chunk nvarchar(30)

IF LEN(@dbsToRestore) < 1 OR @dbsToRestore IS NULL
    RAISERROR('No Databases Defined For Restoration', 18, 1) WITH NOWAIT -- Fatal

WHILE @index != 0 BEGIN
    SET @index = CHARINDEX(',', @dbsToRestore)

    IF @index != 0
        SET @chunk = LEFT(@dbsToRestore, @index - 1)
    ELSE
        SET @chunk = @dbsToRestore

    IF (LEN(@chunk) > 0)
        INSERT INTO @dbs (dbname) VALUES (@chunk)

    SET @dbsToRestore = RIGHT(@dbsToRestore, LEN(@dbsToRestore) - @index)
    IF LEN(@dbsToRestore) = 0 BREAK
END

-- Initialize by restoring FULL backups:
DECLARE targets CURSOR FAST_FORWARD FOR
    SELECT dbname FROM @dbs

OPEN targets
FETCH NEXT FROM targets INTO @targetDatabase

WHILE @@FETCH_STATUS = 0 BEGIN

    DECLARE @input TABLE ( [output] varchar(500) )
    DECLARE @backupFiles TABLE (
        [id] int IDENTITY(1,1),
        [output] varchar(500)
    )

    DECLARE @cmd nvarchar(200) = N'dir "' + @backupsRoot + @targetDatabase + '" /B /A-D /OD'
    IF @debug = 1 PRINT '-- ' + @cmd

    DELETE FROM @input
    INSERT INTO @input
    EXEC master..xp_cmdshell @cmd

    -- Now that we've got raw input, we need to push it into the files table:
    DELETE FROM @backupFiles
    INSERT INTO @backupFiles
    SELECT [output] FROM @input

    DELETE FROM @backupFiles
    WHERE id < (SELECT MAX(id) FROM @backupFiles WHERE [output] LIKE '%FULL%')
        OR [output] IS NULL

    IF @debug = 1 SELECT * FROM @backupFiles

    DECLARE @fullRestore nvarchar(800)
    DECLARE @fullBackup sysname
    SELECT @fullBackup = [output] FROM @backupFiles WHERE [output] LIKE '%FULL%'

    IF EXISTS (SELECT NULL FROM master.sys.databases WHERE name = @targetDatabase) BEGIN
        SET @fullRestore = N'RESTORE DATABASE ' + QUOTENAME(@targetDatabase)
            + ' FROM DISK = N''' + @backupsRoot + @targetDatabase + '\' + @fullBackup
            + ''' WITH REPLACE, NORECOVERY'
    END
    ELSE BEGIN
        -- Map file locations:
        DECLARE @FileList TABLE (
            LogicalName nvarchar(128),
            PhysicalName nvarchar(128),
            [Type] char(1),
            FileGroupName nvarchar(128),
            Size numeric(20,0),
            MaxSize numeric(20,0),
            FileId bigint,
            CreateLSN numeric(25,0),
            DropLSN numeric(25,0),
            UniqueId uniqueidentifier NULL,
            ReadOnlyLSN numeric(25,0),
            ReadWriteLSN numeric(25,0),
            BackupSizeInBytes bigint,
            SourceBlockSize int,
            FileGroupId int,
            LogGroupGUID uniqueidentifier NULL,
            DifferentialBaseLSN numeric(25,0),
            DifferentialBaseGUID uniqueidentifier NULL,
            IsReadOnly bit,
            IsPresent bit,
            TDEThumbprint varbinary(32)
        )

        DECLARE @fileListCommand nvarchar(800)
        SET @fileListCommand = N'RESTORE FILELISTONLY FROM DISK = N'''
            + @backupsRoot + @targetDatabase + '\' + @fullBackup + ''' '
        IF @debug = 1 PRINT '-- ' + @fileListCommand

        INSERT INTO @FileList
        EXEC sp_executesql @fileListCommand

        DECLARE @datafile sysname, @logfile sysname
        SELECT @datafile = LogicalName FROM @FileList WHERE FileId = 1
        SELECT @logfile = LogicalName FROM @FileList WHERE FileId = 2

        SET @fullRestore = N'RESTORE DATABASE ' + QUOTENAME(@targetDatabase)
            + ' FROM DISK = N''' + @backupsRoot + @targetDatabase + '\' + @fullBackup
            + ''' WITH MOVE ''' + @datafile + ''' TO ''' + @dbsRoot + @targetDatabase + '_data.mdf'','
            + ' MOVE ''' + @logfile + ''' TO ''' + @dbsRoot + @targetDatabase + '_log.ldf'', NORECOVERY '
    END

    IF @debug = 1 PRINT @fullRestore
    IF @executeCommands = 1 BEGIN
        EXEC sp_executesql @fullRestore

        -- Handle errors:
        IF @@ERROR <> 0 BEGIN
            RAISERROR('Error Encountered.', 5, 0) WITH NOWAIT
            INSERT INTO @Errors
            SELECT @targetDatabase, 'Restoring FULL Backup', @fullRestore
        END
    END

    -- Look for DIFFerential backups:
    IF EXISTS (SELECT NULL FROM @backupFiles WHERE [output] LIKE '%DIFF%') BEGIN

        -- Delete everything < last DIFF backup:
        DELETE FROM @backupFiles
        WHERE id < (SELECT MAX(id) FROM @backupFiles WHERE [output] LIKE '%DIFF%')

        -- Restore the DIFF backup:
        DECLARE @diffBackup sysname
        DECLARE @diffRestore nvarchar(800)
        SELECT @diffBackup = [output] FROM @backupFiles WHERE [output] LIKE '%DIFF%'

        SET @diffRestore = N'RESTORE DATABASE ' + QUOTENAME(@targetDatabase)
            + ' FROM DISK = N''' + @backupsRoot + @targetDatabase + '\' + @diffBackup
            + ''' WITH NORECOVERY '
        IF @debug = 1 PRINT @diffRestore

        IF @executeCommands = 1 BEGIN
            EXEC sp_executesql @diffRestore

            -- Handle errors:
            IF @@ERROR <> 0 BEGIN
                RAISERROR('Error Encountered.', 5, 0) WITH NOWAIT
                INSERT INTO @Errors
                SELECT @targetDatabase, 'Restoring DIFF Backup', @diffRestore
            END
        END
    END

    -- Start applying log file backups:
    DECLARE logFiles CURSOR FAST_FORWARD FOR
        SELECT [output] FROM @backupFiles
        WHERE [output] NOT LIKE '%FULL%'
            AND [output] NOT LIKE '%DIFF%' -- don't grab the FULL backup, or the DIFF backup if there was one
        ORDER BY id ASC

    DECLARE @logFilebackupName sysname
    DECLARE @command nvarchar(800)

    OPEN logFiles
    FETCH NEXT FROM logFiles INTO @logFilebackupName

    WHILE @@FETCH_STATUS = 0 BEGIN
        SET @command = N'RESTORE LOG ' + QUOTENAME(@targetDatabase)
            + ' FROM DISK = N''' + @backupsRoot + @targetDatabase + '\' + @logFilebackupName
            + ''' WITH NORECOVERY '
        IF @debug = 1 PRINT @command

        IF @executeCommands = 1 BEGIN
            -- Don't continue adding logs if we've bumped into errors:
            IF NOT EXISTS (SELECT NULL FROM @Errors WHERE dbname = @targetDatabase) BEGIN
                EXEC sp_executesql @command

                -- Handle errors:
                IF @@ERROR <> 0 BEGIN
                    RAISERROR('Error Encountered.', 5, 0) WITH NOWAIT
                    INSERT INTO @Errors
                    SELECT @targetDatabase, 'Applying LOG Backup', @command
                END
            END
        END

        FETCH NEXT FROM logFiles INTO @logFilebackupName
    END

    CLOSE logFiles
    DEALLOCATE logFiles

    -- When we're done, recover if directed:
    IF @recoverDatabases = 1 BEGIN
        DECLARE @recovery nvarchar(800) = N'RESTORE DATABASE ' + QUOTENAME(@targetDatabase) + ' WITH RECOVERY'
        IF @debug = 1 PRINT @recovery

        IF @executeCommands = 1 BEGIN
            -- Don't recover the database if we've bumped into errors:
            IF NOT EXISTS (SELECT NULL FROM @Errors WHERE dbname = @targetDatabase)
                EXECUTE (@recovery)
            ELSE
                RAISERROR('NOT recovering %s due to errors.', 1, 0, @targetDatabase) WITH NOWAIT
        END
    END

    -- Load up another db, and go around the horn again...
    FETCH NEXT FROM targets INTO @targetDatabase
END

CLOSE targets
DEALLOCATE targets

-- Report on errors:
IF @executeCommands = 1 BEGIN
    IF EXISTS (SELECT NULL FROM @Errors) BEGIN
        RAISERROR('Outputting Error Information', 1, 0) WITH NOWAIT
        SELECT * FROM @Errors
    END
END
GO
The script itself is fairly basic and simple. By specifying base paths to folders where backups for multiple databases can be stored (assuming, of course, that each database’s backups are stored in their own folder), the script takes advantage of xp_cmdshell (an extended procedure that hackers can use to wreak havoc on your box if they’re able to take control via SQL injection) to query the OS and get a list of all backup files in a given directory. It then finds the latest FULL backup, restores it, looks for the latest/last DIFFERENTIAL backup if there is one (and applies it if needed), and then finds all Transaction Log backups since the last FULL/DIFFERENTIAL backup and applies them one after the other until they’re all gone – at which point it then attempts to RECOVER the database.
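And since xp_cmdshell is disabled by default on modern versions of SQL Server, you’d need to enable it on the luke-warm server before the script will run at all – with a full understanding of the security trade-offs mentioned in the script’s notes:

```sql
-- Enable xp_cmdshell (disabled by default). Make sure you understand the
-- SECURITY implications before doing this anywhere but an isolated DR box:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
```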
All in all, the script is VERY rudimentary, and just ‘automates’ the process of restoring files/media as needed. It also comes with a huge number of caveats (in the NOTES/WARNINGS/CONSIDERATIONS section) – as the script is NOT meant to go into your production environment as a full-blown solution. Instead, it’s best thought of as a bit of scaffolding or an outline of key concepts and concerns that you’ll want to address. But, by taking a script like this and making it your own, you CAN come up with some great solutions that will allow for excellent recovery options on luke-warm standby servers. I’ve actually got solutions similar to this one working for a couple of clients, where the use of these kinds of ‘mass recovery’ scripts allows me (or my clients) to periodically go in and verify that RPOs and RTOs can be met in the WORST kind of disaster – such as when existing HA functionality fails, or when an entire data-center is lost. So, in that regard, I have full confidence in the overall approach being outlined here. But in order for something like this to work for you, I can’t stress enough that you’ll need to take what I’ve provided here as a STARTING POINT, from which you’ll need to throw in your own, additional, effort and testing to get it to be a viable solution for your needs.