Should I Be Using SAN Snapshots as a Backup Solution?

While at the SQL PASS Summit I was asked a question about using SAN Snapshots as a way to backup the databases on a SQL Server. The person in question was receiving a mandate from above that all SQL Server backups should be using the SAN Snapshot technology for all database backups and they wanted to know if this was a good idea or a bad idea, and what my thoughts on the topic were.

My response is pretty basic. As the DBA I want to control the database backups and I don't want to be using SAN Snapshots as a backup, because no matter what any vendor says a Snapshot is not a backup. Here are some of my reasons why I wouldn't want to use SAN snapshots instead of database backups.

1. The DBA can’t control the backup and restore process.

If there’s a problem and the database needs to be restored it’s the DBA that’s going to be thrown under the bus not the storage admin. If I'm going to be the one getting the blame, I'm going to be in control of the situation.

2. There’s no option for page level restore.

If one or two database pages get corrupt do I really want to restore the entire database to the last snapshot and loose all the changes since then? What is the page become corrupt a month ago and it wasn't found until now? Now we have to loose a months worth of data to restore the corrupt page? I want the ability to restore just that page using the native page level restore features which require having actual SQL Server backups.

3. There’s no point in time restore. You are limited to the times when the snapshot was taken. If I want to roll the database to a point between two snapshots that isn’t possible.

When it comes to point in time restore I need the ability to control to which exact point in time the restore happens. Telling me that it'll be restored to what ever point in time it was when the snapshot was taken isn't good enough a lot of the time. I need to be able to restore the a specific millisecond.

4. If the LUN which the database on it fails all the backups are lost (they are snapshots not clones). And the excuse of but that won’t happen isn’t a valid excuse. Anything that can fail, will.

When you are taking snapshots you are assuming that the LUN hosting the original data will still be there. If that LUN goes away for some reason (failed disk, human error, etc.) we've just lost the snapshots as well which means we have no backups.

5. The backups are now stored on the same device as the production data. If the device fails you’ve lost access to your backups until the device is restored.

See above.

6. The backups can’t be compressed.

Those snapshots are going to get large, fast. With native SQL Server backups I've got backup compression (assuming you are running the right version and edition of SQL Server) and I've got 3rd party tools which I can use to compress the backups with.

7. As the DBA I have no control as to how many backups are kept.

While it's awesome that the storage array can keep 500 backups, that means that we are responsible for making sure that 500 additional copies of our data aren't being lost, stolen, copied to another company, copied to another server, mounted to the wrong server, etc. One of the reasons that DBAs only want a small number of backups on site at any one time is so that we don't have to keep track of so many backups and who's touching them.

8. Taking a recoverable snapshot requires pausing the IO within the SQL Server every time the snapshot is taken which can lead to inconsistent performance for the end users.

In order to snap the databases to get these database snapshots we have to checkpoint the database and pause all IO while the snapshot is being taken. For any users who are writing to the database while this is happening they will see their sessions hang for up to 10 seconds while this is happening. They then complain to the DBA that the database is slow when in fact it's the snapshot which is causing the problem.

9. If there’s a problem with a backup there’s no way to know without attaching the backup to a SQL Server, usually after there’s been a major problem. With native SQL backups I can easily restore the backups to another server to test them rolling transaction logs forward as I see fit.

When we take backups we say that you don't have a good backup until that backup has been restored. This means that someone needs to take the backup, restore it to the SQL Server and verify that the database can be restored. With the native backups I can do this very easily and roll the logs forward as much as I'd like, all without any risk of performance impact to the production systems. When taking snapshots as backups we now have to attach every snapshot to test it as each backup is totally independent. This requires attaching the snapshot to another server and attaching the databases, which depending on how much data is in the transaction logs as active and needing to be rolled forward or backward could put a lot of stress on the production disks which are being shared with the snapshot (see number 4 above).

10. If I want to encrypt the backups, I don't have that option with SAN snapshots.

Given that the database backups will at some point be leaving the secure data center they need to be encrypted so that if the tapes are lost the database backup is useless to who ever finds the tapes. As the SAN snapshots can't be encrypted this means that we have to rely on the encryption process within the tape backup vendor who may or may not be doing encryption correctly, and they may or may not put the keys in the same place as the backup. While SQL Server doesn't have an encryption option (other than TDE) as a native feature there are several third party backup products which can encrypt the database backups as they are taken which are known to be secure.

In conclusion, I usually recommend that my SQL Server clients do not use database snapshots. Hopefully if you are being pushed into using SAN based backups like this person was, you can use this as some reasons not to.

Denny

Read more blog posts from Denny Cherry at "Troubleshooting SQl Server Storage Problems."

Comments

Plain text