I recently ‘inherited’ a couple of SQL Servers for a client – running SQL Server 2008 (Enterprise Edition). In the process of ‘taking over’ these boxes I’ve scrutinized a number of jobs and details – including backups. One thing that I found, interestingly enough, is that while some backups are being taken, they’re not using Compression.
With SQL Server 2008 R2 and above, both Standard Edition and Enterprise Edition pick up the ability to natively compress backups. But with SQL Server 2008 this was only available on Enterprise Edition. But, since these servers are Enterprise Edition I wondered why Compression wasn’t turned on. (I’m guessing the DBA before me either didn’t know about it, or maybe didn’t trust it?)
So, one of the first things I did was go in and turn that on (on one of the servers where I didn’t’ replace the DR plan ‘wholesale’ – as I did on some of the other servers).
Then, after this single change was put in place, I went through and analyzed the impact. Not only were backup sizes more than halved, but backup times were decreased as well. In fact, on one of smaller databases, backups went from 75GB in size down to 34GB in size, and went from taking between 22 and 28 minutes down to taking around 16-18 minutes instead. Likewise, for some of the other databases, I managed to save roughly 20 additional minutes of backup time.
Compression Yields Faster Backup and Recovery Times
None of this was really ‘news’ for me. In fact, I fully expected to see these results – both in terms of backup sizes and decreased backup times. But it was a nice confirmation of something that I just ‘instinctively’ knew and which I had tested a number of times in the (now, seemingly, distant) past. Or, in other words, this was a ‘fun’ little ‘validation’ of what I already knew and expected. And the benefit of regularly testing what you THINK you know about SQL Server can’t be understated.
Long story short, though, if you take just about any disk or disk-subsystem ‘out there’ and compare its ability to read vs its ability to write, you’ll find that it almost always does better at reads than writes. Consequently, if you think about backups, then anything you can do to avoid causing your disks to write out as much data as they have to read will give you a performance benefit.
And that’s exactly what you get with compression – fewer bytes written to disk. Because without compression, you’re effectively doing a byte-by-byte copy of your data-pages (which, in turn, is why corruption can be (and will be) transparently ‘copied’ into your backups). But when you throw compression into the mix, you’re still reading the same amount of bytes, but writing typically far fewer bytes to disk – resulting in better performance.
Then, as I like to point out to clients when talking about backup redundancy, you can also move backups across the network to keep duplicate copies with less expense and effort – because your backups are, again, smaller. And that, in turn means that you’ll have a quicker and easier time of pulling them back across the wire IF you have to when recovering from a disaster where your local backups aren’t viable or accessible.
Likewise, even though I haven’t tested this in a while, I’m pretty confident that restoring from a compressed backup will take less time than recovering from a non-compressed backup – simply because there are significantly less reads involved. (But, again, I haven’t fully tested or verified this of late – so I’ll post here if I find anything contrary to what I’m expecting when I do test that in hopefully another week or so.)
But What About CPU Overhead?
Of course, compression isn’t free – and comes at the expense of additional CPU overhead. But, in my experience, the vast majority of SQL Server deployments either have CPU enough to spare, or SHOULD have if they were correctly tuned. Or, in other words, while there are some high-CPU-dependent SQL Server workloads out there (typically in the sciences – where they do lots of number crunching or math), most OLTP systems usually have enough CPU to spare for compression.
Moreover, since large-scale FULL/DIFFERENTIAL backups are typically scheduled for off-peak hours, that typically means that there’s even more CPU to spare for compression.
Of course, if backups (hopefully taken during off-peak hours) require additional CPU overhead to kick off, what does that mean about having to recover from a backup during a period of peak activity on your server? Or, in other words, will the need for increased CPU be a liability? Which is hard to answer – because it’ll depend upon your workload. For example, if your server has either a single DB or just a couple of heavily used database, then it might be safe to assume that the LOAD that you LOSE when those users are kicked out of a database being restored could/would/should? be enough to give you some extra CPU for decompression.
On the other hand, if you’re dealing with well-tuned multi-tenant system where there might be lots of databases under heavy load and one of them needs to be recovered, it’s feasible that you might incur some decent CPU pressure while recovering a crashed or taco’d database.
But there’s also the line of thinking that says that CPU pressure incurred while recovering from a disaster is water under the bridge – as many DBAs (me included – in most cases) would rather throw as much hardware as possible into recovery to get a database back up and online – even if that might mean a bit of sluggishness for a few minutes for other operations.
Long story short: if actual metrics here (for recovery times and/or SLAs) matter to you – then you need to test these concerns out in your own environment – rather than just theorizing what could be going on.
Some Other Thoughts
Finally, while SQL Server’s native compression algorithms are pretty decent and provide great benefits out of the box, they don’t do anything to encrypt your data – which is key reason to pay attention to other 3rd party offerings that also do encryption. That, and the maintenance plans that SQL Server offers out of the box are tolerable at best but not very versatile or nearly as easy to interact with as some of the GUIs provided by most of the 3rd party backup vendors out there.
With all of that in mind, I did, however, find a recent blog post from Maria Zakourdaev to be very fascinating – simply because it provides a number of concrete metrics about various backup times and compression ratios .