
Off-Box Backups and Luke-Warm Standby Servers – Part II

Following up on my previous post: when it comes to off-box backups, there are really only three main reasons you'd want to create them:

Three Primary Reasons for Off-Box Backups

First: Redundancy. As I pointed out in my last post: if you're only keeping backups and data on the same server or hardware, then you're DOING IT WRONG. From an elementary Disaster Recovery (DR) standpoint, you always need a copy of your backups 'mirrored' to at least one other location. Drives can fail, RAID controllers can fail and take drives/data with them, and a host of other REALLY UGLY things can happen to data stored on a single host/server. Without off-box backups, then, you're a sitting duck. So the reason, in this case, to have off-box backups is a question of simple redundancy. (An additional benefit of off-box backups, as I pointed out in my last post, is that you can commonly store these redundant copies on less-expensive (and more voluminous) storage, where you can typically keep backups longer than you could on your primary server. But again, as I mentioned in my last post, you want to keep backups locally on your primary server as well – to help avoid incurring the cost of pulling backups over the wire WHEN you need to recover.) So, in this case, think redundancy of your data.

Second: Closely related to the first reason for keeping backups in off-box locations is the simple fact that sometimes entire servers fail. A Windows Update or the addition of a driver might render a box completely non-responsive – in which case, trying to get backups off of that box, or its RAIDed HDs, is going to be nothing short of a nightmare. That, and the point of this series of posts is to describe how to effectively set up 'luke-warm' standby servers. So, knowing that you need a redundant location for your backups, and knowing that you might one day need a redundant HOST or server for your databases, this second reason dovetails nicely with the first: think redundancy of your hosts.

Third: Also very closely related to the notion of failure (are you noticing a trend?) is the fact that IF you can lose RAID arrays and/or entire servers, it's also possible to lose entire data centers. Natural disasters and a host of other nasty things can and do happen to data. Yes, these kinds of scenarios are pretty rare. But what happens if all of your data is lost to fire or flood, or ends up submerged or inaccessible for two weeks? To address cases like this, organizations need what I like to call a 'smoke and rubble contingency plan' – some form of backups kept off-site.

Getting Your Backups Off-Box

How you get your backups off-box depends upon a huge number of factors, such as how far you're pushing the data (to a local server in the same office/subnet, or off-site to the cloud or another data center?) and what kinds of infrastructure and connectivity you have at your disposal. Security of the data (i.e., who can access your backups and/or potentially peek at them going over the wire) is another essential concern. So too, of course, is determining just how important this data is in the first place – which means ascertaining how long the business can operate WITHOUT access to this data, and how expensive it is for the business to LOSE some of this data once it's been recorded. (Figuring out – and quantifying – these details is best done by clearly establishing RPOs and RTOs.)

Local Off-Box Backups

For simple, local backups that achieve basic data and host redundancy, you'll just need simple backups. The destinations for these backups can range from simple file shares where 'dumb' copies of backups are securely stored, clear on up to standby servers running SQL Server (or where SQL Server could be installed in a pinch) as a means of firing up a failover option should primary hardware fail. (And again, there are a HOST of other options that will provide you with MUCH better recovery times in the case of disaster – such as High Availability solutions like Log Shipping, Mirroring, and (in some cases) Replication – but I'm not talking about those here. Instead, I'm talking about extending basic DR/redundancy practices into the notion of giving yourself an additional failover option if/when that makes more sense than launching into full-blown HA solutions – or, of course, in cases where you want additional coverage IN ADDITION to your high-powered HA solution.)

To copy backups from your primary host to your backup/redundancy locations, you can use a host of different technologies and solutions. Many options for this kind of simple file transfer are available directly from within Windows itself – such as XCOPY/RoboCopy and even Distributed File System Replication. It's also possible to use any number of utilities and third-party offerings that 'wrap' XCOPY/RoboCopy commands via GUIs and so on. Personally, I'm a huge fan of SyncBack. It's dirt cheap to deploy out into the wild, and I've been using it for years on my local network for backup purposes. I've also got it set up with a couple of clients as a means of both copying files for redundancy and aggregating files into centralized areas so that off-site backup solutions can then push them off-site. I also like how well SyncBack interacts with Volume Shadow Copy – a big win in environments where there's a lot of disk activity going on (even though the use of shadow copies incurs more resource usage, it does so to help avoid problems with contention that might otherwise 'break' or crash backup-copy operations).
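
To make that 'copy anything new to the secondary' pattern concrete, here's a minimal Python sketch of the idea. The paths are hypothetical placeholders, and in practice a scheduled RoboCopy job or SyncBack profile does the same thing far more robustly (retries, VSS integration, logging, and so on):

```python
# Minimal sketch: mirror any backup files the secondary doesn't already have.
# Paths below are hypothetical placeholders.
import shutil
from pathlib import Path

SOURCE = Path(r"D:\SQLBackups")              # local backup folder on the primary
TARGET = Path(r"\\backupserver\SQLBackups")  # off-box share (hypothetical)

def mirror_new_backups(source: Path, target: Path) -> None:
    """Copy any .bak/.trn/.dif files that are missing (or newer) at the target."""
    for src in source.rglob("*"):
        if src.suffix.lower() not in (".bak", ".trn", ".dif"):
            continue
        dst = target / src.relative_to(source)
        # Only copy files the secondary doesn't have yet (or has an older copy of).
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves timestamps

if __name__ == "__main__":
    mirror_new_backups(SOURCE, TARGET)
```

Scheduled every few minutes (via Task Scheduler or whatever you prefer), this gives you the 'wake up and copy anything new' behavior described in the list below.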

Whichever technology you end up using, just make sure you keep tabs on a few things:

  • Security. If your backups contain any sensitive data, then make sure you're treating them accordingly. That might just mean proper permissions at the target, but it might also mean that you need some sort of encryption of that data over the wire.
  • Timing. For the purposes of creating off-box/redundant backups, I typically recommend just having some sort of process 'wake up' every 5, 10, or 20 (whatever) minutes and copy any files from your primary to your secondary that haven't already been copied. You'll have to juggle the potential load that this puts on your system AGAINST how regularly you want your off-box backups copied. Typically, there's NOT much of a hit to the underlying system resources to pull these copy operations off (or there shouldn't be). But if you've got a highly concurrent system with lots of disk activity, you MAY want to wait a bit longer between pushes.
  • Timing vs RPOs and RTOs. Whatever interval or approach you end up using, just remember to ensure that it also works in terms of RPOs and RTOs (not just in terms of system performance – which many of us geeks tend to focus on a bit too heavily). In other words, if you've got an RPO that says your system can tolerate up to 15 minutes of lost data in the case of a server crash, and you're doing transaction log backups every 10 minutes, then you're just BARELY staying ahead of your RPO. Because IF you've then got off-box copies running every 10 minutes, you JUST MIGHT (and probably will) run into a case where your primary box fails and the LAST viable, off-box transaction log backup you have is actually greater than 15 minutes old – meaning that you could actually have lost 20 or 25 minutes' worth of data. (See the quick worst-case calculation after this list.)
  • Longevity. As I've mentioned a few times now, try to do more than merely MIRROR the backups you're keeping on your production system on a secondary box. Typically, your secondary box/target will have MORE storage space. If so, put that storage to use and tweak your replication/synchronization tool to keep files at the destination server after they've been deleted at the primary. So, for example, if you're taking full backups nightly and keeping them (along with transaction logs) for 2 days on your primary server, see if you can keep files on your destination/copy server for a week or longer – by telling your synchronization software to retain files at the secondary for x days after they've been deleted at the primary. In this way you'll give yourself a bit of additional wiggle room should someone figure out that 4 days ago some moron in accounting messed up a whole bunch of data and no one realized it until now. (Recovering from a situation like this WILL NOT be easy in most cases unless you're using a third-party log reader agent – but if you only keep backups for 2 days, the window of opportunity is lost. If, however, you've got backups retained for, say, 7 days on a secondary server, then you at least have the option.)
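
To make the RPO math in the 'Timing vs RPOs and RTOs' bullet concrete, here's a quick back-of-the-envelope calculation. All of the intervals are illustrative assumptions:

```python
# Worst case for off-box recovery: the primary dies right before the next
# off-box copy runs, AND the last copy ran right before the most recent
# transaction log backup finished. Numbers are illustrative only.
def worst_case_data_loss(log_backup_minutes, copy_interval_minutes, copy_duration_minutes=0):
    return log_backup_minutes + copy_interval_minutes + copy_duration_minutes

rpo_minutes = 15
loss = worst_case_data_loss(log_backup_minutes=10,
                            copy_interval_minutes=10,
                            copy_duration_minutes=5)
print(f"Worst-case loss: {loss} minutes (RPO is {rpo_minutes} minutes)")
# Worst-case loss: 25 minutes (RPO is 15 minutes) -> the RPO is blown,
# even though each piece (10-minute log backups, 10-minute copies) looks fine on its own.
```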

Off-Site Backups

To achieve data-center redundancy, you'll need to push your data off-site. And, depending upon the size of your data (or backups), this is where things can get tricky, complex, and even ugly.

Larger Organizations with Multiple Sites
If you're a larger business or enterprise, you very well may have SAN replication technology and/or dark fiber stretching between data centers. When possible, you'll obviously want to use this connectivity and synchronization as a means of achieving off-box backups. The problem, of course, is that it can become 'political' or complicated. So the only thing I can advocate here is to acknowledge that some data (or backups) is more important than other data, and that you may have to adopt a 'tiered' approach to off-site synchronization. (And don't forget that RPOs and RTOs – along with SLAs – can be a great tool for wading through political concerns.) That, and IF you find that you're in a situation where there's lots of politics and lots of chefs in the kitchen in terms of orchestrating the synchronization of off-box backups, then not only does regular testing and validation of your backups continue to make sense (as always), but you should also undertake an aggressive policy of regularly checking/verifying the backups in your remote locations – as it's entirely too easy for something outside your control or area of expertise to ruin those backups, and regularly checking them is the only way to determine their validity.

Regular Tape Backups (Be Skeptical)
Many organizations take regular tape backups on a nightly or weekly (or whatever) basis and then ship them off-site to be stored in a vault somewhere. These kinds of backups ARE viable in many cases as part of a smoke-and-rubble contingency plan. However, there are a few things to be aware of.

  • Mismatched Priorities. One man's smoke-and-rubble contingency plan may not be the same as another's – meaning that weekly off-site backups may be deemed good enough for the overall IT department simply because doing weekly off-site backups seemed like a good idea 4 years ago, even though no one has really thought about whether that's still good enough in the last 6 months. That, and there's a huge difference between being able to lose a week's worth of data on file shares or in Active Directory versus potentially losing a week's worth of transactional data in Line-of-Business apps. So just make sure that the RPOs afforded by any existing, system-wide, off-site backups provide your databases the kind of protection they need.
  • Recovery Time Objectives. Relying on off-box tape backups also comes with another huge downside: recovery time. Yes, IF these backups work, then you can recover in a smoke-and-rubble scenario. But don't assume that just because you've got weekly, off-site backups on tapes sitting in a granite vault somewhere, you're going to be able to use those in a normal disaster. Getting access to this data in an emergency is going to be SLOW and PAINFUL to say the least – to the point where you're ONLY going to want to go this route if you're truly in a smoke-and-rubble situation where you've lost an entire data center to a natural disaster or some other huge horror.
  • Recoverability. I've had far too many clients in hosted environments rely upon their hosting providers to make nightly tape backups as part of their DR solution. In my experience, far too many hosted backup solutions are just a great way for hosting companies to make extra money by selling a phantom bit of 'peace of mind' to customers – one that, sadly, results in failure far too many times. At far too many hosting companies, nightly backups are relegated to a junior-level tech who has to cut their teeth on a complex third-party backup solution and its storage options – to the point where even if backups are done nightly, they may not be getting done correctly. And while some organizations and hosts actually succeed at regular backups for their clients, far too many don't. The worst part is that you only ever find this out WHEN you need these backups the most. As such, IF you're going to rely upon hosted backups (or backups handled within your organization by other engineers/etc.), then you need to skip right past all the SLAs and other paperwork involved and make sure you're regularly testing these backups (see the sketch after this list for one way to automate a basic check). Because without testing these backups on a regular basis, they're just empty electrons.
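
As one illustration of what 'regularly testing these backups' might look like in automated form, here's a minimal Python sketch that runs RESTORE VERIFYONLY against backup files on a secondary share by shelling out to sqlcmd. The server name and share path are hypothetical, and keep in mind that VERIFYONLY only validates the backup media and headers (plus checksums, if the backup was taken WITH CHECKSUM) – a periodic full test restore is still the only real proof:

```python
# Minimal sketch: run RESTORE VERIFYONLY against each backup on an off-box share.
# Server/share names are hypothetical placeholders.
import subprocess
from pathlib import Path

VERIFY_SERVER = r"STANDBY01"                       # hypothetical standby instance
BACKUP_SHARE = Path(r"\\backupserver\SQLBackups")  # hypothetical off-box share

def verify_backup(backup_file: Path) -> bool:
    query = f"RESTORE VERIFYONLY FROM DISK = N'{backup_file}'"
    # -E = trusted connection, -b = return a non-zero exit code on T-SQL errors
    result = subprocess.run(
        ["sqlcmd", "-S", VERIFY_SERVER, "-E", "-b", "-Q", query],
        capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    for bak in sorted(BACKUP_SHARE.rglob("*.bak")):
        status = "OK" if verify_backup(bak) else "FAILED"
        print(f"{status}: {bak}")
```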

Cloud Backups
For small-to-medium businesses with just a single data center, the creation of smoke-and-rubble backups can be problematic. Traditionally, these companies have only had the option of regular, off-site tape backups hosted or managed by third parties (as I've just described above). And in far too many cases, these backups are simply not tested adequately to provide the kind of protection needed – instead, far too often, they just become superstitious rituals performed regularly but with no benefit.

Which, in turn, is why 'the cloud' has become such a great option of late. The main concerns to deal with when doing off-site backups into the cloud are security, sizing, and throughput. And, in my experience, throughput is the biggest problem – because it's going to be hard, for example, for many organizations to push (say) 100GB of backups up into the cloud on a daily basis, simply because they don't have the upstream bandwidth needed for that.

That, and not all cloud-backup offerings are the same – to the point where some simply can't keep up with large-ish amounts of data even when there's plenty of upstream. Over the past few years I've worked with a number of different cloud-backup offerings to help keep client backups synchronized off-site. And while JungleDisk used to do a great job of merely 'exposing' Amazon S3 storage as a local drive (which could then be used with something like SyncBack), I've found that in the last few years it simply can NOT keep up with decent synchronization demands – to the point where it couldn't keep up with 30GB of changed files/backups in a single day. Granted, you can only push so much data through a 'straw' at a time – and upstream bandwidth is your biggest concern here. But, with that same 30GB of 'churn' per day, whereas it would literally take LONGER than a day to push everything up to S3 via JungleDisk, I found that DropBox (of all things) was able to push up the exact same data in about 3-4 hours. (Well, the bulk of it – in the form of FULL/DIFFERENTIAL backups taken early in the morning – with the log-file backups then cruising up all day long; whereas JungleDisk would take more than a day to push up the FULL/DIFFERENTIAL backups and never get to the T-log backups.) And this, of course, was on the exact same hardware and systems.
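
To put rough numbers on the throughput problem, here's a quick back-of-the-envelope calculation. The link speeds are assumed purely for illustration, and real-world results also depend on compression, protocol overhead, and the backup software itself (as the JungleDisk/DropBox comparison above shows):

```python
# Rough upstream math; ignores compression, overhead, and delta-sync.
def hours_to_upload(gigabytes, upstream_mbps):
    megabits = gigabytes * 8 * 1000          # GB -> megabits (decimal units)
    return megabits / upstream_mbps / 3600   # seconds -> hours

for gb, mbps in [(30, 20), (30, 5), (100, 10), (100, 100)]:
    print(f"{gb} GB over {mbps} Mbps upstream: ~{hours_to_upload(gb, mbps):.1f} hours")

# 30 GB over 20 Mbps  : ~3.3 hours   (roughly the 3-4 hour DropBox scenario above)
# 30 GB over 5 Mbps   : ~13.3 hours  (more than half a working day)
# 100 GB over 10 Mbps : ~22.2 hours  (a daily 100GB push is basically a non-starter)
# 100 GB over 100 Mbps: ~2.2 hours
```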

So, if you're going to look into off-box backups using the cloud, there are a few VERY key things to look into:

  • Security. If you've got any HIPAA-regulated or other highly sensitive data, that data has to be encrypted over the wire and at the destination. Period.
  • Throughput. Throughput is not only dependent upon your upstream connection, but also upon the file-checking, compression, and encryption algorithms used by the cloud-backup software. This software is NOT all created equal. Some of it is great, some of it sucks. Consequently, the only way to tell is to push data up into the cloud and see how it performs (see the timing sketch after this list). Likewise, make SURE you test how quickly you can get access to this data when you need it.
  • Capacity. Some cloud solutions will let you put more data than you can fathom up in the cloud – limited only by your upstream and your pocketbook. Others (often those I find to be the fastest, though not always) only allow 50GB or 100GB or some other SMALL capacity. If that's the case, you might want to look at 'tiering' your backups accordingly: evaluate how QUICKLY and EASILY you can get truly mission-critical data up to and down from these providers, while using other, slower providers (on a less frequent basis) as a contingency for LESS important data.
  • Accessibility. One thing I really hate about a lot of 'cloud' backup solutions is that they create a 'vault' or some other 'nasty' ball of backups up in their cloud – and the only way to get YOUR files and data back is to pull down that whole 'ball' of backups. Obviously, if you just need a single backup (of, say, 12GB) out of a 'ball' of 4 total backups (along with log-file backups and so on), you're then looking at pulling down ~50GB to get at that one backup. That's better than nothing, of course – but commonly NOT good enough. So make sure to test this out when evaluating different offerings.
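
As a rough illustration of the kind of throughput and accessibility testing suggested in the list above, here's a minimal Python sketch that times an upload and a download against Amazon S3 via boto3. S3 is just one example target, and the bucket name and file paths are hypothetical – the same 'time a realistic transfer of your own backup files before you commit' idea applies to any provider:

```python
# Minimal sketch: time an upload and a download of a representative backup file.
# Bucket name and paths are hypothetical; boto3 credentials are assumed to be configured.
import time
import boto3  # pip install boto3

s3 = boto3.client("s3")
BUCKET = "my-offsite-sql-backups"           # hypothetical bucket
SAMPLE = r"D:\SQLBackups\MyDb_FULL.bak"     # a representative full backup

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.1f} seconds")

timed("upload", lambda: s3.upload_file(SAMPLE, BUCKET, "MyDb_FULL.bak"))
timed("download", lambda: s3.download_file(BUCKET, "MyDb_FULL.bak", r"D:\Restore\MyDb_FULL.bak"))
```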

Of all of the cloud solutions that I've used over the years, one stands head and shoulders above the rest: DropBox. Which, frankly, is irritating – because DropBox is a CONSUMER-targeted solution that comes with a few limitations. First and foremost, it's got small plans (50GB/100GB) – though it does offer 'collaborative' plans for teams with more generous storage. The bigger issue, though, is that DropBox is NOT designed to run unattended. Instead, it's specifically designed to run as a user-level process that kicks in when a user logs in to their machine. That's not at all what you want in a server-level backup – especially when this process terminates when you log off (or doesn't spin up until an end user logs in after a server is rebooted). Happily, there ARE work-arounds for this, but they're a bit ugly to use. That said, I've got a couple of clients using DropBox, and the big benefits that this solution provides are that it's cheap, it's fast, and it's VERY easy to get at uploaded files once they're in the cloud. (One of my tasks for the next year is to test how SyncBack 6.0 works in terms of its integration with S3 – because I'm guessing it might end up being the absolute best option out there, given how fast I've found S3's servers to be, the fact that you CAN encrypt your data when it's stored on S3 (or when pushing it up), the fact that you can get at data stored on S3 from basically anywhere, and the fact that I've found SyncBack to be so reliable in the past.)

Up Next

Of course, as with everything else, merely PICKING a backup solution isn't enough. If the backup solution is going to be worth anything, you'll have to regularly test whether or not you can actually pull down the data and set it up as a failover source. And in that regard, you'll either be able to get the data back up and running, or you won't. But even if you ARE able to get it back up and running, you then need to make sure that doing so fits within your RPOs and RTOs. So, in my next post, I'll look at ways to put this redundant data quickly to use in the case of a disaster or emergency.
