Despite the importance of data archiving to the vast majority of organizations, not all have data archives. There are many reasons why: Decision-makers may not understand the value of archiving or the difference between backup and archive, archiving may simply be too complex, or it may be deemed too expensive.
Yet any company with more than about 25 terabytes of data needs a data archiving strategy, says George Crump, former principal analyst of StorageSwiss and now chief marketing officer at StorONE.
"It's not about saving money on primary storage; it's because you have to be able to prove retention, and that will become increasingly important as regulations like GDPR [General Data Protection Regulation] and CCPA [California Consumer Privacy Act] become more in force," Crump says.
There are other benefits to data archiving, including the fact that archived data typically is stored on lower-cost tiers of storage. Plus, archiving helps prevent data loss.
Another benefit is protection from ransomware. That's because, generally speaking, archived data is more difficult to access. If, over the course of time, a company archives 300TB of its 500TB of data and is then hit by ransomware, only 200TB of data is exposed.
Data archiving also keeps live data sets smaller, which makes them easier and faster to work with. For example, if you typically search only for transactions performed in the last year, keeping 10 years’ worth of transactions in your live system will slow you down and cost more.
Here are some tips for creating a data archiving strategy that works for your business:
Know what you have. Before you can archive anything, you have to know what data you have. There are plenty of tools available to help do this, but many are platform-dependent. If you have a Windows file server, for example, it might require a different tool than a NetApp device. There are agnostic tools, however, including those from SolarWinds and Clear Technologies.
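As a rough illustration of what such an inventory produces, here is a minimal Python sketch that totals file sizes by age. It is not a substitute for the tools mentioned above. The bucket boundaries and the names `AGE_BUCKETS`, `bucket_for` and `inventory` are assumptions made for this example, and it uses modification time because access times are not reliably recorded on every file system.

```python
import os
import time
from collections import defaultdict

# Hypothetical age buckets (upper bound in days, label); adjust to your own policies.
AGE_BUCKETS = [
    (90, "under 90 days"),
    (365, "90 days to 1 year"),
    (float("inf"), "over 1 year"),
]

def bucket_for(age_days):
    """Return the label of the first bucket whose upper bound exceeds age_days."""
    for limit, label in AGE_BUCKETS:
        if age_days < limit:
            return label
    return AGE_BUCKETS[-1][1]

def inventory(root, now=None):
    """Walk root and total file sizes in bytes per age bucket, based on modification time."""
    now = time.time() if now is None else now
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            age_days = (now - info.st_mtime) / 86400
            totals[bucket_for(age_days)] += info.st_size
    return dict(totals)
```

Calling `inventory("/path/to/share")` returns a dictionary of bytes per bucket, which gives a first estimate of how much data is a candidate for archiving.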
Think it through before committing to anything. Before you buy anything, understand who is going to use it, what data will be archived, how often the data will be accessed, and how the archive will be updated, accessed and controlled.
Determining how often data will be accessed is critical because it may dictate the type of platform you choose and the response time you need, says Cindy LaChapelle, a principal consultant at ISG, a technology research and advisory firm.
"If the archive is operating in the cloud but all of the original copies of the data are in a data center and then recalled to that cloud-based archive, there might be latency issues you have to worry about," she explains. "So there are a lot of technical parameters you need to test based on who will access the data, how often and what the requirement is for recovering that data."
Next, assign all growing data a retention schedule based on how long it must stay in the live system. "For example, if you have credit card payment transactions, you should base retention on how many months after a transaction a customer can dispute a charge, and add some contingency," explains Gi Singh, director of technical services at Rapidev, a rapid development company based in Australia.
When determining a retention schedule, don't assume, he adds. Instead, talk to data users in all areas of the organization to come up with a retention schedule, and get their sign-off before implementation.
For example, the marketing group may have images, video and audio to archive, but only the marketing group will know how quickly it needs to be able to recover that media, and how long the media should be available to the department before being archived. Therefore, the IT group must work with the marketing group to create the technical solution. The same goes for other areas of an organization: legal, finance, etc.
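Singh's credit card example can be worked through in a few lines of Python. This is only a sketch: the class `RetentionRule`, the helper `is_archivable` and the 120-day and 30-day figures are assumptions chosen for illustration, not values from any card scheme or regulation. Real figures should come from the data owners who sign off on the schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class RetentionRule:
    """How long a data category stays in the live system before archiving."""
    category: str
    required_days: int     # e.g. the dispute window agreed with the data owners
    contingency_days: int  # safety margin added on top

    @property
    def live_days(self):
        return self.required_days + self.contingency_days

    def archive_after(self, created):
        """Return the first date on which a record created on `created` may be archived."""
        return created + timedelta(days=self.live_days)

def is_archivable(rule, created, today):
    """True once the record has been live for the full retention period."""
    return today >= rule.archive_after(created)

# Assumed values for illustration only: 120-day dispute window plus 30 days of margin.
card_payments = RetentionRule("credit card payments", required_days=120, contingency_days=30)
```

With these assumed values, a transaction recorded on January 1, 2024 stays in the live system for 150 days and becomes eligible for archiving on May 30, 2024.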
Pick your poison. There are three basic options when it comes to data archiving: You can do it yourself, use software that identifies and moves the data for you, or use software that identifies and moves the data and sets up a link back to that data.
- Do it yourself: With this approach, your IT staff can develop a database using a PowerShell or Python script. Then it's just a matter of identifying the data and issuing a move command, making sure, of course, that you back up the data first.
- The middle road: Use a software solution to analyze data across environments, automatically identifying and moving data for you, based on your policies and requirements.
- The full Monty: These solutions do it all: They automatically identify and move data, and include automatic recall if needed. That means archived data can be re-accessed either as files or as objects in the cloud, which makes recovery very simple.
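To make the do-it-yourself option concrete, here is a minimal Python sketch of the "identify and move" step. It assumes the archive is a separate directory tree outside the live one, treats "old" as "not modified within `max_age_days`", and defaults to a dry run. The function names are hypothetical, and the checksum check does not replace the backup recommended above.

```python
import hashlib
import os
import shutil
import time

def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_old_files(live_root, archive_root, max_age_days, now=None, dry_run=True):
    """Move files not modified within max_age_days from live_root to archive_root.

    Each file is copied, the copy is verified by checksum, and only then is
    the original removed. With dry_run=True nothing is changed on disk.
    Returns the sorted relative paths that were (or would be) archived.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    moved = []
    for dirpath, _dirnames, filenames in os.walk(live_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_mtime >= cutoff:
                continue  # still recent; keep it in the live system
            rel = os.path.relpath(src, live_root)
            moved.append(rel)
            if dry_run:
                continue
            dst = os.path.join(archive_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            if sha256_of(src) != sha256_of(dst):
                raise IOError(f"checksum mismatch for {rel}; original left in place")
            os.remove(src)
    return sorted(moved)
```

Running it first with `dry_run=True` lists what would move, so the result can be reviewed with the data owners before anything is deleted from the live system.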
The option you choose depends on many factors, including the skill set of your IT professionals, the amount of money you are willing to spend and the features you need.
"As you move down these categories, it gets more and more expensive, and becomes more complex from a design perspective," Crump says. "So if the company has the skill set to write the script, the manual approach becomes appealing. If they don't, they have to decide between the second and third approach."
Deciding between the two automated approaches comes down to how often you really expect to need to recall data, and that's hard to figure out. Crump says he often suggests comparing two snapshots of the data taken one month apart. Generally speaking, there is no change, he adds.
You don't have to break the bank. Decide how manual an approach you are comfortable with; in general, the more manual, the cheaper.
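One way to run the snapshot comparison Crump describes is sketched below in Python. It assumes a "snapshot" is simply a manifest mapping each file path to its size and modification time, which is an interpretation made for this example and not a description of any product's snapshot feature. The function names are hypothetical.

```python
import os

def take_snapshot(root):
    """Return {relative_path: (size_bytes, mtime)} for every file under root."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            snapshot[os.path.relpath(path, root)] = (info.st_size, info.st_mtime)
    return snapshot

def compare_snapshots(earlier, later):
    """List the paths added, removed, or changed between two manifests."""
    added = sorted(set(later) - set(earlier))
    removed = sorted(set(earlier) - set(later))
    changed = sorted(p for p in set(earlier) & set(later) if earlier[p] != later[p])
    return {"added": added, "removed": removed, "changed": changed}

def change_rate(earlier, later):
    """Fraction of files in the earlier snapshot that were changed or removed."""
    if not earlier:
        return 0.0
    diff = compare_snapshots(earlier, later)
    return (len(diff["changed"]) + len(diff["removed"])) / len(earlier)
```

A change rate near zero for the archive candidates suggests recalls will be rare, which weakens the case for paying for automatic recall.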
"About 30% of the time, I've seen companies decide to start with a manual approach and, if it becomes more work, be willing to move to a more automated solution. That's a fine strategy," Crump says.
And be strategic about the amount of storage you buy, because you may not need as much as you think. For example, if your organization has 500TB of data, an analysis might show that 300TB of that data has not been accessed in more than a year. When a project comes up that requires 50TB of storage, simply move the oldest 50TB into the archive, which releases the needed capacity without spending more money.
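The capacity example above amounts to picking the least recently used data until enough space is freed. A minimal Python sketch follows, assuming an inventory is already available as `(path, size_bytes, last_access_timestamp)` tuples. The function name `select_for_archive` and that tuple layout are assumptions for illustration.

```python
def select_for_archive(files, bytes_needed):
    """Pick the least recently accessed files until bytes_needed is covered.

    files: iterable of (path, size_bytes, last_access_timestamp) tuples.
    Returns (selected_paths, bytes_freed). If the inventory is too small,
    bytes_freed will be less than bytes_needed.
    """
    selected, freed = [], 0
    # Oldest access time first
    for path, size, _last_access in sorted(files, key=lambda f: f[2]):
        if freed >= bytes_needed:
            break
        selected.append(path)
        freed += size
    return selected, freed
```

For the 50TB project in the example, `bytes_needed` would be `50 * 10**12`, and the returned list is the set of files to move into the archive.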
Revisit your archiving strategy often. Retention policies, business priorities, security concerns, government regulations and technology change often, and your archiving strategy should keep pace.
"If an organization created the data archive many years ago, it might not even have retention policies around the data. But today, there are regulations about deleting personal data within a certain time frame," says LaChapelle. "This could affect a lot of data sets in your archive, and if it hasn't been classified that way from the beginning, you might have to do some reclassification of your archive data."
When evaluating your current data archiving strategy, make sure to ask these questions, Singh says:
- Is the data secure?
- Is archived data persistent?
- Can it be accessed or restored if and when the business requires?
- Have the system or use cases changed?
- Has the compliance requirement changed?
- Have the costs changed? For example, is it now less expensive to archive in the cloud than in-house? What about in the next seven years?
- Is documentation up to date?
Creating a data archiving strategy doesn't have to be overly expensive or complex. It's just a matter of doing the research. The payoff, in compliance, security and peace of mind, is more than worth it.