Data Overload: How to Avoid Storing Endless Copies of the Same Data

Here are steps for reducing the number of copies of your data to make managing it more sustainable.

Karen D. Schwartz, Contributor

January 19, 2021

You’ve thought this out, coming up with a foolproof system for data. You have a golden copy of each file that can’t be altered, stored offsite. You’ve also got a few additional copies for short-term recovery, disaster recovery and compliance. So you’re set, right?

Not so fast. Your DevOps team wants a copy, and so do your analytics team, your database administrator, and your operations, legal, marketing and security teams. Over time, branch offices and other business units may want their own copies. And then there are the naysayers: users who want their own copy because they don’t trust that the same copy of data can be used for multiple purposes.

Before you know it, you’ve got dozens of copies of the same file in different areas of the business. It’s confusing, expensive, hard to manage, and ultimately, not very sustainable. According to research from ESG, the typical data set is copied and stored an average of six times. Over time, many organizations can end up with a dozen or more copies of the same data. It's data overload. Here are some ways to avoid the problem:

Do the detective work first. The best way to reduce the number of copies to a manageable amount and avoid data overload is to know exactly where all copies are located and who owns each one. Data management tools can provide that visibility, but make sure you choose a multicloud data management tool, advised Randy Kerns, a lead strategist at Evaluator Group. Another effective way to find data copies is a copy data management (CDM) tool, which typically uses a metadata approach to locate copies on both physical and virtual infrastructure. Once you know where everything is and who owns it, there are plenty of other tools and even manual processes to determine whether the copies are still relevant, Kerns added.
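To make the detective work concrete, here is a minimal Python sketch of one way to locate identical copies: hash every file's contents and group the paths that share a digest. This is an illustrative stand-in, not how any particular CDM product works (those typically rely on metadata rather than rescanning content), and the function names `file_digest` and `find_copies` are made up for this example. It also says nothing about who owns each copy, which you would still need to record separately.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_copies(roots):
    """Group files under the given root directories by identical content.

    Returns a dict mapping digest -> sorted list of paths, keeping only
    digests that occur more than once (that is, actual duplicate copies).
    """
    by_digest = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling `find_copies(["/shares/marketing", "/shares/legal"])` (hypothetical paths) would return one entry per file that exists in more than one place, which is a reasonable starting inventory for deciding which copies to keep.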

Make sense of the chaos. The next step is finding a way to manage those copies: Move them, eliminate them, and most importantly, stop copies from being made, changed or propagated. CDM is a good way to do that. CDM creates a “golden master” copy of production data as well as virtual copies as needed. This helps keep tabs on the total number of copies. Advanced CDM solutions also automate the creation of copies by using the best technology for specific use cases. These tools, available from vendors including Actifio, Cohesity, Dell EMC and Unitrends, allow data sets to be used for multiple purposes, by multiple units, without the risk of modifying the data.
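The idea of a golden master with virtual copies can be illustrated with a small copy-on-write sketch in Python. It is a toy model under stated assumptions, not any vendor's implementation: the golden copy is a read-only mapping, and each "virtual copy" is a thin overlay that stores only its own changes. The record keys and the `virtual_copy` helper are hypothetical.

```python
from collections import ChainMap
from types import MappingProxyType

# The golden master: a read-only view, so nobody can alter it by accident.
golden = MappingProxyType({
    "customer_42": "Acme Corp",
    "customer_43": "Globex",
})


def virtual_copy(golden_copy):
    """Return a writable overlay on top of the shared golden copy.

    Reads fall through to the golden copy; writes land only in the
    overlay's own (initially empty) dict, so the golden copy is untouched.
    """
    return ChainMap({}, golden_copy)


analytics = virtual_copy(golden)
devops = virtual_copy(golden)

# DevOps changes a record for testing; only its overlay stores the change.
devops["customer_42"] = "TEST CUSTOMER"
```

After this runs, `devops["customer_42"]` reads back the test value, while `analytics` and `golden` still see the original record. Each team gets what looks like its own copy, but only the differences take up extra space.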

Storage consolidation can help. Consolidating storage encourages users to use the same data copy. “In the past, the mindset was that each application, business unit and use case got its own storage arrays and, therefore, its own data copy,” said Matt Kixmoeller, vice president of strategy for Pure Storage. By consolidating, organizations reduce the islands of storage, which leads to fewer copies of data and avoids data overload.

Another option is deduplication, where common bits of data that might be the same across different volumes in a storage array are stored only once. “It’s about pulling the extra pages out of the book by getting rid of the extraneous copies that people may have copied to multiple sites,” said Michael Letschin, a principal technologist at Cohesity. “A global deduplication strategy can accomplish this.” Reducing the number of copies you have also makes it easier to find what you need.
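As a rough illustration of how deduplication stores common data only once, here is a minimal Python sketch that splits each volume into fixed-size blocks and keeps one copy of each unique block, keyed by its hash. The `DedupStore` class and its method names are hypothetical, and real storage systems use more sophisticated techniques (variable-size chunking, persistent indexes), so treat this as a sketch of the principle only.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplicating store (illustrative only)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}   # digest -> block bytes, each stored once
        self.volumes = {}  # volume name -> ordered list of block digests

    def write(self, name: str, data: bytes) -> None:
        """Store a volume as a list of references to unique blocks."""
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep first copy only
            digests.append(digest)
        self.volumes[name] = digests

    def read(self, name: str) -> bytes:
        """Reassemble a volume from its block references."""
        return b"".join(self.blocks[d] for d in self.volumes[name])

    def stored_bytes(self) -> int:
        """Bytes physically kept after deduplication."""
        return sum(len(b) for b in self.blocks.values())
```

If marketing and legal each write the same data set, the second write adds only references, not new blocks, so `stored_bytes()` stays flat while both volumes remain fully readable.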

Take politics out of it. When it comes to data, turf wars are fairly common, which has resulted in every unit wanting its own copy of data. One way to solve that problem is by appointing a data steward or, in the case of larger organizations, a chief data officer. With someone in charge of information management, it’s much more likely that the right tools, policies and automation will be implemented and enforced.

If you haven’t modernized your approach to data management, it’s time. With employees working from every possible location, on every conceivable device, copy data management will only become more complicated over time. That makes it more important than ever before to adopt cloud technology when possible and solidify your storage plans. “Take a step back and look at what a holistic data management strategy looks like for you,” advised Letschin. “Typically, that means creating a strategy that encompasses every step of the data journey: when it arrives, where it is stored, how credentialed users can access that data without changing the golden copy, and making sure it’s protected and secure.”

About the Author(s)

Karen D. Schwartz


Karen D. Schwartz is a technology and business writer with more than 20 years of experience. She has written on a broad range of technology topics for publications including CIO, InformationWeek, GCN, FCW, FedTech, BizTech, eWeek and Government Executive.
