Garbage in, garbage out: The idea is true for many things in life, data among them. In an enterprise setting, junk data can cause both minor snags and serious risks.
Junk data refers to incomplete, incorrect or otherwise harmful data, which can come from many sources. It can include data generated by bad code or data in a format that an organization’s systems cannot use.
“Often, the most prolific source of junk data is data that does not accurately capture what it intends to,” said Paul Mander, senior vice president of solutions and technical services at mParticle, a customer data platform provider.
According to research from Experian, 83% of businesses view data as key to forming a business strategy -- yet also suspect that more than a quarter of their contact and prospect data is incorrect. In 2016, IBM estimated the annual cost of bad data to the U.S. economy was about $3.1 trillion, and it seems plausible that number has only increased in the five years since.
As organizations increasingly rely on data analysis to guide business decisions, the quality and reliability of data are all-important.
“For most organizations, junk data is a bigger problem than they may realize,” Mander said.
Mander noted that junk data is typically rooted in issues related to data accuracy or application requirements. “Things seem OK on the surface, but when someone digs into things, problems within the data are revealed,” he said.
Filter Out Junk Data Early and Often
A cohesive data strategy helps organizations identify and deal with junk data as a matter of routine, before it ever becomes a problem. A data strategy can help with a quality analysis of existing datasets, of course -- but even more powerfully, it can keep junk data out going forward.
“Because junk data can have numerous root causes, the first step is to understand what is making your data junk,” Mander said. “Only then can an enterprise go about cleaning up the junk data and putting in processes that can prevent data from becoming junk in the future.”
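As a rough illustration of filtering early while also learning what is making data junk, the Python sketch below checks records at the point of ingestion and tallies the reason each one is rejected. The field names, the rules and the helper functions `find_problems` and `filter_records` are hypothetical, chosen only for this example; a real organization would substitute rules agreed on by its own teams.

```python
from collections import Counter

# Hypothetical required fields for a contact record (an assumption for this sketch).
REQUIRED_FIELDS = ("email", "name")


def find_problems(record):
    """Return the reasons a record counts as junk; an empty list means it is clean."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    email = record.get("email")
    # Deliberately crude format check; real email validation is more involved.
    if email and "@" not in str(email):
        problems.append("malformed email")
    return problems


def filter_records(records):
    """Split records into clean and rejected, and tally the rejection reasons."""
    clean, rejected, reasons = [], [], Counter()
    for record in records:
        problems = find_problems(record)
        if problems:
            rejected.append((record, problems))
            reasons.update(problems)
        else:
            clean.append(record)
    return clean, rejected, reasons


if __name__ == "__main__":
    incoming = [
        {"email": "ann@example.com", "name": "Ann"},
        {"email": "bob-at-example.com", "name": "Bob"},
        {"email": "", "name": "  "},
        {"name": "Dee"},
    ]
    clean, rejected, reasons = filter_records(incoming)
    print(len(clean), "clean;", len(rejected), "rejected")
    # The tally points at root causes, most common first.
    print(reasons.most_common())
```

Keeping the rejection reasons, rather than silently dropping bad records, is the design choice that matters here: the tally shows which root cause produces the most junk, which is where cleanup and prevention efforts would plausibly start.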
All Hands on Deck
Even in organizations with dedicated data professionals and teams, data management is a shared responsibility. Mander recommends that organizations create a cross-functional team to develop their junk data management strategy.
Companies should also cultivate awareness of what good data is and how it supports business goals, both for the company as a whole and within specific departments. Through collaboration with teams in marketing, customer service, analytics and other departments, IT professionals can ensure the right data is being collected from the right places, using the right technology.
This approach can increase productivity for data professionals by reducing the amount of time they waste on “data wrangling” -- collecting, organizing and cleaning up data in order to make it available for analysis and other uses.
Joint efforts around data management are vital for organizations that have invested in data transformation initiatives, Mander noted.
“Junk data can often stall out digital transformation efforts,” Mander said. He added that data users may not trust the data if they can’t understand it or what it is used for.
Trash or Treasure?
It’s important that personnel throughout an organization understand the value of data. It’s also critical that everyone can distinguish valuable data from junk and designate it accordingly.
Whether a particular dataset is valuable may vary from department to department, so organizations may need to develop approaches tailored to specific departments or use cases.