Data is the key to understanding customer preferences, improving business processes, making effective decisions, and anticipating future demand. Successful businesses know this and have developed effective ways to manage data.
But today, as data is subject to an ever-growing body of regulations, businesses need to do more than simply manage it – they need to govern it.
What Is Data Governance?
Data governance is an umbrella term for the processes, policies, roles, and metrics that determine how data is collected, stored, processed, and eliminated. Organizations that develop data governance programs can better manage and control their data, ensuring that it is consistent, accurate, and available to qualified users.
Effective data governance is also critical for complying with data-focused regulations, especially data privacy laws. Following in the steps of the EU’s General Data Protection Regulation, several U.S. states have introduced privacy laws, with more states poised to do the same. Existing regulations include California’s Privacy Rights Act and Consumer Privacy Act, along with similar regulations in Colorado, Connecticut, Utah, and Virginia.
In addition, because many organizations today anticipate incorporating artificial intelligence into decision making, they must make efforts to comply with emerging AI regulations. The standard-bearer is the EU’s AI Act, which aims to prevent potential data misuse and privacy violations. Acts like these depend on organizations adopting strong data governance practices.
Clearly, every company today must have a data governance program. Lack of one can cause data inconsistencies, complicate data integration efforts, and create data integrity challenges. These issues can lead to a slew of negative outcomes: reputational damage, fines for noncompliance, reduced efficiency, and, of course, missed opportunities for business growth.
It all boils down to this: While data governance used to be a nice-to-have function, it’s now a must-have function (except maybe for the smallest mom-and-pop companies, although even that’s debatable).
How To Start a Data Governance Program
The first step in starting a data governance program is to appoint the right leader. In some companies, that’s the chief data officer or chief risk officer. In others, it’s the CFO, CIO, or chief compliance officer.
Whoever takes that role should oversee a team that includes officials from every line of business and every function, from finance and IT to customer service and HR. The team should work together to develop a data governance framework. That framework should be documented, then widely distributed throughout the organization. Training should ensure that employees understand the data governance framework and how it works.
The next step is to choose a management structure for the data governance program. There are three basic ways to do this, said Bob Seiner, president of KIK Consulting & Educational Services, which focuses on data governance. The first is the “command and control” option, a top-down approach that assigns employees to clearly defined data governance roles. The second is the “if you build it, they will come” approach, where a company builds data governance and training programs and makes them available to employees.
But Seiner is a proponent of the third approach: non-invasive data governance. This approach starts by assuming that an organization is already governing its data – but not in any formal sense. The organization would identify which employees are currently governing data, then formalize their roles so that those employees could be tapped when necessary. For example, if an employee is the subject-matter expert in customer address information, the team will note that, so when there is a question about that topic, that employee will get involved.
“The idea [behind the non-invasive approach] is that there is already governance taking place in every organization, although it may be very informal, which means that it’s not very efficient or effective,” Seiner said. “If we can take whatever existing levels of accountability are already there, we can formalize them, and we’re not necessarily giving people more work than they already have.”
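As a rough illustration of the non-invasive approach, the sketch below keeps a simple registry of who already acts as the go-to person for a data topic. The class name `StewardRegistry`, its methods, and the employee name are hypothetical and chosen only for this example. They do not come from Seiner or from any governance product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StewardRegistry:
    """Records which employees already act as informal experts on a data topic."""

    experts_by_topic: dict[str, list[str]] = field(default_factory=dict)

    def formalize(self, topic: str, employee: str) -> None:
        # Record an existing, informal responsibility rather than assigning a new one.
        experts = self.experts_by_topic.setdefault(topic, [])
        if employee not in experts:
            experts.append(employee)

    def who_to_ask(self, topic: str) -> list[str]:
        # Return a copy so callers cannot change the registry by accident.
        return list(self.experts_by_topic.get(topic, []))


registry = StewardRegistry()
registry.formalize("customer address", "Dana")  # hypothetical employee name
registry.formalize("customer address", "Dana")  # a repeat entry is ignored
print(registry.who_to_ask("customer address"))
```

The point of the design is that nothing new is assigned: the registry only writes down accountability that already exists, so a question about customer addresses can be routed to the right person.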
Tools Can Help Pave the Way
Data governance software products can help move processes along. While these products alone won’t get you to the finish line, they can take some of the drudgery out of many of the tasks.
In most cases, the most valuable tool is a data catalog, which collects metadata on data sources across the organization, including business intelligence reports, data sets, conversations, and visualizations.
The keyword here is “metadata” – data about data.
“Data on its own without context only provides so much, and metadata is that context,” said John Wills, CTO of Alation, a data catalog company. “[Metadata] gives you the who/what/when/where/why about data. If you don’t have the descriptions, you’re missing huge parts of knowledge.”
With metadata as the centerpiece, a data catalog helps organizations visualize how data is connected, a view known as data lineage. Core capabilities of data catalog tools include searchable data and conversations, curation, analysis, glossaries that define business terms, and metadata management.
As organizations use data catalogs, they essentially build a knowledge graph of all connection points from the source to the destination, explained Jay Militscher, head of the data office at Collibra, another data catalog company. All that data winds up being very valuable for different roles in the business, whether those employees are managing, feeding, or consuming the data, he added.
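To make the idea of lineage as a graph more concrete, here is a minimal sketch that stores source-to-destination connections in a dictionary and walks them to find everything downstream of one source. The asset names and the `downstream` helper are invented for illustration and do not reflect how Alation, Collibra, or any other catalog product models lineage internally.

```python
from collections import deque

# Hypothetical lineage edges: each asset feeds the downstream assets listed for it.
LINEAGE = {
    "crm.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn_dashboard", "report.revenue_by_region"],
    "erp.orders": ["report.revenue_by_region"],
}


def downstream(asset, lineage):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = {asset}
    queue = deque([asset])
    order = []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream("crm.customers", LINEAGE)
print(impacted)
# ['warehouse.dim_customer', 'report.churn_dashboard', 'report.revenue_by_region']
```

A traversal like this is one way someone managing a source table could see which reports depend on it before making a change.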
At the same time, it’s important to understand that data governance, at its core, is not a technology solution. In other words, you can’t solve data governance with software. While software can certainly help, data governance is more about instituting the right processes and controls and directing people’s behavior.
An Evolving Process
There is still plenty of manual intervention required for data governance today, even for companies that use data catalogs. Just setting up the data catalog can be done in a week, but it can then take another three months for the manual curation, labeling, and vetting of that system.
Users naturally want to incorporate automation into data governance as much as possible. Tools like data catalogs already automate parts of the process, and experts expect that automation to improve drastically.
Wills said the industry is making big strides. Today, for example, many data catalog tools can collect everything in the catalog, augment it with metadata, and track it for different thresholds and quality. At the same time, these tools need more autonomous rules and monitoring in the background, along with a path for proactive fixes. Two years from now, Wills expects that data catalogs will include more automation capabilities that can pinpoint compliance issues and alert people to resolve them.
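The kind of background rule Wills describes might look something like the sketch below, which checks the share of missing values per column against a threshold and produces an alert when the threshold is exceeded. The `QualityRule` class, the `check_rules` function, the sample rows, and the 25% threshold are all assumptions made for this example, not features of any particular catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRule:
    column: str
    max_missing_ratio: float  # highest acceptable share of missing values


def check_rules(rows, rules):
    """Return one alert message per column whose missing share exceeds its threshold."""
    alerts = []
    for rule in rules:
        # Treat both None and the empty string as missing.
        missing = sum(1 for row in rows if row.get(rule.column) in (None, ""))
        ratio = missing / len(rows) if rows else 0.0
        if ratio > rule.max_missing_ratio:
            alerts.append(
                f"{rule.column}: {ratio:.0%} missing exceeds {rule.max_missing_ratio:.0%}"
            )
    return alerts


# Hypothetical sample data.
rows = [
    {"email": "a@example.com", "postal_code": "10001"},
    {"email": None, "postal_code": "94105"},
    {"email": "", "postal_code": "60601"},
    {"email": "d@example.com", "postal_code": None},
]
rules = [QualityRule("email", 0.25), QualityRule("postal_code", 0.25)]

alerts = check_rules(rows, rules)
print(alerts)  # ['email: 50% missing exceeds 25%']
```

Here `postal_code` sits exactly at the threshold and raises no alert, while `email` does. In practice, the alert would go to whoever is accountable for that data rather than to a printout.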
Michele Goetz, a vice president at Forrester Research, said that vendors need to get more creative with automating the process. “All of these vendors are focusing on the last mile of using machine learning for insights and digital operations, but a significant portion of those models are classifiers,” Goetz said. “Why aren’t we bringing that machine learning down to the scaling and understanding of the data?”
“Data scientists still have to do a lot of data preparation because the vendors are ignoring the data preparation portion of data management and data governance,” she added. “Somebody is going to make a ton of money with a platform that takes care of that.”
About the author
Karen D. Schwartz is a technology and business writer with more than 20 years of experience. She has written on a broad range of technology topics for publications including CIO, InformationWeek, GCN, FCW, FedTech, BizTech, eWeek and Government Executive.