Over the past 20 years, data volumes have grown exponentially. As a result, data management and governance capabilities are needed more than ever. To better understand this topic, we spoke with Manish Jain, Vice President of Product Management at Hitachi Vantara.
As data volumes skyrocket, it seems that dark data—or data that is aggregated but unused—is also expanding, preventing many companies from capitalizing on data growth. Give us your perspective on this trend and how a solution like Hitachi Lumada can help.
Manish Jain: Dark data continues to be a problem for the customers we talk to. We have several very strong capabilities within our Lumada portfolio that help resolve this problem and ultimately democratize data management. The first is data integration, or the ability to collect data from its source and put it into a data lake for analysis.
Other capabilities include data cataloging, discovery, and search functions. The real challenge with dark data is that you can’t analyze what you don’t know exists. This is where our data discovery and search tools can help you find and publish high-quality data. This is important because data publishing is the end goal, right? It makes the right data available at the right time to the right set of users, and that's what end users truly care about.
As more and more businesses use public clouds to streamline data analysis, it seems they can increase their flexibility and reduce the time required to move data across multiple clouds with solutions such as data rationalization, cloud connectivity, and pipeline portability. What’s your take on this?
Manish Jain: In our experience, large enterprise customers in most industries, including banking, financial services, IoT, utilities, and the public sector, keep a sizable portion of their data on-premises. Only part of it resides in the public cloud.
We’ve found that these same clients haven’t widely adopted multi-cloud solutions. There may be silos within an organization that store data in different clouds, but typically they choose one cloud, such as Azure or Google Cloud. In helping customers with data discovery, we’ve noticed their business users are largely agnostic about where the data resides; they just need the data to be available within their desired analytics systems.
With this landscape in mind, we tend to focus on hybrid cloud. Once we establish the best place for the data to reside, we look to offer data virtualization. This capability adds a layer that abstracts where the physical data resides and provides a catalog for discovering that data. The virtualization layer bridges the gap between cloud and on-premises environments, so you can search and discover data wherever it resides. This is the thrust of Lumada: using our data management tools to provide one seamless experience.
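Editor's note: to make the idea of a virtualization layer with a catalog more concrete, here is a minimal Python sketch. It is an illustration only and does not represent Lumada's API or implementation. The class names (DatasetEntry, VirtualCatalog), the dataset names, and the URIs are all hypothetical. The point is simply that a user searches by name or tag and never needs to know whether the data sits on-premises or in the cloud.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog record: what the data is and where it physically lives."""
    name: str
    location: str            # e.g. "on-premises" or "cloud" (illustrative values)
    uri: str                 # hypothetical physical address of the data
    tags: Tuple[str, ...] = ()


class VirtualCatalog:
    """Toy catalog that hides physical location behind a logical name."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[DatasetEntry]:
        """Return entries whose name or tags contain the keyword, sorted by name."""
        kw = keyword.lower()
        hits = [
            e for e in self._entries.values()
            if kw in e.name.lower() or any(kw in t.lower() for t in e.tags)
        ]
        return sorted(hits, key=lambda e: e.name)

    def resolve(self, name: str) -> str:
        """Map a logical dataset name to its physical URI."""
        return self._entries[name].uri


# Hypothetical datasets spread across a hybrid environment.
catalog = VirtualCatalog()
catalog.register(DatasetEntry(
    "customer_orders", "on-premises",
    "jdbc:postgresql://warehouse.internal/orders", ("sales",)))
catalog.register(DatasetEntry(
    "web_clickstream", "cloud",
    "s3://example-bucket/clickstream/", ("marketing", "sales")))

# One search spans both locations.
for hit in catalog.search("sales"):
    print(hit.name, "->", hit.location)
```

A real virtualization layer would also handle query federation, security, and metadata harvesting; this sketch covers only the lookup step.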
With the democratization of data, businesses are increasingly looking for ways to empower citizen data scientists and business analysts. What are the potential pitfalls of this approach?
Manish Jain: We certainly see interest in data democratization. Citizen data scientists now have access to more data than ever before. But with this expanded access comes the risk of exposing personally identifiable information (PII). Because of this, it's necessary to scrub PII and other sensitive information from the data. Technologies like sensitive data management and role-based access control (RBAC) can also help you find and secure this information.
While the goal is to provide greater access to analytics to drive better business decisions, care should be taken to avoid leaking sensitive data. This is another area where Lumada is adding immense value for our customers.
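Editor's note: the two safeguards mentioned above, scrubbing PII and role-based access control, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Lumada implements either feature. The regular expressions cover only email addresses and US-style Social Security numbers, and the role names (data_steward, analyst) and the record are hypothetical.

```python
import re

# Simplified patterns; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical role table: may this role see unmasked values?
ROLE_CAN_SEE_RAW = {"data_steward": True, "analyst": False}


def scrub_pii(text: str) -> str:
    """Replace email addresses and SSN-like strings with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SSN_RE.sub("[SSN]", text)


def read_record(record: dict, role: str) -> dict:
    """Return the record as the given role is allowed to see it."""
    if role not in ROLE_CAN_SEE_RAW:
        raise PermissionError(f"role {role!r} has no access to this dataset")
    if ROLE_CAN_SEE_RAW[role]:
        return dict(record)
    # Roles without raw access get string fields scrubbed.
    return {
        key: scrub_pii(value) if isinstance(value, str) else value
        for key, value in record.items()
    }


# Hypothetical record containing sensitive values.
record = {
    "id": 17,
    "note": "Contact jane@example.com, SSN 123-45-6789",
}

print(read_record(record, "analyst"))
print(read_record(record, "data_steward"))
```

Masking at read time, as here, keeps a single copy of the data while giving each role a different view; scrubbing once before publishing is the other common option.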
To explore the Lumada Intelligent DataOps Suite for yourself, visit https://www.hitachivantara.com/intelligent-dataops.