DataOps describes the processes and technologies designed to cost-effectively deliver timely, quality data to the users and applications that require it. It aims to do that by replacing what are often rigid, fragile custom links between disparate data sources and data consumers with well-defined, easily usable and automated processes for continuously updating, integrating and transforming data into the required format and sharing it.
DataOps uses principles from Agile development and DevOps (which combines software development with operations to speed software development) to “reinvent data management as an automated, process-based IT service that can empower both data professionals and downstream consumers spanning (business intelligence) users, data analysts, data scientists, and business users,” according to a June 2021 report from market researcher Omdia.
Among the aims of DataOps, according to the DataOps Manifesto, are “to satisfy the customer through the early and continuous delivery of valuable analytic insights from a couple of minutes to weeks,” to “welcome evolving customer needs, and … embrace them to generate competitive advantage,” and to deliver reproducible results by “versioning” – tracking the states of everything from the data to the hardware and software used to deliver each data set or analytic result.
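The “versioning” idea above – pinning a result to the exact data, code and configuration that produced it – can be sketched in a few lines of Python. This is an illustrative sketch, not any vendor’s implementation; the `fingerprint` function and its inputs are hypothetical.

```python
import hashlib
import json

def fingerprint(dataset_rows, code_version, config):
    """Produce a short reproducibility tag for one analytic result by
    hashing the exact inputs that produced it: the data, the code
    version, and the run configuration."""
    payload = json.dumps(
        {"data": dataset_rows, "code": code_version, "config": config},
        sort_keys=True,  # stable key order so identical inputs hash identically
    ).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

# The same inputs always yield the same tag, so any result can later be
# traced back to the exact state that produced it.
tag_a = fingerprint([{"region": "EU", "sales": 100}], "v1.4.2", {"window": 30})
tag_b = fingerprint([{"region": "EU", "sales": 100}], "v1.4.2", {"window": 30})
assert tag_a == tag_b
```

In practice, tools store such fingerprints alongside each published data set or dashboard so that a questionable number can be reproduced from its recorded inputs.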
DataOps is driven by growing needs for real-time and predictive insights, as well as the increasing use of artificial intelligence that requires large amounts of data to identify trends and make predictions.
Some of the underlying functions of DataOps can be provided by legacy tools such as master data management (MDM) platforms (which assure the accuracy and integrity of data) for hybrid and multicloud deployments, by newer tools from cloud providers, and by emerging disciplines such as AIOps that use artificial intelligence to automate the deployment, monitoring and optimization of IT resources.
How Does DataOps Work?
DataOps first requires visibility into the state of enterprise data assets and which are being used, or could be used, to generate business insights. This visibility is often provided through the use of data hubs or data catalogs that tap metadata – data about the data – to help data managers and users easily understand and access large amounts of enterprise data.
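A data catalog is, at its simplest, a searchable index of metadata about each asset. The toy sketch below uses made-up asset names, owners and tags purely to illustrate how metadata lets users find data without touching the underlying stores.

```python
# Minimal metadata catalog: each entry describes a data asset (owner,
# schema, tags) rather than holding the data itself. All names here
# are illustrative.
catalog = {
    "sales_orders": {
        "owner": "finance",
        "columns": ["order_id", "customer_id", "amount", "order_date"],
        "tags": ["sales", "revenue", "daily"],
    },
    "web_clicks": {
        "owner": "marketing",
        "columns": ["session_id", "page", "ts"],
        "tags": ["behavioral", "streaming"],
    },
}

def find_assets(keyword):
    """Return the names of assets whose tags or columns mention keyword."""
    return [
        name for name, meta in catalog.items()
        if keyword in meta["tags"] or keyword in meta["columns"]
    ]

print(find_assets("revenue"))  # ['sales_orders']
```

Real catalogs add lineage, quality scores and access controls on top of this basic metadata lookup, but the principle is the same: users search descriptions of data, not the data itself.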
With this knowledge, data managers can create scripts and application programming interfaces (APIs) that automate the collection, validation, integration and analysis of data from multiple sources. Ideally, these automated processes allow users to access data analytics “as a service,” much like they use other applications in the cloud. They can also automate the detection of, and response to, unexpected usage demands, configuration changes and errors, as well as accommodate new sources of data, such as feeds from sensors on the internet of things.
“When a consumer like a business analyst says they need a new query in SQL or Tableau, it doesn’t require a data engineer to create a query in the system to, for example, join two tables together,” says Bradley Shimmin, chief analyst, AI platforms, data and analytics at Omdia.
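The self-service pattern Shimmin describes can be illustrated with Python’s built-in sqlite3 module: the join the analyst needs is wrapped behind a single “data service” function, so no engineer has to hand-craft a query per request. The tables, columns and function name below are hypothetical.

```python
import sqlite3

# Set up two illustrative tables in an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 99.0), (12, 2, 42.0);
""")

def orders_by_customer():
    """Expose the two-table join as a reusable service call, so a
    business analyst never has to write the join themselves."""
    return con.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name ORDER BY c.name
    """).fetchall()

print(orders_by_customer())  # [('Acme', 349.0), ('Globex', 42.0)]
```

In a DataOps setup the same idea appears at larger scale: curated joins and aggregations are published once as governed views or API endpoints, then reused by many consumers.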
DataOps also requires data scientists to work more closely with business users to understand their needs, and for those business users to describe those needs so data scientists can create the right “data services” for them. Creating DataOps centers of excellence in the business units helps non-technical data owners “understand the importance and value in the nuances of working with the data such that they can take eventual ownership and stewardship of the data they’re working closely with,” says Shimmin. It also gives them input into important DataOps processes, such as the design of data catalogs, he says.
What Are the Benefits of DataOps?
The greatest benefit of DataOps is faster access to the ever-changing variety of data and types of analytics needed to meet dynamic business challenges. “The use of metadata to describe data assets and delivery of data via application programming interfaces (APIs) can help enterprise practitioners unify disparate data stores across multiple cloud platforms without having to physically move, replicate, or virtualize data,” according to the Omdia report. “This approach will allow businesses to build a more central and complete picture of their business without having to disrupt existing infrastructure investments.”
DataOps can “democratize” data, extending access to a broader range of business users and reducing the need for hard-to-find data scientists to develop custom queries or integrations to meet each new data requirement. It can also prevent seemingly minor changes to data sources, such as how a semantic model defines a date string, from compromising the results of a downstream system such as an AI application, by automating the testing and remediation of potential problems, says Shimmin.
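The date-string scenario above maps directly onto an automated data test: check incoming values against the expected format and fail fast, rather than letting a downstream application silently consume bad data. This is a minimal sketch; the column name and format are assumptions for illustration.

```python
from datetime import datetime

# Assumed contract with the upstream source: ISO dates (YYYY-MM-DD).
EXPECTED_FORMAT = "%Y-%m-%d"

def validate_dates(rows, column="order_date"):
    """Return every value in `column` that breaks the expected format,
    so the pipeline can alert or quarantine before delivery."""
    bad = []
    for row in rows:
        try:
            datetime.strptime(row[column], EXPECTED_FORMAT)
        except ValueError:
            bad.append(row[column])
    return bad

# A source that quietly switched to US-style dates is caught here,
# before it reaches any downstream AI application.
rows = [{"order_date": "2021-06-01"}, {"order_date": "06/02/2021"}]
print(validate_dates(rows))  # ['06/02/2021']
```

Checks like this typically run on every pipeline execution, which is what turns a one-off manual inspection into the continuous, automated testing DataOps calls for.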
What Are the Drawbacks of DataOps?
Done wrong, DataOps can create hard-to-share silos of data as business units or departments build their own data hubs in relatively low-cost public cloud platforms without following enterprise standards in areas such as security, compliance or data definitions.
It also requires purchasing, implementing and supporting multiple tools to provide everything from version control of code and data to data integration, metadata management, data governance, security and compliance, among other needs. Tools supporting operationalization of analytics and AI pipelines for purposes such as DataOps usually have overlapping capabilities that make it even more difficult to identify the right product and framework for implementation, says Gartner analyst Soyeb Barot in a January 2021 report.
DataOps implementations can also be hobbled by, among other things, over-reliance on fragile extract, transform and load (ETL) pipelines; a reluctance or inability to invest in data governance and management; the continued explosion of data to manage; and integration complexities, according to the Omdia report. For such reasons, Shimmin estimates that fewer than one in five enterprises has successfully implemented DataOps.
Examples of DataOps
IBM Cloud Pak for Data provides tools for data integration, data preparation, replication, governance, cataloging and quality, as well as MDM. IBM claims Cloud Pak for Data’s support for and integration with its IBM Watson Knowledge Catalog help customers “activate business-ready data for AI and analytics with intelligent cataloging, backed by active metadata and policy management.”
Informatica provides tools for data privacy management, data preparation, data cataloging, MDM, cloud-native data delivery services and data governance, which leverage its AI augmentation and automation platform, CLAIRE. The company uses CLAIRE “to great effect in addressing DataOps concerns such as continuous operations through self-healing routines, auto-tuning, autoscaling, and smart shutdown of services,” according to Omdia.
DataKitchen’s DataOps Platform supports cloud and hybrid cloud deployments in areas including data observability, automated testing, continuous deployment through orchestration and automation, a single management platform for multiple analytic pipelines, and self-service access to data and analytical insights. DataKitchen, one of the leading backers of the DataOps drive, positions itself as an orchestrator of the tools needed for DataOps that a business already owns, says the Omdia report.
GoodData’s data-as-a-service offering delivers metrics, analytics and other assets to business consumers via a rich API. This provides a single data service layer, says Omdia, “from which companies can build and deploy their own analytics apps in a flexible manner using a single source of truth that is secure and compliant with corporate data security and privacy mandates.”
Delphix’s Data Platform provides a programmable data infrastructure designed to automate data operations including continuous integration and delivery, cloud migrations, and compliance. The platform integrates with systems ranging from mainframes to cloud-native applications, automating “data delivery and access, whether on premises or in a hybrid or multi-cloud environment,” Delphix claims.
DataOps adopters will continue to seek centralized data governance tools and processes that help assure security and compliance but are easier to use than those they have adopted in the past, according to Shimmin. While some DataOps vendors will focus on point solutions and others on overarching platforms, all have a strong incentive to ensure their offerings can easily integrate and share information with each other, says Shimmin. Such capabilities will lead, he predicts, to DataOps giving way to “data fabrics” or “data meshes” that enable local business units to control, analyze and share their own data to meet pressing business needs while meeting enterprise security and compliance requirements.
To be successful at DataOps, businesses cannot “consider data as a discrete resource akin to oil that must be centralized and processed one time only for incorporation into a refined, exhaustible fuel like gasoline,” says the Omdia report. “Rather, companies need to look at data as an ever changing and highly malleable form of energy, something that can move about freely, combining and recombining again to power a myriad of insights across the enterprise.”