During the past two decades, decision support systems and data warehousing have become expected components of the corporate information intelligence arsenal, providing reporting and fundamental analytics processes to support corporate operations. In the enterprise reporting framework, data sets are extracted from source systems; landed at a staging area for standardization, cleansing and reorganization; and then loaded into a monolithic data warehouse. Although this foundational architecture has become a de facto standard, a number of disruptive forces in the technology market are rapidly changing the ways that organizations design, develop and deploy their analytics strategies.
Analytics democratization is the desire to give a broad range of citizen data analysts access to reporting and analytics tools.
Organizations will likely consider different approaches for adopting these technologies, although all alternatives will probably result, in the end, in a hybrid environment that spans both on-premises systems and multiple cloud platforms. While some organizations will migrate their existing on-premises reporting applications to the cloud (also referred to as “lift and shift”), others will modernize their environment to address both ongoing and anticipated future data analytics needs. In this article we explore two facets of a future analytics environment that is deployed on the cloud: analytics and data integration.
Most, if not all, conventional data warehousing and analytics platforms support descriptive analytics. This encompasses most of the traditional operational reporting and diagnostic analysis that enables users to consider what happened via drill-down and discovery, and to analyze correlations and infer potential causality. The future data analytics environment must expand to incorporate a full spectrum of analytics utilities and capabilities, including:
- Predictive analytics, which uses data mining, machine learning and artificial intelligence techniques to develop models for predicting future behaviors.
- Prescriptive analytics, which builds on predictive analytics to recommend, from a set of selected options, the one expected to produce the best outcome. In other words, prescriptive analytics helps automate decision processes.
- Integrated analytics, which embeds developed analytical models within information flows to automate decision support and execution.
- Feature extraction and text analytics, which help automatically identify and extract features from semistructured and unstructured data that can then be used to fuel predictive and prescriptive analysis.
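To make the step from predictive to prescriptive analytics more concrete, the following Python sketch fits a simple trend line to past demand and then picks the stocking option with the best estimated profit. It is only an illustration: the demand figures, the margin and holding-cost parameters, and the function names (`fit_line`, `predict_demand`, `recommend_stock`) are all hypothetical, and a real environment would rely on far richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_demand(history, period):
    """Predictive step: extrapolate the fitted trend to a future period."""
    slope, intercept = fit_line(list(range(len(history))), history)
    return slope * period + intercept


def recommend_stock(predicted_demand, options, unit_margin, unit_holding_cost):
    """Prescriptive step: choose the option with the highest estimated profit."""
    def estimated_profit(stock):
        sold = min(stock, predicted_demand)
        unsold = max(stock - predicted_demand, 0)
        return sold * unit_margin - unsold * unit_holding_cost

    return max(options, key=estimated_profit)


# Hypothetical monthly demand; not taken from any real data set.
history = [100, 110, 120, 130]
forecast = predict_demand(history, period=len(history))
choice = recommend_stock(
    forecast,
    options=[120, 140, 160],   # candidate stocking levels (assumed)
    unit_margin=5.0,           # assumed profit per unit sold
    unit_holding_cost=2.0,     # assumed cost per unsold unit
)
print(forecast, choice)
```

The predictive function only estimates what is likely to happen; the prescriptive function turns that estimate into a recommended action, which is the part that lends itself to automation.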
While traditional data warehouse architectures could be lifted and shifted to the cloud, newer data management architectures (such as in-memory hybrid databases and virtualized access to object storage) are designed to exploit scalable cloud resources. They also take advantage of analytics utilities native to the cloud host for integrating streaming data with machine learning and artificial intelligence algorithms and models.
Hybrid Data Integration
There are two aspects of “hybridization” in the evolving extended information environment. First, the enterprise is expanding beyond the traditional on-premises configuration. Selected migration of data and applications to one or more cloud platforms is creating a more complex hybridized computing environment. Second, information management and analytics applications are increasingly capable of ingesting and processing structured data, as well as semistructured data (such as XML or JSON documents) and unstructured data assets (such as freeform text or transcriptions of audio data).
This implies a greater need for enterprise-wide data awareness. As the size of the analyst community grows, each individual must be able to rapidly determine which data assets are available for use along with the appropriate metadata that guides the data consumer in data asset use. And, as the number of real-time streaming data sources increases, analysts will want tools to help quickly ingest and analyze these data feeds. The future data analytics environment must be able to accommodate integration across these two dimensions, incorporating the following capabilities.
- Data discovery tools provide analysts with insight about the contents of collected data assets and help to characterize and collect structural metadata as well as determine whether the data asset contains sensitive information that is subject to protection.
- Virtualized data accessibility reduces the need for copying data from one platform to another by allowing analytics applications to access data in place.
- Data pipeline orchestration tools manage the increasing complexity of simultaneous ingestion/processing/analysis applied to both static and streaming data.
- Data catalogs provide a repository for data asset metadata that alerts consumer communities about available data assets and how those assets can be used.
- Real-time data ingestion allows data analysts to develop applications that integrate batch data with continuously streaming data in real time.
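As a rough illustration of how data discovery and a data catalog fit together, the Python sketch below profiles the columns of a small record set, flags values that look sensitive, and registers the result as catalog metadata. Everything here is an assumption made for the example: the `ColumnProfile` and `DataCatalog` classes, the two regular expressions, and the sample rows are hypothetical and do not represent any particular product.

```python
import re
from dataclasses import dataclass

# Simplistic patterns, assumed for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
SSN_LIKE = re.compile(r"^\d{3}-\d{2}-\d{4}$")


@dataclass
class ColumnProfile:
    name: str
    inferred_type: str
    sensitive: bool


def profile_column(name, values):
    """Infer a coarse type and check whether any value looks sensitive."""
    non_null = [v for v in values if v is not None]
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    sensitive = any(
        isinstance(v, str) and bool(EMAIL.match(v) or SSN_LIKE.match(v))
        for v in non_null
    )
    return ColumnProfile(name, "number" if is_numeric else "string", sensitive)


def discover(rows):
    """Data discovery: build structural metadata from a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return [profile_column(c, [row.get(c) for row in rows]) for c in columns]


class DataCatalog:
    """Minimal in-memory catalog mapping asset names to their metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, asset_name, profiles, description=""):
        self._entries[asset_name] = {"description": description, "columns": profiles}

    def sensitive_columns(self, asset_name):
        return [p.name for p in self._entries[asset_name]["columns"] if p.sensitive]


# Made-up sample records.
rows = [
    {"customer_id": 1, "email": "a@example.com", "spend": 10.5},
    {"customer_id": 2, "email": None, "spend": 3},
]
catalog = DataCatalog()
catalog.register("customers", discover(rows), description="Sample customer extract")
print(catalog.sensitive_columns("customers"))
```

In this arrangement the discovery step produces the metadata and the catalog makes it findable, so an analyst can see which assets exist and which columns call for protection before using them.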
Determining the need for modernization is the first step in establishing a plan for migrating the analytics environment to a hybrid cloud configuration. The choices presented to the system designer are vast and, in many cases, confusing. The best approach calls for a solid modernization road map and blueprint as a prelude to any migration plans.
Before embarking on the operational tasks of either migrating data/applications to the cloud or redesigning the analytics environment, survey your users to assess their reporting and analysis needs as well as their desires for future data analytics capabilities. This assessment will frame the requirements that will drive analytics architecture and subsequent technology selections.