This is the first in a series of articles exploring the role of data warehouse automation (DWA) tools in assisting data warehouse migration projects. This first article introduces DWA concepts, methods and tools, and discusses the benefits of using DWA tools in on-premises migration scenarios--that is, scenarios that involve moving from one on-premises data warehouse target to another. Follow-up articles will explore the benefits of DWA in more complex migration scenarios, such as hybrid migrations--from an on-premises data warehouse system to the infrastructure-as-a-service (IaaS) or platform-as-a-service (PaaS) cloud--along with cloud-to-cloud migrations. The latter encompasses migrating a data warehouse system from one cloud IaaS/PaaS service to another, as well as multi-cloud deployments.
What DWA Is
Data warehouse automation, or DWA, describes the concept of accelerating a portion of the tasks involved in prototyping, designing, testing, deploying and maintaining a data warehouse system. Commercial DWA software tools incorporate usability features--for example, visual modeling facilities, rule-driven and machine learning (ML)-driven automation capabilities, guided wizards and user self-service tools--that purport to automate and/or accelerate many of these tasks.
In almost all cases, commercial DWA tools preceded the emergence of vendor-specific automation capabilities, such as the ease-of-getting-started and ease-of-use features now exposed by most platform-as-a-service (PaaS) data warehouse services. Starting about 25 years ago, data warehousing practitioners identified a need for features that formalize the repetitive work of data warehouse design, development and maintenance: to promote reuse, to automate recurrent tasks and, most important, to simplify the work of maintaining and changing the data warehouse. Hence the raison d'être of data warehouse automation.
How DWA Works
Think of data warehouse design as analogous to model-driven software development (MDSD)--and of DWA tools, in particular, as analogous to low-code development platforms (LCDPs).
In the first instance, the aim of MDSD is to capture and formalize domain-specific business knowledge for the purpose of encapsulating it into pre-built models. An MDSD tool uses these models as inputs to generate code in a supported programming language. This is one function of a DWA tool, which uses a logical data model as input to generate database-specific code.
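The model-to-code step can be illustrated with a minimal sketch. The logical model, entity names and type strings below are hypothetical, and real DWA tools work from far richer metadata, but the principle--a declarative model rendered as target-specific DDL--is the same:

```python
# Illustrative sketch: a (hypothetical) logical data model, expressed as
# plain Python data structures, is rendered as CREATE TABLE statements.
LOGICAL_MODEL = {
    "customer": {
        "customer_id": "INTEGER NOT NULL",
        "customer_name": "VARCHAR(120)",
        "region": "VARCHAR(40)",
    },
    "orders": {
        "order_id": "INTEGER NOT NULL",
        "customer_id": "INTEGER NOT NULL",
        "order_total": "DECIMAL(12,2)",
    },
}

def generate_ddl(model: dict) -> str:
    """Render each logical entity as a CREATE TABLE statement."""
    statements = []
    for table, columns in model.items():
        cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return "\n\n".join(statements)

print(generate_ddl(LOGICAL_MODEL))
```

A production tool would also derive keys, constraints and indexes from the model; the sketch shows only the column-rendering core of the idea.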
In the second instance, both DWA and LCDP tools provide a visual development environment suitable for use by business analysts and subject matter experts. In an LCDP, these experts work with developers to design pre-built models--that is, object-like components--which they can manipulate and assemble into applications and workflows. They then hand these proto-deliverables off to skilled software engineers, who focus on designing application or presentation logic, improving the overall user experience and attending to other technical arcana.
The average DWA tool approximates the features of a combined MDSD and LCDP tool. For example, most DWA tools can use UML-based models to generate SQL or DDL code optimized for specific data warehouse targets. Moreover, almost all DWA tools incorporate visual, self-service facilities that business experts can use to design, test and modify the data warehouse data model itself, as well as the hundreds or thousands of denormalized views--such as star or snowflake schemas--that make up the business intelligence (BI) presentation layer.
Some DWA vendors describe their tools as “data-driven” as opposed to model-driven. This typically denotes alignment with the bottom-up practice of dimensional data modeling, as compared to the top-down practice of logical data modeling. For the purpose of this article, think of the two as the same: Users design models that encapsulate business logic; the DWA tool translates these models into code, which the database executes to create physical structures. Developers, working with business experts, design ELT jobs that populate these structures.
Most DWA tools also double as ELT--or extract, load, transform, as opposed to ETL--design studios. This is because in the DWA model, the data warehouse itself transforms and conditions data. Centralizing data integration processing in the data warehouse simplifies the task of designing and maintaining data transformation and data conditioning logic. It likewise simplifies porting or migrating this logic to dissimilar data warehouse targets.
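The ELT pattern described above can also be sketched briefly: rather than transforming data in an external engine (ETL), the tool generates SQL that the target warehouse executes itself. The staging and target table names and the transformation logic below are hypothetical:

```python
# Sketch of ELT code generation: the transformation and conditioning
# logic lives in the generated SQL, which runs inside the warehouse.
def generate_elt_sql(staging_table: str, target_table: str) -> str:
    """Emit an INSERT ... SELECT that transforms data in-database."""
    return (
        f"INSERT INTO {target_table} (customer_id, customer_name, region)\n"
        f"SELECT customer_id,\n"
        f"       UPPER(TRIM(customer_name)),   -- conditioning done in-database\n"
        f"       COALESCE(region, 'UNKNOWN')\n"
        f"FROM {staging_table};"
    )

print(generate_elt_sql("stg_customer", "dw_customer"))
```

Because the transformation logic is expressed as generated SQL rather than as proprietary engine code, re-targeting it for a different warehouse is largely a matter of regenerating the SQL for the new dialect.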
What DWA Does
Business experts or developers tend to perform some of the same tasks over and over as they scope, design, test and deploy a data warehouse system. These recurrent tasks include:
- Connecting to, exploring or profiling data in upstream repositories
- Formalizing business rules and encapsulating them in logic
- Designing, versioning, testing and scheduling the data transformations and data conditioning jobs used to populate the target data warehouse
- Creating logic to validate these transformations and conditioning jobs
- Assessing the impact of changes to upstream data sources or the data warehouse itself
- Designing, testing and deploying new denormalized views to support BI tools
- Generating documentation and metadata
Similarly, just as with data warehouse design, the lifecycle process of managing and maintaining the data warehouse involves many repetitive and predictable tasks. These include:
- Identifying and formalizing new entities, their attributes and their relationships
- Related: designing, testing, and versioning changes to a logical data model
- Modifying, testing and versioning denormalized views to support BI tools
- Changing and/or versioning business rules
- Changing, testing, scheduling and versioning the data transformations and data conditioning jobs used to update the target data warehouse
- Maintaining (and versioning) documentation and metadata
Ironically, data warehouse automation tools do not automate most of these tasks.
Instead, they provide a single context in which to perform them, along with different kinds of visual tools--visual data exploration and discovery; visual data modeling; guided, self-service features; and interactive, wizard-driven automation facilities--to help abstract their complexity.
DWA in Data Warehouse Migration
Some organizations opt to lift-and-shift (or “forklift”) their existing data warehouse system to run on a new target platform. Basically, this entails loading data from the existing data warehouse into the new target system, replicating the structure of the older system. More conscientious organizations will use modeling tools (that is, a logical data model) to generate new SQL/DDL code for the warehouse target.
A DWA tool does all of this and more. First, like a modeling tool, it regenerates SQL or DDL code optimized for the new warehouse target. Remember, several established database vendors have developed proprietary extensions to standard ANSI SQL. In addition, some vendors (IBM, Oracle and Teradata, for example) expose optimized database loading facilities. So far, so good.
The DWA tool also automatically regenerates existing views, indexes, models, stored procedures, metadata, documentation and other assets for the new database target. It does this irrespective of whether the migration involves a move from one warehouse target (say, Oracle) to a dissimilar one (say, Teradata)--or from an older to a newer version of the same data warehouse platform, as with an upgrade from Oracle 12c to Oracle 21c.
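Target-specific regeneration can be sketched as a dialect mapping: the same logical column types are rendered differently per warehouse. The type mappings below are simplified illustrations, not a complete or authoritative dialect catalog:

```python
# Sketch of dialect-aware DDL regeneration. The logical-to-physical type
# mappings are illustrative; a real DWA tool maintains far richer catalogs.
TYPE_MAP = {
    "oracle":   {"string": "VARCHAR2(120)", "int": "NUMBER(10)", "money": "NUMBER(12,2)"},
    "teradata": {"string": "VARCHAR(120)",  "int": "INTEGER",    "money": "DECIMAL(12,2)"},
}

def regenerate_table(table: str, columns: dict, target: str) -> str:
    """Re-render one table's DDL for a new warehouse target."""
    types = TYPE_MAP[target]
    cols = ",\n  ".join(f"{name} {types[logical]}" for name, logical in columns.items())
    return f"CREATE TABLE {table} (\n  {cols}\n);"

columns = {"customer_id": "int", "customer_name": "string", "credit_limit": "money"}
print(regenerate_table("customer", columns, "oracle"))
print(regenerate_table("customer", columns, "teradata"))
```

The point is that the logical model is the stable asset; the physical DDL is disposable output that can be regenerated for whichever target the organization migrates to.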
To sum up, whether you are migrating between dissimilar data warehouse targets or upgrading from an old to a new data warehouse target, a DWA tool will automate the following tasks:
- Regenerate code to instantiate data structures optimized for the new target platform
- Bring forward (updating and/or deprecating as necessary) all native objects
- Port and schedule ELT jobs, data cleansing routines and associated validation logic
- Related: regenerate and/or re-sequence ELT code for optimal parallel execution
- Update, delete and/or create metadata and documentation
The DWA tool provides a facility to monitor the success or failure of these operations, too.
Another common scenario involves migrating from one type of warehouse data model to another. Take, to cite one example, Data Vault modeling, which makes use of novel data structures such as hubs, links and satellites. In part because of these novel structures, migrating from a conventional 3NF data model to a Data Vault 2.0 model is typically a non-trivial task. However, a Data Vault-certified DWA tool can assist with this, first by translating the 3NF logical data model into a Data Vault 2.0 model and, second, by generating optimized code that replicates Data Vault 2.0 structures in the target data warehouse.
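A hypothetical sketch of that translation step: one 3NF entity becomes a hub for its business key plus a satellite for its descriptive attributes. Data Vault 2.0 additionally prescribes hash keys and load metadata; the column names and types below are illustrative, not prescriptive:

```python
# Illustrative translation of a single 3NF entity into Data Vault-style
# hub and satellite tables. Column names and types are hypothetical.
def entity_to_data_vault(entity: str, business_key: str, attributes: list) -> list:
    """Emit hub and satellite DDL for one 3NF entity."""
    hub = (
        f"CREATE TABLE hub_{entity} (\n"
        f"  {entity}_hk CHAR(32) NOT NULL,  -- hash of the business key\n"
        f"  {business_key} VARCHAR(60) NOT NULL,\n"
        f"  load_date TIMESTAMP NOT NULL,\n"
        f"  record_source VARCHAR(60) NOT NULL\n"
        f");"
    )
    attr_cols = ",\n  ".join(f"{a} VARCHAR(120)" for a in attributes)
    sat = (
        f"CREATE TABLE sat_{entity} (\n"
        f"  {entity}_hk CHAR(32) NOT NULL,\n"
        f"  load_date TIMESTAMP NOT NULL,\n"
        f"  {attr_cols},\n"
        f"  record_source VARCHAR(60) NOT NULL\n"
        f");"
    )
    return [hub, sat]

for ddl in entity_to_data_vault("customer", "customer_no", ["customer_name", "region"]):
    print(ddl)
```

A real translation also has to identify link tables for the relationships between entities, which is where a Data Vault-certified tool earns its keep.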
Here and elsewhere, the DWA tool usually does not (and cannot) make the migration process turnkey. It can, and does, use automation to bootstrap this process, however.
Data warehouse migration is the kind of killer app that organizations do not always take into account when they are weighing a decision to license a DWA tool. However, if an organization adopts a multi-target DWA tool and is disciplined in using that tool as the primary means of designing, deploying and maintaining its data warehouse system, then its use of that tool should simplify its migration from one data warehouse target platform to another. It should also simplify migration from one type of schema (say, 3NF) to another (Data Vault 2.0).
DWA tools help to decouple an organization’s unique or handcrafted assets--for example, ELT logic; stored procedures; star and snowflake schemas; metadata and documentation--from the specific requirements of a target data warehouse system or target deployment context.
For example, a DWA tool should make it easier for an organization to reuse the ELT jobs it originally developed for its on-premises SQL Server data warehouse with a new, dissimilar data warehouse target--say, Teradata, running either on-premises or in the cloud. Ditto for on-premises SQL Server to Snowflake’s PaaS data warehouse service, and so on. (I plan to discuss the benefits and possible costs of using DWA tools to assist with cloud migration in a follow-up article.) In this way, DWA tools afford some protection against dependency on specific platforms--or on platform-specific technologies.
All of this comes with a caveat: the organization must standardize on its DWA tool as the primary means of data warehouse design and maintenance. Work that is not performed using the tool cannot be managed by the tool, which means it can be neither captured nor maintained for reuse--and, hence, will not be brought forward to the new warehouse platform.