4 Darn Good Reasons for Not Building Cubes on Transactional Systems

SSAS lets you do things you shouldn't

 Executive Summary:

You can use SQL Server 2005 Analysis Services (SSAS) to build OLTP-based cubes on relational data models that aren't designed for business intelligence, but you'll run into problems.

You can use SQL Server 2005 Analysis Services (SSAS) to build OLTP-based cubes on relational data models that aren't designed for business intelligence (BI). But just because you can doesn't mean you should. The sidebar "Reasons Why You Might Think You Should Build a Cube on a Transactional Data Model" lists several reasons why you might want to do this, but I've encountered four common problems with OLTP-based cubes that convince me such cubes are a bad idea.

1. OLTP-based cubes are usually poorly designed for BI. If you try to build a cube on a normalized schema, you're likely to end up with a non-intuitive, overly complex cube that performs poorly, because the cube is built on a data structure designed for an application, not for end-user reporting and analysis. By contrast, modeling BI data as a star or snowflake schema usually produces a well-designed cube with an intuitive arrangement of measures and their dimensions (for details on these design methods, see "Data Warehousing: Dimension Basics," page 32, InstantDoc ID 96813).
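To make the contrast concrete, here's a minimal Python sketch of the flattening a star schema implies; the tables, columns, and values are hypothetical, not taken from any particular system. The normalized customer-to-region lookup chain collapses into one wide dimension row keyed by a surrogate key, while the fact row carries only keys and measures:

# Hypothetical normalized OLTP tables, represented as dicts for illustration.
customers = {101: {"name": "Contoso", "region_id": 5}}
regions   = {5:   {"region": "Midwest", "country": "USA"}}
orders    = [{"order_id": 1, "customer_id": 101, "amount": 250.00}]

# Star-schema shape: one wide dimension row per customer (surrogate key),
# with the region lookup values denormalized into it.
dim_customer = {}
for surrogate_key, (cust_id, cust) in enumerate(customers.items(), start=1):
    region = regions[cust["region_id"]]
    dim_customer[cust_id] = {
        "customer_key": surrogate_key,
        "customer_name": cust["name"],
        "region": region["region"],
        "country": region["country"],
    }

# The fact table carries only dimension keys and measures.
fact_sales = [
    {"customer_key": dim_customer[o["customer_id"]]["customer_key"],
     "sales_amount": o["amount"]}
    for o in orders
]

print(dim_customer)
print(fact_sales)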

2. OLTP-based cubes are usually full of dirty data. When you build a cube directly against your transactional system, you bypass the extraction, transformation, and loading (ETL) process, which provides not only data movement but also data cleansing and merging. Data can be bad no matter where it's stored, but the ETL process applies business rules, consistency checks, lookups, comparisons, and calculations that clean up data, merge it with data from other sources, and store it in a model designed for end-user reporting and analysis.
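As a rough illustration only (the source systems, business rules, and lookup table below are invented for the example), this Python sketch shows the kind of cleansing and merging an ETL process performs before data ever reaches a cube:

# Hypothetical customer rows arriving from two source systems.
crm_rows = [{"id": "C-101", "name": " contoso ltd ", "state": "oh"}]
erp_rows = [{"cust": "101",  "name": "CONTOSO LTD",   "state": "Ohio"}]

# Consistency check: map free-form state values to standard codes.
STATE_LOOKUP = {"oh": "OH", "ohio": "OH"}

def cleanse(name, state):
    """Apply simple business rules: trim whitespace, standardize case, map state codes."""
    name = " ".join(name.split()).title()
    state = STATE_LOOKUP.get(state.strip().lower(), "UNKNOWN")
    return name, state

# Merge both feeds into one conformed customer list keyed by customer number.
conformed = {}
for row in crm_rows:
    name, state = cleanse(row["name"], row["state"])
    conformed[row["id"].replace("C-", "")] = {"name": name, "state": state, "source": "CRM"}
for row in erp_rows:
    name, state = cleanse(row["name"], row["state"])
    # ERP fills in only customers the CRM feed didn't supply.
    conformed.setdefault(row["cust"], {"name": name, "state": state, "source": "ERP"})

print(conformed)  # {'101': {'name': 'Contoso Ltd', 'state': 'OH', 'source': 'CRM'}}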

3. OLTP-based cubes don't handle history. A transactional system keeps only the history its application requires; for instance, it might keep just the last 90 days of data, or it might overwrite (i.e., update) a record that changes. As a result, you can't write historical reports or perform trend analysis. A relational data warehouse, by contrast, is often designed with snapshot fact tables that capture measures (e.g., inventory levels, account balances) at points in time and with dimensions that track changes, revealing important trends in time-relative attributes. If your data source isn't tracking history, your cube can't reflect it.
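Here's a simplified Python sketch of the snapshot idea; the inventory table, dates, and quantities are made up for illustration. The transactional table gets overwritten, but the snapshot fact table accumulates one row per product per date, so the trend survives:

from datetime import date

# Hypothetical current OLTP inventory table: one row per product, always overwritten.
oltp_inventory = {"WIDGET-1": 42, "WIDGET-2": 17}

# Snapshot fact table in the warehouse: one row per product per snapshot date.
fact_inventory_snapshot = []

def take_daily_snapshot(snapshot_date):
    """Append today's levels; prior rows are never updated, so history is preserved."""
    for product, qty in oltp_inventory.items():
        fact_inventory_snapshot.append(
            {"snapshot_date": snapshot_date, "product": product, "quantity_on_hand": qty}
        )

take_daily_snapshot(date(2007, 9, 1))
oltp_inventory["WIDGET-1"] = 30          # the OLTP system overwrites the current level...
take_daily_snapshot(date(2007, 9, 2))    # ...but the warehouse keeps both points in time

for row in fact_inventory_snapshot:
    print(row)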

4. OLTP-based cubes still have to be processed. Because OLTP data changes constantly, building an OLAP solution on top of it usually creates demand for near real-time updates. You can use proactive caching to deliver real-time updates, but it can yield unexpected results for dimensions, especially large ones: processing triggers a full read of the relational tables supporting the dimension, which can cripple an OLTP system. If you forgo proactive caching, traditional processing still requires those full reads. If your transactional system has quiet periods, you might be able to get away with proactive caching, but a traditional ETL approach that uses incremental extraction procedures puts far less load on the OLTP system.
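For comparison, a watermark-driven incremental extraction might look roughly like this Python sketch (the column names, timestamps, and values are hypothetical); only rows modified since the last load are read, so routine loads never scan the full OLTP tables:

from datetime import datetime

# Hypothetical OLTP order rows with a last-modified timestamp.
oltp_orders = [
    {"order_id": 1, "amount": 250.0, "modified": datetime(2007, 9, 1, 8, 0)},
    {"order_id": 2, "amount": 125.0, "modified": datetime(2007, 9, 2, 9, 30)},
]

def extract_incremental(rows, watermark):
    """Return only rows changed since the last load, plus the new high-water mark."""
    changed = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in changed), default=watermark)
    return changed, new_watermark

last_load = datetime(2007, 9, 1, 23, 59)      # stored after the previous ETL run
delta, last_load = extract_incremental(oltp_orders, last_load)
print(delta)  # only order 2 is pulled, sparing the OLTP system a full read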

Special thanks to BI experts at Solid Quality Mentors: Dave Fackler, Alejandro Leguizamo, Javier Loria, Jordi Rambla, Andreas Schindler, Craig Utley, and Erik Veerman.
