With the arrival of SQL Server 2008 we gained access to a new feature called Change Data Capture (CDC). CDC enables you to constantly stream change data to a consuming process, most notably Extract, Transform, and Load (ETL) applications. In the SQL Server environment the ETL application used is of course SQL Server Integration Services (SSIS). As a side note, one thing that has bugged me since studying CDC in SQL Server 2008 over the last several months has been Books Online (BOL) statement’s about CDC as a SSIS feature. There are a few instances of such statements, for example “Integration Services includes a new technology called change data capture” in the topic ‘What’s New (Integration Services)’. CDC is a feature of the relational database engine in SQL Server 2008.
Moving on, the first obvious constraint in regard to SQL Server 2008’s CDC feature is that it is only available to SSIS applications that source their data from an OLTP system built on SQL Server 2008. In the field of implementing Microsoft BI there are many relational database systems we have to source data from that are not SQL Server based (and if they are SQL Server based there is a good chance it is not using the 2008 version). All major relational database systems use some form of a transaction log to record the history of its actions in case of rollback or hardware failure. I would imagine that the actual contents of each database vendor’s log is different, however they have the same basic requirement to temporarily persist database actions.
Microsoft has done a tremendous job in growing their share of the Business Intelligence market. Do you think adding such a relational database engine neutral CDC component would increase its BI (ETL) offering even more?
I have created a Microsoft Connect suggestion to see what they think as well, please feel free to add a comment to this blog or express your thoughts on the actual Connect suggestion.