Prior to SQL Server 2012, Microsoft DW & BI solutions relied on a combination of tools and techniques within and without the Microsoft stack to provide the basics of enterprise information management (EIM). EIM is a term that encompasses the area of data & knowledge management and is a very important of a successful large data warehouse and BI implementation. Typically, EIM is meant to include data management capabilities like data integration, data cleansing, master data management and data profiling. Without cleansing data, profiling and insuring data integrity during integration to a data warehouse, your business intelligence solutions are exposed to the rules of garbage-in, garbage-out and a lack of adoption or trust from your users.
SQL Server has long had strengths in data integration from the old DTS days through to the newest advances in SSIS in SQL Server 2012 (the new deployment model and CDC integration being 2 of my favorites in 2012). Data Profiling came along in SQL Server 2008 as part of SSIS and does key actions for you as an ETL developer to profile the data sources before you make decisions on key attributes, source fields, transformations, patterns, etc:
There has been a form of data quality in SSIS that you could use with the SSIS Fuzzy Lookup tasks that use the SQL Server data mining engine to produce an effective data integration flow that included the ability to match data and fields based on lookup algorithms with a nice set of controllable parameters:
What was missing was a data cleansing and master data management solution that provides data quality throughout the process and throughout an organization that can be governed and owned by the business community. That is, people in your company that are closer to the actual data and don’t have to express the business entities to BI developers, typically losing context and meaning through translation.
This is where the EIM stack now included in SQL Server 2012 complete this picture. Master Data Services (MDS) came to us in SQL Server 2008 R2 and the SQL Server 2012 MDS version is much more complete and improved. And MDS continues the Microsoft march toward Excel as the primary BI consumer model tool which now enables the master data management (MDM) community’s role for business and data analysts: the data steward. This role within your business can now use Excel or a much-improved browser-based UI to govern the business data assets.
Data Quality Services (DQS) is a brand new addition to the stack that takes the idea of fuzzy matching and marries it together with the idea of master data management by making data quality a first-class citizen in your BI project. Microsoft has made this possible by including the software that was purchased from Zoomix a few years back and now bundling that into the SQL Server 2012 media. DQS, therefore, is an established tool for data quality that allows data domain experts to interact with a client tool (see screen capture below) instead of needing to go through
T-SQL, SSIS or database procedures:
The terminology and configuration of DQS is geared toward ideas like data domains, knowledge base and projects, which allows BI developers to offload the data management piece to data experts. Data quality and matching projects end up looking something like what you see below, which enables the data to be cleansed before you need to rely on fuzzy lookups, thereby increasing the quality and success rate of your business intelligence solution, which of course will benefit your company’s bottom line and top line.
The final mile of a Microsoft BI project in SQL Server 2012 with these EIM tools is integrating the data quality project work from the data stewards and data experts into your ETL workflow in SSIS. SSIS 2012 includes a transformation that integrates those data quality domains into your data warehouse. Below is a screenshot of a configuration for the DQS Transformation Editor. It is very straightforward and essentially creates a connection into the knowledge base that the domain experts have built up around the data sources:
One more thing I’d like to mention from DQS that is really nice: it comes fully ready to integrate with 3rd party reference data providers through the Azure Data Marketplace where you can use data from providers like Melissa and Infogroup to add additional quality checks against business names, addresses, phone numbers, etc. from data syndicators.