SQL Server 2008, while not a revolutionary release, provides rich insight into your data for your organization’s business intelligence (BI) needs. I’ll take you on a brief tour of SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS) enhancements, so you’re armed with information to help you make decisions about implementing SQL Server 2008 BI solutions. First, I want to point out two relational engine (query optimizer) enhancements that set the stage for better BI—partitioned table parallelism and star-join query optimizations (Optimized Bitmap Filters). Even though this tour is just a taste of the many enhanced BI features found in SQL Server 2008, it should be enough to give you food for thought when you consider whether upgrading to SQL Server 2008 BI is a smart move, based on your organization’s BI architecture and requirements.
Partitioned Table Parallelism
SQL Server 2008 improves performance on partitioned tables that reside on multi-CPU-based systems. The query optimizer can elect a parallel query execution plan on these hard systems to provide improved performing query and index operations. Fact tables are often candidates for partitioning in a data warehouse because they typically contain a few columns with a very large number of records.
Does your data warehouse have large fact tables residing on multiple CPUs? You can benefit by upgrading to SQL Server 2008 because there’s a new parallel query execution strategy on partitioned tables. SQL Server 2005 uses a single thread per partition parallel query execution strategy. In SQL Server 2008, multiple threads can be allocated to a single partition, thus improving the query’s response time. As of this writing, you can enable this functionality by setting the trace flag 2440, although this is expected to change when the product ships. Note that table and index partitioning requires SQL Server 2008 Enterprise Edition.
Star-Join Query Optimizations (Optimized Bitmap Filters)
The query optimizer uses bitmap filtering to eliminate rows from a second table based on values taken from the first table. Bitmap filtering is a common query filtering technique found in star-schema-based queries. SQL Server 2008 introduces optimized bitmap filtering. The query optimizer can now introduce bitmap filters dynamically in the query plan during generation, as opposed to just after query plan optimization, as in SQL Server 2005. Optimized bitmap filtering results in filtering from multiple dimension tables and bitmap filters are now applicable to more query operator types. Optimized bitmap filtering enables better performing data-aware house queries that reference the common star-based schemas.
Any BI solution includes extraction, transformation, and loading (ETL) of an organization’s data. ETL is implemented in SQL Server using SSIS. In SQL Server 2005, the SSIS pipeline execution engine doesn’t scale up to utilize more than one processor in a single execution tree. The SQL Server 2008 SSIS data flow engine can execute multiple components (threads) in a single execution tree. Overall, the 2008 SSIS engine is more stable and scalable. It eliminates the potential for deadlocks that occasionally occur in SQL Server 2005 SSIS when you execute packages with complex user data in large organizations.
Lookup transformation. SSIS can be used in a variety of scenarios, however, it’s most commonly used in ETL. One of the most common SSIS components used in ETL solutions is the Lookup transformation. The SQL Server 2005 SSIS lookup component used against tables with row counts of over a million rows occasionally causes a performance slowdown. SQL Server 2008 no longer has this limitation. You can perform a lookup against any data source by using the standard providers, which include ADO.NET, XML, OLE DB, and other data sources. You can even perform lookups against other SSIS packages.
The enhanced TxLookup transformation component of the SSIS package in SQL Server 2008 supports internal redundancy on the lookup chain. TxLookup also includes several other improvements over SQL Server 2005: There’s now a pre-charge query in addition to the cache-miss query. And for each cache-miss query, multiple rows can now be returned. The cachemiss query now has a separate connection manager. If you use a full or a partial cache query, SQL Server 2008 loads the hash table and uses the pre-charge query. However, if you use a no cache query, SQL Server 2008 behaves like SQL Server 2005 and uses only the cache-miss query. SSIS in SQL Server 2008 improves the performance of lookups to support the largest tables.
Data profiler. Good news for ETL gurus—SQL Server 2008 SSIS has a data profiler. Now you’ll have visibility into the source system data before you build your ETL solutions, and the ability to code, configure, and build based upon data patterns. With the data profiler you can generate source system metadata statistics, which you can then view using the standalone Data Profile Viewer. This viewer also displays candidate keys and data distributions. Data profiling has long been a requested capability of DTS/SSIS and the larger SQL Server product. It’s good to see a formal solution.
Following the typical progression in a BI solution, I’ve discussed the first stage—ETL and SSIS—and now we’re ready to look at creating cubes and mining models. One of SQL Server 2008’s many improvements to the SSAS architecture is Cube Designer enhancements.
Cube Designer enhancements. A critical component to SSAS is the practice of good cube design. The ultimate success or failure of your BI rollout depends on it. I’ll briefly survey what’s new with Personalized Extensions, Best Practice Alerts, the Dimensional Designer, the Aggregation Designer, and Named Sets.
You can use Personalized Extensions to create new SSAS objects and functionality, and then provide these objects and functionality dynamically in the context of the user session. You don’t have to create detailed specifications about where or how to find the extended functionality. You can share these new objects and functionality immediately with both end users and your fellow developers.
The Cube Designer now has a Best-Practice Alert functionality that spans all objects and is generated through Analysis Management Objects warnings. The warnings alert you when you violate design best practices or make logical errors in database design. You can detect potential problems with the design in a non-intrusive way because these warnings are integrated into real-time designer checks.
New and improved features for the Dimensional Designer include the Attribute Relationship Designer, a simplified and enhanced Dimension Wizard, and the Key Columns dialog box. You can use the new Attribute Relationship Designer in the Dimension Editor to easily browse and modify attribute relationships. The Dimension Wizard, which has been modified to align output with best practices, auto-detects parent-child hierarchies, provides safer default error configuration, and supports specification of member properties. In the new Key Columns dialog box, the enhanced Dimension Structure tab works with the Attribute Relationship Designer, making modifying attributes and hierarchies easier.
A new algorithm in the Aggregation Designer helps you create initial aggregations. This designer is optimized to work with usage-driven aggregations. You can view the created aggregations and add to or remove them.
Dynamic named sets are a new capability of SSAS 2008. A named set in SQL Server 2005 makes it possible to define a set of dimension members such as a set of the top 10 stores by sales. You define this set statically. You can then refer to this named set wherever you need to see the top 10 stores by sales. In SQL Server 2005, set evaluation occurs only at set creation. In SQL Server 2008, you can create dynamic named sets and define them to be evaluated every time the sets are used.
Performance enhancements. A major portion of the SSAS performance enhancements are in areas such as subspace computations, Multidimensional OLAP (MOLAP)-enabled write-back, and backup and storage.
Cube space is generally sparse, with values existing only for a small number of dimension intersections. Although SSAS in SQL Server 2005 evaluates expressions on complete space, and subspace computation is included with SP2, in SQL Server 2008 SSAS subspace computations are significantly improved. Multidimensional Expressions query performance has improved; SSAS deals better with cube space by dividing the space to separate calculated members, regular members, and empty space to improve evaluation of cells that need to be included in calculations.
The new MOLAP-enabled writeback capabilities in SQL Server 2008 SSAS remove the need to store writeback data in ROLAP storage mode. The new writeback MOLAP storage mode results in significant performance gains in cubes that leverage the writeback capabilities.
Finally with SQL Server 2008 SSAS backup compression, less storage is required to keep backups online. The backups also run significantly faster because less disk I/O is required. There are fewer restrictions on the size of the database, and the time required for backup and restore operations is significantly reduced.
Data mining. The next important SSAS element for any BI solution is data mining. SQL Server 2008 SSAS enhances data mining models by appending a new algorithm to the Microsoft Time Series algorithm. This improves the accuracy and stability of predictions in the data mining models. The new algorithm is based on the Auto-Regressive Integrated Moving Average (ARIMA) algorithm, and provides better long-term predictions than the Auto Regression Trees with Cross Predict (ARTxp) algorithm used in SQL Server 2005 SSAS.
By default, the new implementation of the Microsoft Time Series algorithm uses the ARTxp algorithm to train one version of the data mining model and the ARIMA algorithm to train another version of the data mining model. The algorithm then weighs the results of these two data mining models to provide the prediction characteristics you want. If you don’t want to use the default implementation, you can specify the algorithms that the Microsoft Time Series algorithm must use.
In SQL Server 2008 Enterprise Edition, you can specify a custom weighting of the algorithms to provide the best prediction over a variable time span. The improved Microsoft Time Series algorithm accepts data during prediction to allow for new business scenarios. For example, you can create a revenue prediction model based on averages across products, regional aggregates, or some other broad data set. You can then apply that model to the time series that shows the sales of an individual product. By applying the general model, you can take advantage of the stability and availability of aggregate data and customize prediction to the individual product. You can also train models by using multiple series, and then apply the models to new data in forecasting scenarios.
Now that we’ve covered what’s new with laying the BI groundwork with SSIS and building cubes and mining models in SSAS, we’re ready to review the new features and enhancements found in SSRS in SQL Server 2008.
Report Server engine. A report server is now implemented as a Windows-based service that hosts the Report Manager, the Report Server Web service, and background processing feature areas. The report engine improves supportability and the ability to control server behavior with memory management and infrastructure consolidation. Consolidating server applications into a single service reduces configuration and maintenance tasks. However, the Report Manager and the Report Server Web service applications continue to run independently within the single service. Both the Report Manager and the Report Server Web service can be accessed through URLs that provide HTTP access to these applications.
The report server includes an HTTP listener that handles all authentication requests directed to a URL and a port you define during server configuration. To provide the ASP.NET and Report Server Web service, the report server uses the new HTTP.SYS capabilities of the OS instead of IIS. The report server also has new management features to set a memory threshold for background operations and performance counters for monitoring service activity. I’ll briefly explore SSRS enhancements for report server deployment modes, report authoring, and report designing.
SSRS continues to expand its delivery options with the expansion and enhancement of Rich Text Format (RTF), Microsoft Office Word, and Microsoft Office Excel rendering. The improvement of the RTF component provides a method for users to define mixed formatting in textboxes and import marked-up strings of the text into a report generated from a database or other data sources. The Microsoft Office Word 2007 rendering extension can be used to export a report to a Word document without using a third-party tool. Finally, the Microsoft Office Excel rendering extension has been enhanced to support features such as nested data regions and sub-reports.
Report authoring. While improved report rendering is all well and good, better report authoring capabilities bring SQL Server 2008 SSRS to a new level of usability for developers, power users, and end users looking for easier report creation. Microsoft has been touting the Tablix data region type, which features fixed and dynamic columns and rows, arbitrary nesting on rows and columns, optional omission of row or column headers, and the ability to apply multiple parallel rows and column members within the same report.
Report authoring data visualizations now have better visual fidelity between formats and support for rich report formats, such as tables and matrices. Enhanced features include:
- Expression placeholder text. Expressions use placeholder display text in text boxes on the report design surface or in data regions.
- Expression-based parameter prompts. The Prompt property for a report parameter can be an expression.
- Processing-time variables. Variables that are global throughout the report or local to a particular group can be declared and referred to in expressions.
The Report Designer has also been upgraded with features such as new query constructs to return all instances in a recursive hierarchy. New query constructs support functions such as Rank and Top N. The tool has a new UI for obtaining grand totals, and it supports cross-joins, which are required for common analytic queries. SSRS business users have wanted a more user-friendly version of the BI Development Studio report designer tool. The SQL Server team responded by creating a separate standalone report designer outside of this tool.
Other enhanced features to help you design reports include:
- Entity hierarchies that provide a flattened analyticstyle metadata browser that presents all entities as a flattened list.
- Live data in design view that allows display of live data by using simple iteration of design-time elements.
- Instances in metadata browser that extend the metadata browser to include instance data.
- Filtering on the design surface that adds UI elements for defining basic filter conditions directly on the design surface.
- An interface that mirrors the Office 2007 products.
- Conditional formatting in response to customer recommendations.
- Standalone deployment that helps address issues that occur during Click-Once deployment.
- Built-in forms authentication that enables users to easily switch between Windows and Forms.
- Report Server application embedding that enables the URLs in reports and subscriptions to point back to front-end applications.
SQL Server 2008 BI—Is It for Your Organization?
As I’ve mentioned earlier, I see scalability and performance as the most significant areas of improvement in SQL Server 2008. Reports run faster, various queries can execute faster, and writebacks in SSAS are faster. A handful of brand-new capabilities, such as the Data Profiler in SSIS, may also make you think seriously about migration. Overall, SQL Server 2008 is an evolutionary upgrade which provides a better performing BI platform.