Materialize Your Views

Downloads
42331.zip

The next release of SQL Server, SQL Server 2005, will contain many features and extensions to make you more productive, especially if you're doing database-driven Web-application development. (For a preview of SQL Server 2005, check out the May 2004 issue of SQL Server Magazine.) However, Microsoft has delayed the final release of SQL Server 2005 until the first half of next year—and many production shops won't migrate to the new database system for at least a year after that. In the meantime, you can do plenty to optimize your SQL Server 2000 databases.

I'm a data modeler and design architect, so my instinct is to look at and optimize the table schema. If you're a regular reader of my column, Solutions by Design, you know I advocate strong table normalization, which is essential to maintaining data integrity when you're adding and managing data. The benefits of normalization are worth the extra overhead of the multitable joins required for retrieving data from the tables. But you can cut the overhead of joining four or five tables to extract data from a well-normalized database or of aggregating large amounts of data in a summary report. To accomplish this, you can create a view, and instead of having the SQL Server engine dynamically reconstruct the view each time it's used in a query, you can "materialize" it. Instead of a virtual table, your view becomes a physical table.

A view is really a derived virtual table. Its purpose is to enhance visibility of the data by presenting it in a more informational context and to control access to the data by hiding critical or sensitive data from users who don't need to see it. SQL Server "materializes"—or makes real—a view by creating a unique clustered index on it, so these views are sometimes called indexed views. Like a clustered index, a materialized view has user data associated with it.

Materialized views aren't new to the database world. Although indexed views were new in SQL Server 2000, they've existed for years in other database management systems (DBMSs) such as Oracle and IBM's DB2. Large-platform DBMS vendors developed the materialized view to enhance their data-warehousing systems. A data warehouse is technically data that you've integrated from several different sources into one large data store. This data store might house detail data from operational applications, summarized data from decision-support systems, or a combination of both. In a data warehouse, the data is usually summarized along several dimensions (e.g., time, location, product), then stored for aggregate query processing by OLAP and decision-support applications. Let's look at how useful a materialized view can be in environments other than the data warehouse.

Why Use Materialized Views?

If materialized views were intended to enhance performance in a data warehouse, why should you consider them for a transactional database? After all, when you design a transactional database, you're aiming for speed and agility, like a sports-car designer, so you want to minimize the number of structures, such as indexes, that degrade performance. In a transactional database, you use indexes to enhance performance during data retrievals, but standard practice says to index sparingly and only when necessary because of the overhead involved in inserting or updating data. In addition, because a materialized view is a copy of one or more tables, your data-storage requirements could easily double.

Materialized views provide very quick access to data. The performance increase generally compensates for the extra disk storage and processor overhead involved in keeping table data synchronized with data in the materialized view. I can't tell you definitely whether the performance enhancements a materialized view brings are worth the extra space and processing cycles to keep your data synchronized; so much depends on your situation. But I can suggest the following typical scenarios in which you might want to test whether a materialized view would work for you. Note that because indexed views add a lot of overhead during data modification, the best candidates are tables that are fairly static or that receive additions and updates during off-peak hours.

Summarizing data in an operational database. You know that aggregating data in an operational database requires a lot of system resources. The process involves more than calculating the sums, counts, or averages; SQL Server's lock manager has to balance requests for the data and might even have to delay update requests while the aggregation is in progress. In addition, each time an aggregating query executes, if the records aren't already in the data cache, SQL Server has to physically scan the data from the hard disk into memory; the more data you're aggregating, the more physical I/O you need. SQL Server can use many techniques to minimize the delay, such as releasing data pages immediately after processing the records, but the bottom line is that aggregating queries can cause slowdowns in performance, especially if your system is already carrying a heavy load.

Listing 1 shows an example of a materialized summary view of the Northwind database's Orders table, which contains data about where every order was shipped. A summary report of orders by postal code—which postal codes Northwind is delivering to—would be useful for analyzing sales trends. Figure 1 shows a partial result set from running Listing 1. From this report, Northwind Traders can easily tell which postal codes are receiving the most products.

Eliminating multitable joins. In an operational database, you often have to join many tables to get information. For instance, in Northwind, to get a report of employees by region, you have to join four tables. Being able to query the database for a list of employees in a certain region or having employee information for a specific territory at your fingertips is not only nice, it's necessary. Having a well-normalized database like Northwind, which Figure 2's entity-relationship diagram (ERD) shows, doesn't have to mean you must endure query delays while SQL Server joins the tables it needs to return the data you want. Listing 2's code is an example of materializing a view created from a four-table join. Figure 3 shows part of the report that Listing 2's code produces.

Keeping computed columns separate. One inherent rule of a well-normalized database is to keep computed data in a separate table from the user data. Computed data is data you create or generate from data collected as part of your system's day-to-day data-processing activities—think of it as second-level data processing. If you're going to calculate running sums or persist grand totals in the database, you probably don't want to store the calculations in the original data tables because of the additional overhead involved with the constant recalculations.

You could create a computed column on a table, but a computed column is a virtual column. Like regular views, the computed values aren't stored, so SQL Server must dynamically reconstitute them every time you query the table. For tables that have just a few rows, this dynamic reconstitution isn't a problem; for tables in the gigabyte range, it could result in a major performance drain. You might, however, be able to create an index on the computed column, thus persisting it in the database. (For more information, see Brian Lawton's "Buried Treasure in Your Backyard," InstantDoc ID 42264, page 16.) But the computation requirements will likely change over time, requiring you to alter the table structure. If the table has many columns and is in demand by your applications, changing the table structure could be problematic. Instead, you could create a view to hold your computed data, then materialize the view and persist it in the database. Dropping and recreating an indexed view when you need to change the computational algorithm is much easier to manage than a table that has a computed column.

You can easily set up a materialized view to summarize detail data, as Listing 3 shows. The code in Listing 3 materializes a computed-column view that contains grand totals from the Orders and Order Details tables. Rather than recalculating the grand totals for orders each time you need them, the Order_Totals materialized view contains the grand totals in a persistent state. Figure 4 shows some of this computed, summarized data.

Supporting your local Web application. Web-based applications, by nature, involve a lot of overhead. Just getting the screen drawn on the Web browser takes time. When a Web user sends a request to the database, the added delay of a poorly performing database can make the Web application unusable. If you're supporting Web applications, you need to ensure that your database is tuned for fast response. Every book on Web database development and design I've read says to do two things: Normalize your database design, and index your database tables. Because the SQL Server query optimizer uses indexes whenever possible, an indexed view is a top choice for data retrievals. Thus, you might see enhanced response time from your Web applications if you index the view that corresponds to a query from a Web browser.

Creating Materialized Views

You can create indexed views in all editions of SQL Server 2000, but only the optimizers in the Enterprise and Developer editions will use the indexes on a view. (For more information about this restriction, see the Microsoft article "PRB: Indexed Views Can Be Created on All Versions of SQL Server 2000" at http://support.microsoft.com/default.aspx?scid=kb;\[LN\];270054.) The optimizers in the other editions don't take indexed views into account when developing a query plan unless you use the WITH(NOEXPAND) query hint.

To illustrate this behavior, follow these steps in SQL Server 2000 Standard or Personal Edition:

Open Enterprise Manager
Navigate to the Northwind database
Right-click Northwind and choose View, Taskpad
In the Taskpad, select the Table Info tab and scan the display
Open Query Analyzer, use Northwind
Use any of the code listings from this article to create an indexed view
ALT-TAB back to Enterprise Manager and refresh the Taskpad screen

You should see the indexed view you created, along with storage-requirement information, listed with the user tables in the database. The view is now persistent; it's taking up space in the database and using processing cycles to keep the index data synchronized with the table data.

Because SQL Server 2000 Standard Edition's query optimizer won't use the indexed view without special instructions in the user query (i.e., WITH(NOEXPAND) in the FROM clause), nothing offsets the additional resource overhead. So if you're not running SQL Server 2000 Enterprise (or Developer) Edition and you want to use the indexed view, you have to force the query optimizer to use it.

You create an indexed view by first creating a regular view, except you add the expression WITH SCHEMABINDING, as the first three listings show. Schema binding locks in the underlying table schemas, or structures, preventing changes to the table that might orphan the indexed view. You can't change an underlying table if that would invalidate the view. If you have to change the table schema, you first have to drop the indexed view, then recreate it after the table schema changes are complete.

The next step to materializing a view is to create a unique clustered index on the view. To create the index, you need to have a unique identifier in the view. Here, you really have to know your data.

As the code in Listing 2 shows, I had to include TerritoryID rather than just TerritoryDescription in the view query because TerritoryDescription isn't a unique identifier. The code creates the unique clustered index on a concatenation of TerritoryID plus EmployeeID. By itself, EmployeeID isn't unique in the view result set. EmployeeID plus TerritoryDescription doesn't produce a unique string, either, because TerritoryDescription isn't a candidate key; the TerritoryDescription column contains duplicate values. You must include TerritoryID in the view query, then combine TerritoryID with EmployeeID to form a unique identifier to create the index on.

Managing Indexed Views

To confirm that a view is materialized, you can run the command sp_spaceused, using the indexed view name as the argument. Figure 5 shows what the result should look like. After materializing the view, you can create additional, nonclustered indexes on it. However, for transactional databases, be very conservative about the number of indexes you create because indexed views impose extra overhead during inserts, updates, and deletes. SQL Server automatically synchronizes the indexed-view content whenever an underlying table is modified, sacrificing CPU cycles and physical I/O for consistently accurate data.

Note that you can't materialize every view. To apply an index to a view, you have to follow a lot of rules and regulations, most of which you can find in Kalen Delaney's May 2000 article "Introducing Indexed Views," InstantDoc ID 8410.

You manage an indexed view much the same way you would any other view. The sp_help and sp_helptext procedures show you the columns in a view and the code that created the view, respectively. If you need to remove an indexed view from the database, you need only a DROP VIEW command. If you prefer to simply drop the clustered index that materializes the view, you can use the DROP INDEX command. Be advised that this command will also drop any nonclustered indexes you've built on the view as well because, just as in a table index, the nonclustered indexes are built on the clustered index. If you need to modify the view, you can use the ALTER VIEW command, but it will also drop all the indexes. So, you have to rebuild all indexes after modifying a view.

Faster Queries, Slower Updates

Everything you do in a database environment involves trade-offs. When you're contemplating creating indexed views to speed up your query retrieval, you have to consider the performance hit that data inserts, updates, and deletes will take. There's no substitute for careful planning and testing in a development environment (separate from your production systems). But in documenting the performance impact of indexed views, Itzik Ben-Gan in his December 2002 article, "Points of (Indexed) Views" (InstantDoc ID 26812), calculates query-performance enhancements on the order of 17:1 for date comparison and 23:1 for finding distinct values.

The environments that will benefit most from materialized views are data warehouses and decision-support systems that are primarily read-only and have few data updates. However, some transactional environments can also benefit from materialized views. If you have a large table (in the gigabyte range) that you repeatedly aggregate or join to other large tables, you'll want to test the effect a materialized view would have on those tables. But transactional databases that have heavy insert and update activity and databases that support random record retrieval on fields other than the cluster key aren't good candidates for materialized views. The overhead of maintaining the view indexes would almost certainly degrade the database's performance.

Materialized views—when used judiciously and with care—can give your transactional database a big query-performance boost. However, indexed views aren't a solution to every performance problem. Don't use them without careful testing because they might cause significant degradation during data inserts and modifications. But when you use them appropriately, the benefits in the production environment can be well worth the time you spend in testing.

Comments

Plain text