Every database management system (DBMS) developer eventually encounters a situation in which a table stores a huge amount of historical data but users retrieve only small, logically distinct portions at a time. For example, an international trading firm's database might have tables containing millions of records' worth of trades spanning a year or more, but users typically need to retrieve data for only one month at a time. To improve query performance, a DBA or a developer often splits a large table into separate tables with the same structure, each containing data restricted to a given logical separation. For instance, the trading firm's database could take the form of 12 tables, one for each month. If the yearly total was 120 million records, you might expect an average of 10 million records per table. Most users will issue queries that span a month or less, so they can choose the monthly table they want to query and sort through 10 million records instead of 120 million.
Querying smaller tables is more efficient, but it complicates the creation of front-end applications and reporting tools that need the hard-coded table names to relate to a particular query. Worse yet, if any queries span more than a month, creating programs in a robust fashion becomes almost impossible without arbitrarily applying UNION statements throughout the code.
Fortunately, SQL Server 7.0 has a feature known as horizontally partitioned views, which provides a highly effective alternative to that complexity. A horizontally partitioned view lets you divide a large table into smaller sub-tables yet provide only one view as the means of interacting with all the tables. A user or developer needs to query only one view to extract records from any of the sub-tables. In addition to the convenient interface, this type of partitioned view provides a considerable performance boost.
However, besides a brief description in SQL Server Books Online (BOL) and a small mention in the SQL Server resource kit, this useful and elegantly implemented feature is surprisingly sparsely documented. To remedy that lack, let's look at what horizontally partitioned views are, how to implement them, and how to distribute data into the correct tables automatically.
Create Horizontally Partitioned Views
To improve query performance, you could split a large table, such as the one with 120 million records, into 12 more-manageably sized tables, each containing one month's records. Because the table maintains the same structure but is partitioned along row-level boundaries—in this case, by month—it is horizontally instead of vertically partitioned. To treat all these tables as one object, you need to rejoin them with a partitioned view. The view definition in Listing 1 shows the fundamental structure of a partitioned view. Issuing queries against one object instead of 12 is one big advantage this method offers when you're creating applications and reports.
For the query plan to work most efficiently, the SQL Server engine needs a guarantee that each table contains only the data it was intended to contain. To obtain this guarantee, use check constraints. You need to create each table with the constraints in the manner that Listing 2 illustrates. The listing includes table and index creation data-definition language (DDL), but note that the check constraint is designed to ensure that the table includes only data for a particular month—January, in this example. This check constraint on the trade date guarantees that each table contains only trades for a given month. Now, when someone issues a query such as
SELECT * FROM V_TRADES WHERE TRADEDATE BETWEEN '1/10/1999 AND '1/15/1999' AND TRADETYPE = 'B'
the SQL Server engine will quickly examine all the tables that compose the view and determine through the check constraints that only the TRADE_011999 table might contain records that match the date portion of the WHERE clause. The engine will then query the TRADE_011999 table without attempting to select any records from the other tables that form part of the union.
Populate the Tables
After you've established partitioning as a method for seamlessly improving query performance, you still need a way to place the trades in the correct table. Conceivably, trade data might come in from the end of the previous month, from the current month, or from deferred trades that occurred 2 months ago. For the table population to work as seamlessly as the queries, you must create a stored procedure that can tell which table the record belongs in based on information in the record—in this case, the trade date. Because the table name the insert statements will use is variable, the most efficient way to create a stored procedure for this purpose is by using the sp_executesql system stored procedure, as Listing 3 shows. The example in Screen 1 shows a query on data from February 1999. The query plan shows that SQL Server will scan only the February table and check only the month constraint for the other tables.
A partitioned structure for large tables provides some benefits that go beyond query performance. By partitioning tables, you can easily archive old data in removable media as the months progress. For example, if you divide a 7GB table into 12 tables of 600MB each, you can store each month on a CD-ROM. Without this partitioning, an organization faces the daunting prospect of copying data from the main table to an archive table and then deleting the copied rows. In a system that runs 24 x 7, the locks such an operation generates might render the database inaccessible for an unacceptably long time. Horizontal partitioning helps you maintain high availability without costly overhead.