Native Partitioned Tables and Indexes

Downloads
45153.zip

Editor's note: The features described in this article aren't final, so they might not be in the final product release in the same form.

For DBAs, one of the most important maintenance-related enhancements in SQL Server 2005 is the introduction of native partitioned tables and indexes. SQL Server 2005 enhances local partitioning, which is used for maintenance and performance reasons. Partitioning isn't new to SQL Server and the motivation for partitioning is still the same. Maintenance-related reasons for local partitioning include having a smaller unit size for index rebuilds and defrags compared to one large non-partitioned table, smaller units to backup, faster data loading, instant purging of historic data, and simplified archiving. Partitioning can give you better performance by eliminating partitions that the optimizer uses, which means that when your table is partitioned, the optimizer can figure out which partitions not to access at all based on the query filter you provide. SQL Server 2005 can also utilize parallel processing of partitions, and load data to a partition faster than loading to a big non-partitioned table. Also, with partitioned tables, the unit of lock escalation is reduced-partition instead of the whole table. SQL Server limits the amount of resources a single transaction can consume, so if a transaction issues a query and attempts to acquire too many individual row locks, SQL Server escalates the individual row locks to a full table lock when the table isn't partitioned. In such a case, other processes can't acquire incompatible locks on any row in that table. On the other hand, if the table is partitioned, the locks escalate to the partition unit instead of the whole table, so other processes can still acquire locks on rows from other partitions.

Local partitioning in SQL Server 2000 has several limitations. It has no support for native partitioned tables or native commands for partitioning activities. Partitioning is a result of combining tables and views, so you have to maintain many tables and explicitly create, drop, and alter the tables and views to perform partitioning activities. SQL Server 2005 introduces support for native partitioned tables and indexes, so you maintain only one table and one copy of each index. You also use native commands for partitioning activities such as creating partitioned tables and indexes, or splitting, merging, and switching partitions.

In SQL Server 2000, you can have differences between the tables the partitioned view references (e.g., nullability, indexes), which means that when the optimizer generates a plan for a query against the view, it has to consider each target table involved in the query's filter separately. For example, if you have a query against a partitioned view that queries 1 year's worth of data, and each partition contains 1 week's worth of data, the plan must include all 50+ partitions. The optimizer must consider each partition separately and duplicate the query's filter against the view for all partitions in the plan. This duplication can lead to lengthy optimization times and large plans. In SQL Server 2005, execution-plan creation is more efficient because all partitions are required to be exactly the same. The optimizer doesn't need to consider each partition separately or duplicate anything.

You can't create indexes on a view and you can't partition indexed views in SQL Server 2000, but you can in SQL Server 2005. Other limitations in SQL Server 2000 affect partitioning if you want a partitioned view to be updateable because the partitioning column must be part of the primary key and the IDENTITY property can't be used in the base tables. SQL Server 2005 supports the IDENTITY property, and the partitioning column doesn't have to be part of the primary key.

SQL Server 2005's enhancements to local partitioning address most of the limitations in SQL Server 2000. However, there are limitations on partitioning in SQL Server 2005-for example, partitions can't span servers or databases, and it only supports up to 1000 partitions in one table. In SQL Server 2005 you still use distributed partitioning to scale out your environment in the same way you did in earlier versions. See "Distributed Partitioned Views," Instant Doc 9086 for more information about distributed partitioning. But you can use SQL Server 2005's new partitioning capabilities to partition your tables and indexes locally, combined with distributed partitioned views, for scale-out purposes.

Creating Partitioned Tables in SQL Server 2005

There are three steps involved in creating partitioned tables and indexes. First, you create the partitioning function. Unlike most functions, the partitioning function doesn't contain any code. The function contains boundaries that will determine which source row corresponds to which target partition. Eventually, you'll map a column from the table or index to the function, and the column's value will determine the source row's target partition. Second, you create a partition scheme. A partition scheme is a list of filegroup names mapped to a partition function. You use the partition scheme to determine the target physical location (filegroup) for each range defined in the partitioning function. Third, you create the tables and indexes on a partition scheme, specifying the partitioning column name whose values will determine the target partition for each source row. Run the following code in SQL Server 2005 to create the testdb database that you can use to run all the scripts in this article.

USE master;
IF DB_ID('testdb') IS NOT NULL
  DROP DATABASE testdb;
CREATE DATABASE testdb;
GO
USE testdb;

I'll create the Orders partitioned table and load source data from the Sales.SalesOrderHeader table in the AdventureWorks database to demonstrate partitioning. Let's start with the first step-creating the partitioning function.

Step 1: Creating the Partitioning Function

The CREATE PARTITION FUNCTION command has five elements: a name, the data type of the column that you'll use later as the partitioning column, the partitioning type (RANGE is currently the only supported type), boundary direction (LEFT or RIGHT), and a list of range boundary values. Run the following command to create the partition function, PF_RangeByYear(), for our Orders scenario, which we'll partition by year.

CREATE PARTITION FUNCTION PF_RangeByYear(DATETIME)
AS RANGE RIGHT 
FOR VALUES ('20020101', '20030101', '20040101');

The partitioning column in our Orders scenario is orderdate, so the datatype specified is DATETIME. The PF_RangeByYear() function creates four partitions:

Partition 1: source_date = '200210101' AND source_date = '200310101' AND source_date = '20040101'

The boundary direction specifies whether the boundary value itself will be part of the left or right partition. This scenario uses RIGHT, which means that the first boundary value is the first supported value in the second partition-all values in the first partition are less than that boundary value. LEFT would mean that all values in the first partition are less than or equal to the first boundary value specified. The ability to specify LEFT or RIGHT is helpful when you have problems precisely specifying the first or last possible value of a partition because of rounding processes. For example, if you use the LEFT option instead of RIGHT to specify that the first partition contains data earlier than 2002, you have to specify the latest possible value within the year 2001 as the first boundary value. Because the accuracy of a datetime data type in SQL Server is 3.333 milliseconds rounded to the nearest tick, the value 20011231 23:59:59.999 would be rounded to 20020101 00:00:00.000, which you don't want to include in the first partition. So, either use the RIGHT option as I did with the dates only, or if you insist on using the LEFT option, you should specify the highest supported value for 2001, which is 20011231 23:59:59.997. Figure 1 illustrates the range-to-partition mappings.

Step 2: Creating the Partitioning Scheme

First, make sure you have a folder called c:\temp, then run Listing 1 to create four filegroups in the testdb database, each with a single data file. Next, run the following code, which creates a partitioning scheme called PS_Orders, mapping the PF_RangeByYear() function to the four filegroups that Listing 1 creates (FG_Orders2001, FG_Orders2002, FG_Orders2003, FG_Orders2004).

CREATE PARTITION SCHEME PS_Orders
AS PARTITION PF_RangeByYear
TO (FG_Orders2001, FG_Orders2002, FG_Orders2003, FG_Orders2004);

Note that you can map multiple partitions to the same filegroup if you don't want to place different partitions on different physical drives. If you specify more filegroups than partitions, SQL Server marks the first extra filegroup as NEXT USED. (I'll explain where the NEXT USED option is relevant in a future article.) Multiple partition schemes can use the same partition function, but not the other way around. Figure 2 shows the mappings of the partitioning ranges from the function PF_RangeByYear() to the different filegroups in the partition scheme PS_Orders. Now that you've created the partitioning function and partitioning scheme, you're ready to create the partitioned tables and indexes.

Step 3: Creating the Partitioned Tables

To create a partitioned table on a partitioning scheme, you simply issue a CREATE TABLE command and use the ON clause to specify the partitioning scheme name and partitioning column name. Run the following code to create the partitioned Orders table on the PS_Orders partitioning scheme, specifying orderdate as the partitioning column.

CREATE TABLE Orders
(
  orderid    int      NOT NULL,
  customerid int      NOT NULL,
  orderdate  datetime NOT NULL,
  contactid  int      NOT NULL,
  /* ...  other columns ...  */
) ON PS_Orders(orderdate);

To populate the Orders table with 3 years' worth of orders from the Sales.SalesOrderHeader table in the AdventureWorks database, run the following INSERT statement:

INSERT INTO Orders(orderid, customerid, orderdate, contactid)
  SELECT SalesOrderID, CustomerID, OrderDate, ContactID
  FROM AdventureWorks.Sales.SalesOrderHeader
  WHERE OrderDate >= '20010101'
    AND OrderDate 

Now you can create indexes on the Orders table and start querying the new data.

Native Partitioning


SQL Server 2005's native partitioning capabilities mean that to support partitioning, you don't have to deal with multiple individual tables and a view, mixing two object types that aren't related to partitioning. Instead, you have built-in partitioning support, which means improved manageability and better performance. Now you know the three steps required to create partitioned tables. Next month, I'll discuss creating indexes, index alignment, and querying metadata information about partitions.

Comments

Plain text