SQL Server table partitioning is most often used to implement a sliding window. For that purpose, the table is most frequently partitioned on a date column. This allows for efficient access to the data in a given date range. It also lets you drop aged data from the table by just moving the partition as a whole to another table using the SWITCH option. A lot has been written about this scenario, so we won’t go into additional details here. Instead, we’ll explain how to use the partitioning feature when a date column isn’t the right choice.
If you have a large OLTP database, most of the queries probably access only a very small percentage of the data. To save on costs and increase query performance, it often makes sense to move the less frequently accessed data out of the way and onto cheaper storage with less frequent backups.
One way to achieve this is by creating two versions of each table, one for the active data and one for the less used (archived) data. However, this means that all queries that need to access the archived data must now select a union of the active and archive tables. In existing applications, this could mean a lot of code changes, which isn’t always an option.
There are several ways to address this problem. Partitioning the tables is one option. In this article, we’ll show you how to set up partitioning to separate data into active and inactive data sets, and we’ll explain some of the pitfalls of this method.
First, we must determine which column to partition on. Suppose we want to partition our Customer table into active and inactive customers. If there’s a customer in the database that hasn’t ordered anything in a long time, that customer might be “eligible” for archiving. The last order date is a natural choice on which to base this decision. However, depending on the database design, that field might not exist in the Customer table. In addition, there might be other factors to prevent this customer from being archived. For example, the customer might have an outstanding balance due.
Therefore, in this case, a date value isn’t really an appropriate choice for archiving. It would be nice if a customer record and all its child records could be marked as ready to be archived. The easiest way to achieve this is to add an ArchiveIndicator column to all tables, which should be included in the archiving process. This indicator would be set according to a set of business rules in a weekly or monthly running job. After that indicator exists, it can be used as the partition column.
Using an ArchiveIndicator as a Partitioning Column
Let's look at an example. First, we’ll create a simple partition function and partition scheme. Typically, you’d want to define the partition scheme to put the two partitions on different file groups, but for testing and proof of concept, we can create them both on the same file group. Listing 1 (below) contains the code to create both partitions on the DEFAULT file group (whatever that happens to be in your database). We’ll assume you have a database called testdb; if you don’t, you can easily create one by running the command
CREATE DATABASE testdb;
After running the code in Listing 1 (below), we need to create a Customer table to which we can add the ArchiveIndicator column. We’ll run the code in Listing 2 (below) to try to create the table on the psArchive partition scheme, which hopefully you’ve already created. However, our table creation attempt results in the error message that Web Figure 1 shows.
Figure 1: Error message from trying to create Customer table
If we attempt to resolve the error by adding the ArchiveIndicator to the primary key, we create another problem: Now the CustomerId is unique per partition—which means that after a customer is archived, a new customer with the same CustomerId could be created in the active partition. This probably isn’t the intended behavior.
The error message makes the assumption that the index will be partitioned using the same partitioning scheme as the table, and it says that the partitioning column needs to be part of all unique indexes. However, that’s true only for a clustered unique index, which must be partitioned the same way as the table. For a nonclustered index, you can tell SQL Server to not partition the index by placing it directly on a specific file group, as the code in Listing 3 (below) does.
This approach allows for a real primary key, but it prevents the index from being partitioned. Thus, even the old data for this index has to stay on the expensive drive. If the index is on a single integer column, the added storage requirement is probably bearable. The impact on performance should be minimal if the other indexes are chosen correctly. However, if the key is a natural key, on the name column, for example, moving it to a single file group might negate most of the performance benefits that partitioning provided.
Another drawback of not partitioning the primary key index is that it prevents partition switching (i.e., moving partitions out of the table and into another table). Partition switching is required in the sliding window scenario, but in the archiving case in which the purpose is to separate old and new data while keeping all of it available, it’s rarely useful.
You need to carefully weigh the drawbacks and benefits of enforcing uniqueness versus having real partitioning not only for the primary key but also for every other unique index. For example, if the column being indexed is an identity column and the application doesn’t use the identity insert feature, it’s probably safe to add the partitioning column as part of the primary key because an undisturbed identity column won’t produce duplicate values. However, if the values for the primary key are generated by the application, real uniqueness of those values should be enforced by having a unique index on those values alone.
These considerations aren’t unique to the partitioning solution. All other ways of splitting a table into an active and archive set of rows must address the same questions. Using partitioning, however, provides one of the simplest methods of implementation, regardless of how you choose to handle your unique keys.
Selecting from a Table with an ArchiveIndicator as the Partition Column
After creating a partitioned table, the next thing to look at is how to access the data so that the archiving benefits can be realized.
Internally, SQL Server stores each partition as a table in its own right; that is, as a separate structure. In fact, every table has at least one partition. You can verify this by running the query in Listing 4 (below), which returns the number of partitions for each table. The count is always greater than or equal to 1.
Storing each partition as a separate structure means that any time SQL Server needs to search for data, whether it uses a seek or scan operation, it has to perform the search in every partition. The exception is if the target partition can be determined from the query. To show this, let’s use the code in Listing 5 (below) to create another table and fill it with some test data.
The table PartitionTest is constructed in such a way that very few rows fit into a page on any level of the clustered index. The script inserts 40 rows into each partition. Because of the size of the index keys, the clustered index will end up four levels deep, as we can see by running the code in Listing 6 (below). Table 1 shows the results of running this code; that is, the number of pages in each partition and the depth of the indexes.
Table 1: Output from sys.dm_db_physical_stats
The script in Listing 7 (below) uses a cursor to step through the table and run the following three queries for each row in the table:
SELECT pageFill FROM dbo.PartitionTest WHERE id = @id AND archiveIndicator = 0;
SELECT pageFill FROM dbo.PartitionTest WHERE id = @id AND archiveIndicator = 1;
SELECT pageFill FROM dbo.PartitionTest WHERE id = @id;
After the three queries run for each row, the script records the number of logical reads that each query took to run. It stores the result of these measurements in a temporary table called #readCount. This table contains the number of reads required to search for archived and nonarchived rows, with or without specifying the archive value in the query. Table 2 shows the first few rows of the #readCount table, after the script in Listing 7 (below) runs.
The value for logical reads when the ArchiveIndicator isn’t specified is always the sum of the reads required for both queries with the ArchiveIndicator given. This means that SQL Server goes into both partitions to figure out whether a row that qualifies exists in the partition. To get the performance benefits we were hoping for, each query has to specify whether it’s interested in active data or in archived data.
In the beginning of the article, we said that implementing archiving by moving old data into a separate table requires all queries to change, which might not be possible in an existing system. Now we’re stating that this is also true for partitioning. So, what’s the benefit of partitioning?
The biggest difference is that with partitioning, all existing queries still work. You only have to change the queries to realize a performance benefit, which can be done on a query-by-query basis. However, a performance penalty will occur for unchanged queries. So you might want to prioritize the queries to change based on the effect the change has on the query.
To ensure that new rows get created in the active partition, the ArchiveIndicator must have a value of 0 for all new rows. The easiest way to implement this is by specifying a default constraint on the ArchiveIndicator column, as in the example.
In the beginning of the article, we discussed ways to add a real primary key to a partitioned table. If we choose to implement the primary key using a nonpartitioned index, we need to consider how such a nonpartitioned index will affect these findings. The script in Listing 8 drops and recreates the PartitionTest table, this time with a nonpartitioned primary key on the id column. Table 3 shows the first few rows of the #readCount table after rerunning the code in Listing 7 with this nonpartitioned primary key in place.
Table 3: Read Page Counts when Nonpartitioned Primary Key Exists
No longer is the third column (for queries with no ArchiveIndicator specified) always equal to the sum of the previous two columns. For active records (indicated by ArchiveIndicator having a value of 0, shown in the first column of the output), the number of pages read for the query not specifying the ArchiveIndicator is the same as the number of pages read when the query has the ArchiveIndicator value specified. This value is about half of the value in the last column of output for archived records, with an ArchiveIndicator value of 1. To explain this, we need to look at the execution plan, which is the same for all three queries. Web Figure 2 shows the execution plan.
SQL Server performs a seek against the clustered index and doesn’t use the newly created unique nonclustered index. However, because of the presence of the unique index on the single id column, SQL Server now knows that there can be at most one row for any given value of the id column. Therefore, once a row is found, there’s no need to continue the search in the other partition. In our case, it seemed that the first partition was always accessed first and the row found there, but there’s no guarantee as to the order in which the partitions will be searched.
The previous scenarios looked at queries executing a seek operation against the clustered index. The same access patterns can be found when performing a seek against a partitioned nonclustered index. To show this, let’s change the example table. Listing 9 (below) contains the code to recreate the PartitionTest table, this time with a unique partitioned nonclustered index.
The id column is now the third column in the clustered index, so the clustered index isn’t helpful in finding a row with a given id value anymore. The new nonclustered index PartitionTestI1 is two levels deep and is on the same columns that the clustered index was on before. Table 4 shows the first few rows for the number of reads required when the clustered index isn’t useful for searching. As expected, if the ArchiveIndicator isn’t specified, both partitions of the index have to be consulted to determine where the row lives.
Table 4: Read Page Counts when the Clustered Index Isn’t Useful for Searching
Another noticeable change compared with the clustered seek operation is that the overall number of pages read seems to be lower. Let’s look at that in a little more detail. Table 5 contains the total number of logical reads that the query without ArchiveIndicator took while using a cursor to step through the table. The measurements were taken for both the clustered seek and the nonclustered seek case, as well as for the partitioned and the nonpartitioned case.
Table 5: Number of Logical Reads
Although both partitioned cases are more expensive than the nonpartitioned ones, the partitioned nonclustered seek is significantly closer to the nonpartitioned case than the clustered case—which makes sense because the nonclustered index is a lot less deep. This means that although finding a row is a little more expensive, not finding a row is significantly cheaper than in the clustered case.
To Partition or Not to Partition
If you have a table with a lot of rows, and a large percentage of the rows are rarely accessed, it might make sense to move those rows to a cheaper storage tier and out of the way of the most frequent queries. Partitioning provides a simple way to achieve just that. Partitioning will greatly improve the performance of queries that must now search only through the small active part of the table. However, the performance of queries that must search through the entire table will suffer. If most of your queries fall into the first category, overall system performance will improve. Depending on the price structure of your hosting provider, you can also save a lot of money.
If you choose to implement partitioning for this reason, you should revisit all your indexes and decide whether it makes sense to also partition them, or if they should remain nonpartitioned. You should also try to make as many queries as possible archive-aware, which you could achieve, for example, by simply having the user select either recent results and fast response or complete results and very slow response.
Listing 1: Code to Create Two Partitions on the DEFAULT File Group
USE testdb; GO CREATE PARTITION FUNCTION pfArchive(TINYINT) AS RANGE LEFT FOR VALUES(0); GO CREATE PARTITION SCHEME psArchive AS PARTITION pfArchive ALL TO (\\[Default\\]); GO
Listing 2: Code to Create a Customer Table on the psArchive Partition Scheme
USE testdb; GO CREATE TABLE dbo.Customer( CustomerId INT CONSTRAINT pkCustomer PRIMARY KEY, Name VARCHAR(60), ArchiveIndicator TINYINT ) ON psArchive(ArchiveIndicator);
Listing 3: Code to Place Index Directly on a Specific File Group
CREATE TABLE dbo.Customer( CustomerId INT CONSTRAINT pkCustomer PRIMARY KEY NONCLUSTERED ON \\[PRIMARY\\], Name VARCHAR(60), ArchiveIndicator TINYINT ) ON psArchive(ArchiveIndicator);
Listing 4: Query to Return Number of Partitions per Table
SELECT (SELECT COUNT(1) FROM sys.partitions AS p WHERE t.object_id = p.object_id ) AS PartitionCount, OBJECT_SCHEMA_NAME(t.object_id)+'.'+t.name AS TableName FROM sys.tables AS t ORDER BY PartitionCount DESC;
Listing 5: Code to Create a Table Filled with Test Data
CREATE TABLE dbo.PartitionTest( archiveIndicator TINYINT NOT NULL DEFAULT 0, id INT IDENTITY(1,1), IndexFill CHAR(888) NOT NULL DEFAULT '', PageFill CHAR(7100) NOT NULL DEFAULT '', CONSTRAINT PartitionTestCI UNIQUE CLUSTERED(id,archiveIndicator,IndexFill), ) ON psArchive(archiveIndicator); GO SET NOCOUNT ON; GO INSERT INTO dbo.PartitionTest(archiveIndicator) VALUES (0),(1); GO 40
Listing 6: Code to Show Index Size and Deepness Information for the PartitionTest Table
SELECT OBJECT_SCHEMA_NAME(object_id)+ '.'+ OBJECT_NAME(object_id) AS TableName, index_type_desc, alloc_unit_type_desc, partition_number, index_depth, index_level, page_count, record_count FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.PartitionTest'), NULL, NULL, 'DETAILED');
Listing 7: Code to Measure the Cost of Data Access in the PartitionTest Table
IF OBJECT_ID('tempdb..#readCount') IS NOT NULL DROP TABLE #readCount; CREATE TABLE #readCount( archiveIndicator INT NOT NULL, id INT PRIMARY KEY, logical_reads_0 INT, logical_reads_1 INT, logical_reads_X INT ); INSERT INTO #readCount (archiveIndicator, id) SELECT archiveIndicator,id FROM dbo.PartitionTest; GO DECLARE cur CURSOR LOCAL FORWARD_ONLY FOR SELECT id FROM #readCount ORDER BY id FOR UPDATE; DECLARE @id INT; DECLARE @dummy VARCHAR(MAX); DECLARE @logical_reads INT; DECLARE @rowCount INT; OPEN cur; FETCH NEXT FROM cur INTO @id; WHILE(@@FETCH_STATUS = 0) BEGIN DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; DBCC FREESYSTEMCACHE('ALL'); DBCC FREESESSIONCACHE; SELECT @logical_reads=logical_reads FROM sys.dm_exec_requests WHERE session_id = @@SPID; SELECT @dummy = pageFill FROM dbo.PartitionTest WHERE id = @id AND archiveIndicator = 0; SELECT @logical_reads=l[email protected]_reads FROM sys.dm_exec_requests WHERE session_id = @@SPID; UPDATE #readCount SET logical_reads_0 = @logical_reads WHERE CURRENT OF cur; DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; DBCC FREESYSTEMCACHE('ALL'); DBCC FREESESSIONCACHE; SELECT @logical_reads=logical_reads FROM sys.dm_exec_requests WHERE session_id = @@SPID; SELECT @dummy = pageFill FROM dbo.PartitionTest WHERE id = @id AND archiveIndicator = 1; SELECT @[email protected]_reads FROM sys.dm_exec_requests WHERE session_id = @@SPID; UPDATE #readCount SET logical_reads_1 = @logical_reads WHERE CURRENT OF cur; DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; DBCC FREESYSTEMCACHE('ALL'); DBCC FREESESSIONCACHE; SELECT @logical_reads=logical_reads FROM sys.dm_exec_requests WHERE session_id = @@SPID; SELECT @dummy = pageFill FROM dbo.PartitionTest WHERE id = @id; SELECT @[email protected]_reads FROM sys.dm_exec_requests WHERE session_id = @@SPID; UPDATE #readCount SET logical_reads_X = @logical_reads WHERE CURRENT OF cur; FETCH NEXT FROM cur INTO @id; END CLOSE cur; DEALLOCATE cur; GO SELECT * FROM #readCount; GO
Listing 8: Code to Recreate the PartitionTest Table with a Nonpartitioned Primary Key
DROP TABLE dbo.PartitionTest; GO CREATE TABLE dbo.PartitionTest( archiveIndicator TINYINT NOT NULL DEFAULT 0, id INT IDENTITY(1,1), IndexFill CHAR(888) NOT NULL DEFAULT '', PageFill CHAR(7100) NOT NULL DEFAULT '', CONSTRAINT PartitionTestCI UNIQUE CLUSTERED(id,archiveIndicator,IndexFill), CONSTRAINT PartitionTestPK PRIMARY KEY NONCLUSTERED(id) ON \\[PRIMARY\\], ) ON psArchive(archiveIndicator); GO INSERT INTO dbo.PartitionTest(archiveIndicator) VALUES (0),(1); GO 40
Listing 9: Code to Recreate the PartitionTest Table with a Partitioned Unique Nonclustered Index
DROP TABLE dbo.PartitionTest; GO CREATE TABLE dbo.PartitionTest( archiveIndicator TINYINT NOT NULL DEFAULT 0, id INT IDENTITY(1,1), IndexFill CHAR(888) NOT NULL DEFAULT '', PageFill CHAR(7100) NOT NULL DEFAULT '', CONSTRAINT PartitionTestCI UNIQUE CLUSTERED(archiveIndicator,IndexFill,id), CONSTRAINT PartitionTestI1 UNIQUE NONCLUSTERED(id,archiveIndicator,IndexFill), ) ON psArchive(archiveIndicator); GO INSERT INTO dbo.PartitionTest(archiveIndicator) VALUES (0),(1); GO 40