SQL Server 6.5 Introduces Insert Row Locking

Arguments over row- vs. page-level locking take on near religious importance in some database circles. Over the years, row locking has been one of the most requested enhancements to SQL Server from developers and has provided ammunition for the Oracle marketing machine because Oracle's product has row-level locking and Microsoft's didn't--until now. In SQL Server 6.5, Microsoft introduces Insert Row Locking (IRL), which is a first step toward a more robust lock model. This article explains what IRL can and can't do, defines the major problems it solves today, and points out some gotchas to avoid.

Why does a database need locks at all? The answer is simple: Locks provide consistent and concurrent access to data in a multiuser environment. Locks prevent simultaneous updates of the same data. With locks, you can be sure that while your airline agent is selling you that aisle seat, another agent isn't promising the same seat to someone else. Lose the locks, and airline agents can sell the same seat to more than one person. Yes, yes, I know: They do that anyway. But those mistakes are not the database's fault.

Tradition Rules
In the beginning was the page, and Bill said, "Let the page be the smallest unit of data that can be locked." (Actually, it was the engineers at Sybase who said that, but that's another story.) The page, 2KB in SQL Server, was the lowest level of lock granularity in SQL Server. But, one row, such as a customer record, is always less than 2KB (because a row must fit on one page), so a page usually contains many rows. With page locking, Joe cannot increase the credit limit on the "Smith" account if Mary is updating the "Smithe" account at the same time and these records are stored on the same page. Row locks solve the problem by providing a finer level of granularity. With row locking, you can edit Smith and Smithe at the same time because individual rows are locked rather than the whole page.

In the past, a transaction obtained an exclusive page lock when it needed to insert a row, which made the page off-limits to other transactions. The transaction held this exclusive lock for the life of the transaction (everything between a BEGIN TRAN and COMMIT or ROLLBACK TRAN) and blocked all other attempts to access the page because competing transactions cannot hold exclusive locks at the same time.

With IRL, SQL Server provides logical row locking for most insert operations. I call IRL logical row locking for those people who argue that it's not true row-level locking because SQL Server implements IRL with two new page locks called insert_page (IX) and link_page (LN). IRL lets different transactions obtain insert_page locks at the same time, meaning multiple transactions can insert rows concurrently on the same page without blocking each other.

Insert_page locks are compatible with each other in that many transactions can hold locks on the same page simultaneously. You obtain a link_page when one transaction detects that the current page is full and identifies the need to allocate and link a new page to the table's page chain. During a link, SQL Server temporarily upgrades the insert_page lock to a link_page, which blocks subsequent insert_page requests until the transaction owning the link_page completes. IRL works only with inserts because insert_page and link_page are incompatible with all other lock types. (In case you're wondering, IRL does not work with inserts that occur during a deferred update. Check SQL Server Books Online--BOL--documentation if you're not familiar with when an update is actually performed as a delete followed by an insert.) Exclusive and shared locks are necessary for managing selects, updates, and deletes. Figure 1 summarizes the compatibility of lock types in SQL Server 6.5.

IRL Tackles the Hot Page Problem
One of the most common concurrency issues SQL Server designers face is the hot page insert problem. A hot page occurs when multiple users try to insert data on the same page at the same time, creating a bottleneck with the potential to cripple OnLine Transaction Processing (OLTP) applications that require high concurrency levels. Here's how a hot page occurs and how IRL solves the problem.

A table is a linked list of 2KB pages with no explicit sort order, unless a clustered index exists. The last page in the chain is like a bucket, and all new rows go in it. SQL Server adds new pages to the end of the chain when the current bucket fills up. When two people attempt to insert data at the same time when no index exists on a table, they need access to the last page. But only one of them can grab the lock, so the other one waits. The last page becomes a serious bottleneck as more users are added and contention for this page becomes more severe.

The first step toward a more robust lock model

You can add a clustered index to force a sort order on the underlying table. SQL Server inserts data in its sorted location rather than on the last page in the data chain, eliminating the problem of everyone vying for access to one page. People have mistakenly thought this tactic would solve the problem entirely. Unfortunately this solution doesn't always do the trick. What happens if the clustered index happens to be on a primary key using the Identity property to assign key values? (primary keys create a unique index on the key, and you can use Identity to auto-generate ascending key values similar to the counter data type in Microsoft Access.) The clustered index inserts data in sorted order. But the counter behavior of Identity means that sorted order is always on the last page, reintroducing the problem we were trying to solve in the first place.

So, you put a clustered index on every table and never use it with an Identity column. This solution eliminates contention in the data pages, but what happens when you add a non-clustered (NC) index on a column defined with an Identity property? This method introduces a hot page in the leaf level of the NC index because SQL Server inserts a row on the last page in the leaf level of the NC index every time you add a new data row. No matter how you slice it, you introduce a hot page whenever you use an index with an Identity column. Sometimes contention occurs on the data pages, and other times it occurs on index pages. Either way produces a limitation I'd rather do without.

IRL eliminates this particular hot page problem by implementing row-level concurrency for insert operations. You don't get a hot spot because inserts don't block each other in either the data or index pages. Microsoft benchmarks show that IRL offers massive performance improvements of up to 40 percent for applications that are highly concurrent, but it provides little or no benefit to low concurrency applications.

Use IRL with Care
IRL solves important problems, but you must keep a few gotchas in mind. Don't enable IRL in a production application without extensive testing, or the gotchas will cause more problems than IRL solves.

The first point to keep in mind is that IRL works only for inserts. You still need to deal with all the old concurrency issues that occur with updates, deletes, and selects. Second, if a clustered index exists, you must declare it unique, or IRL will not be available for the table. I'm not sure why this limitation exists, but it does.

Third, SQL Server 6.5 disables IRL by default, and you can turn it on for specific tables with sp_tableoption. The procedure calls dbcc rowlock, which toggles the sysobjects.userstat 0x01 bit, and takes effect immediately. You do not need to stop and start SQL Server. For example, you can enter

sp_tableoption @TableNamePattern
\[, '@OptionName'\]
\[, '@OptionValue'\]

@TableNamePattern selects the qualified or non-qualified name of a user table (with or without database and owner specifications) and accepts any string pattern appropriate for use with the like operator, so you can affect many tables at once. The following command enables IRL for all tables in the current database.

EXECUTE sp_tableoption '%.%', 'insert row lock', 'true'

In addition, IRL can introduce deadlock situations because it lets some transactions that used to occur sequentially now occur concurrently. A deadlock occurs when two users have locks on separate objects and each user wants a lock on the other's object. Each user waits for the other to release their lock, neither process can finish, and SQL Server kills one of the processes. "Preventing Deadlocks and Lock Starvation" and "Insert Row Locking" in SQL Server BOL tell you how to avoid deadlocks in the first place.

Also, IRL can double the size of your transaction log during rollback operations. Microsoft did some clever programming to provide IRL without having to make significant changes to lock and log managers that were originally written to support page-based data access. SQL Server writes extra information to the log during a rollback (called compensatory records by Microsoft) to avoid the data inconsistencies that can occur when using page-based technology to handle row-level data access. My informal testing shows that rollbacks create approximately twice the number of log records when IRL is on. Fortunately, COMMITS don't require extra space and tend to occur more frequently than rollbacks.

IRL Is Only the Beginning
IRL isn't a magic elixir for your row-locking blues, but it solves some real-world problems such as insert hot pages, and is the first step toward the flexible dynamic locking model that Microsoft hopes to implement in SQL Server 7.0. Dynamic locking mixes the best of page and row locking by starting out with coarse locks (i.e., a page) and de-escalating to finer granularity locks (i.e., a row) when contention is detected. Remember, IRL is better than no row locks at all until dynamic locking is more than a white paper. Check out http://www.microsoft.com/sql/locking.htm for more information about Microsoft's Dynamic Locking Initiative.

Comments

Plain text