Data Tales #5: The Case of the Rogue Index

Most of my end-user clients run applications that are written by other companies ie: 3^rd party applications. Very few write most of their own applications. One of the endless challenges with this is that the application vendors never want their clients to make any changes to the databases that are used by their applications. As a customer, it can be very painful to see performance problems caused by poor or inappropriate indexing choices, and yet to be blocked out from fixing those issues by supportability concerns.

As a simple example, I was at a site where Microsoft CRM was being used. When the first window in the application opened, it queried a set of customers to display on the first screen. It used a TOP clause to limit the number of customers returned. All that worked fine, except that there was no suitable index to support that query. This means that opening the application was much slower than it needed to be. Adding an index “fixed” the problem and improved the apparent performance of starting the application, yet this type of change would be at best frowned upon, or at worst prohibited under the support agreements for the product.

While creating a new index might seem like a minor issue, it can easily be enough to break upgrades from the vendor. Unfortunately, my experience is that the vast majority of 3^rd party application vendors are not receptive to having indexing suggestions sent to them by clients.

So, do we just accept the poor performance, or do we try to find some middle ground?

Data Tales #1: The Case of the Auto-Truncating Table

Data Tales #2: The Case of the Exploding Table

Data Tales #3: The Case of the Stubborn Log File

Data Tales #4: The Case of the Phantom Duplicate

For many customers, we have created scripts that improve the indexing of application databases, and then also created scripts that put the indexing back the way it was. The customer is then instructed to run the removal script prior to applying any application upgrades from the vendor. The other problem is that the vendor might have fixed the issue in their upgrade, so reapplying the indexing improvements usually requires another round of checking and testing. This approach works OK for applications that aren’t regularly updated. The trend for new applications, however, is for a nearly constant stream of upgrades.

I’ve spent a lot of time on both sides of this situation. Most of my primary clients at present are software houses, not end-user sites. While I’ve seen the pain at the client side when applications perform poorly for no sound reason, I’ve also seen the problems from the vendor side when clients try to improve the situation themselves. This month’s tale is about one of these situations.

Deadlock Pain

Deadlocks are a fact of life for concurrent database applications. While you can take a lot of actions to minimize how often they occur, unless you serialize all of your database access through a single thread, you can run across deadlocks. It’s important that applications are designed to cope with situations where deadlocks occur, to avoid the end users seeing errors.

We often get called into help software houses with blocking issues in their applications. They usually say they need help with blocking but invariably they are experiencing deadlocks. A big part of trying to minimize the number of deadlocks that occur is to minimize the amount of time that locks are held, and to minimize the amount of data that is locked. In my experience, it’s pointless to try to focus on fixing blocking issues until the queries have been tuned. When queries run quickly, most blocking issue disappear.

Indexes and Locks

Appropriate indexing is critical to avoiding all forms of blocking issues. It’s important to make sure that SQL Server can lock the bare minimum to achieve an outcome.

Let’s look at an example. The following code creates a table that we’ll use for testing:

CREATE DATABASE IndexText;
GO

USE IndexText;
GO

CREATE SEQUENCE dbo.CustomerIDs
AS int
START WITH 1;
GO

CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL
    CONSTRAINT PK_dbo_Customers PRIMARY KEY
    CONSTRAINT DF_dbo_Customers_CustomerID 
        DEFAULT (NEXT VALUE FOR dbo.CustomerIDs),
    CustomerName nvarchar(100) NOT NULL,
    PrimaryContact nvarchar(50) NOT NULL,
    PhoneNumber nvarchar(20) NOT NULL,
    IsReseller bit NOT NULL,
    CreatedWhen datetime NOT NULL
    CONSTRAINT DF_dbo_Customers_CreatedWhen DEFAULT (SYSDATETIME()),
    LastUpdated datetime NOT NULL
    CONSTRAINT DF_dbo_Customers_LastUpdated DEFAULT (SYSDATETIME())
);
GO

INSERT dbo.Customers (CustomerName, PrimaryContact, PhoneNumber, IsReseller)
VALUES (N'Big Time Movie Productions', N'Sandra Bullock', N'02 9552-4232', 1),
   (N'Even Bigger Movies', N'Tom Hanks', N'02 9234-2343', 1),
   (N'Yet Another Production Company', N'Justin Bieber', N'03 8283-2323', 0),
   (N'A Tiny Production Company', N'John Nobody', N'07 2342-2342', 0);
GO

If we are updating the PrimaryContact column by using the primary key, then locking is minimal:

BEGIN TRAN;

UPDATE dbo.Customers 
    SET PrimaryContact = N'Freddie Mercury'
    WHERE CustomerID = 3;

I’ve noted that my session is session_id 53 and can then check (in another query window), the locks being held by the update:

SELECT resource_type, request_mode, request_type, request_status 
FROM sys.dm_tran_locks WHERE request_session_id = 53;

As with all connections to the database, we’re holding shared database lock. Because we’re updating a specific key, we’re holding intent locks at the higher levels of object, and then page, and finally we’re holding an exclusive lock at the key level. The intent locks are an indication at a higher level that locks were being sought at a lower level.

Now let’s create an index that includes the PrimaryContact column, and let’s add the PhoneNumber as an included column:

CREATE INDEX IX_dbo_Customers_PrimaryContact
ON dbo.Customers 
(
    PrimaryContact
)
INCLUDE 
(
    PhoneNumber
);
GO

Now if we perform an update that doesn’t include either of those columns, note that the locks held are unaffected:.

BEGIN TRAN;

UPDATE dbo.Customers 
    SET IsReseller = 0
    WHERE CustomerID = 3;

But if the update we are performing involves any of the columns in the index, our locking gets more complex as we’re having to deal with the index pages as well:

BEGIN TRAN;

UPDATE dbo.Customers 
    SET PrimaryContact = N'Freddie Mercury'
    WHERE CustomerID = 3;

This means that just adding an index to help with the performance of a report impacts the types of locks being held and increases the chances of blocking issues and/or deadlocks.

The Rogue Index

Software houses can find it very hard to duplicate issues that occur at customer sites but having the customers modify the databases can make this much more difficult again.

In this case, my client (the software house) had one of their customers complaining endlessly about deadlocks. I retrieved the deadlock graphs and in almost every situation, one of the objects involved was a particular index that we identified as having been added by the customer, to help with one of their reports. It was not supplied as part of the application.

What I found very odd was that every time the deadlock was occurring, a request was being made for an exclusive lock at the table level:

When a single row of the table was being updated, directly via the primary key, why would an exclusive table lock be being requested?

Given the index that the customer had created, I would have expected to see an exclusive lock at the key level but only intent exclusive locks at higher levels. In almost every case where a deadlock was occurring, SQL Server was trying to obtain an exclusive lock at the table level.

What was also interesting is that SQL Server wasn’t trying to convert a key lock to a table lock. It was trying to directly obtain a table lock. That should not have been happening for this code.

After spending some time investigating the properties of the table and the indexes, I managed to find the issue:

Note that while the customer had created what they thought was a fairly trivial index, and that we would not have expected it to have caused any real issue, they had managed to choose options to disallow both row and page locks in the properties of the index.

Locks can get escalated. By default, as soon as you’ve obtained 5000 key locks, SQL Server will decide to obtain a higher level lock (ie: escalate the lock) to minimize the overhead involved in lock management. Prior to SQL Server 2008, the escalation would be to the table level, and from SQL Server 2008 onwards, the escalation is to the partition level. That type of escalation only occurs when multiple key locks are being used so that’s not the case here.

Prior to SQL Server 7, SQL Server used page locks. The index option “allow page locks” allows us to mimic that older behaviour if it’s important for an application. Almost always, we would want to use row locks instead and so “allow row locks” should be enabled.

Disabling both row and page locks causes the system to move to the next higher lock granularity. In this case, it was the table. Holding an exclusive lock at the table level hurts concurrency and in this case, lead to a large number of deadlocks. As I mentioned earlier, the first rule in avoiding deadlocks is to hold as few locks as possible for as short a time as possible.

I have to admit to struggling to think of any really suitable use case for having both these properties set to false. If you have suggestions on how this could be useful, please leave ideas in the comments. I would think it should be rare for either of these properties to be false, and we can easily check to see if it applies to any indexes on our systems:

SELECT t.name AS TableName,
   i.name AS IndexName 
FROM sys.indexes AS i
INNER JOIN sys.tables AS t
ON i.object_id = t.object_id 
WHERE t.is_ms_shipped = 0
AND i.allow_row_locks = 0
OR i.allow_page_locks = 0;

And so it wasn’t a big surprise to see that fixing the properties of the index stopped the deadlocks occurring.