How to prevent table scans when searching LOB data?

Question: Our schema involves a varchar (max) column which contains a set of data fields that come from another system. We can’t change that part of the system but we’re having problems with blocking when running update queries that involve searching that column. Is there anything we can do?

Answer: This is a question that came up today from one of the students in our current class.

The scenario is a fairly common one where a LOB column is being used to store a structured set of data (sometimes called a ‘property bag’) that is infeasible (for whatever reason) to break up into separate columns to store in a relational table. If a SELECT has the LOB column in the WHERE clause then the query plan will involve a table scan – as you cannot create a nonclustered index over a LOB column.

If a transaction involves UPDATEs and a SELECT that searches the LOB column, then there is the possibility of blocking occurring. Update operations require exclusive locks on the resource being updated (e.g. table rows) and these locks must be held until the transaction commits. If an UPDATE occurs, taking an exclusive lock, in the first part of a transaction and then a SELECT occurs that causes a table scan, that table scan might take a long time to execute. This makes the transaction take a long time overall and so the exclusive lock from the UPDATE operation is held for a long time. This may cause another transaction to block, waiting for the UPDATE’s exclusive lock to be released.

So how can you get rid of the table scan without completely shredding the LOB value into a table itself? Simple – you need to create a nonclustered index that allows the SELECT to be an index seek rather than a table scan.

But what to pick as the index key? You need to create a computed column that pulls a substring out of the LOB value, where the substring is one of the fields in your structured data inside the LOB value. Then you can create a nonclustered index over that computed column. This is a similar concept to creating a secondary XML index on an XML column – essentially pulling out one field as the index key.

This isn’t an ideal situation but it does solve the huge performance problem and I’ve seen this done several times successfully when there are no alternatives.

Comments

Plain text