A Tip Regarding Forcing Join Algorithms

I’m sometimes amazed when I learn something new about T-SQL that
apparently existed in the product for ages and could be handy had I known
about it earlier. This happened to me recently when I was looking at a query
written by Umachandar Jayachandran who is with the Microsoft SQL
Server Performance Engineering Team.

When you tune join queries and suspect that the optimizer chose a
suboptimal join algorithm, to verify your suspicions you can force a specific
join algorithm by either using a join hint (e.g., INNER HASH JOIN) or a
query hint (e.g., OPTION(HASH JOIN)). The general recommendation is
to try and avoid using such hints in production code because the hint makes
that part of optimization static. Namely, data distribution changes won’t
impact the optimizer’s choices in respect to the part that you forced with the
hint. Still, hints are very handy when you suspect that the optimizer made a
suboptimal choice in terms of some query optimization aspect, and you want
to verify your suspicions. Occasionally it’s hard to resolve some production
query performance problem without a hint, and you end up using it as the
last resort.

Suppose that you identified a suboptimal choice of a merge join algorithm
that the optimizer made for a given query. This can be due to very
inaccurate selectivity estimates or some other reason. If it’s not an issue of
out-of-date statistics or some other aspect that you can resolve, you find
yourself in a spot where you’re last resort is to force a join algorithm. But
here’s the thing—you don’t want to force a specific join algorithm, rather
you want to prevent the optimizer from using merge. You still want to allow
the optimizer to choose between nested loops and hash based on cost
estimates. Now here’s the nifty little trick I learned from Umachandar;
apparently in the OPTION clause you are not restricted to specifying only
one join algorithm, rather you can list more than one. When you list more
than one, you restrict the optimizer to using a join algorithm from the
specified list. So if you want to prevent it from using a merge join, but still
allow it to consider both nested loops and hash, specify:

As an example, the default plan I got for the following query in SQL Server
2005 had a merge join:

USE Northwind;
FROM dbo.Customers AS C
  JOIN Orders AS O
    ON O.CustomerID = C.CustomerID;

  |--Merge Join(...)
       |--Clustered Index Scan(OBJECT:(\[Customers\].\[PK_Customers\]...)
       |--Sort(ORDER BY:(\[O\].\[CustomerID\] ASC))
            |--Clustered Index Scan(OBJECT:(\[Orders\].\[PK_Orders\]...)

To prevent the optimizer from using a merge join algorithm, add the
OPTION clause listing LOOP and HASH as the algorithms to consider:

FROM dbo.Customers AS C
  JOIN Orders AS O
    ON O.CustomerID = C.CustomerID

This time the plan I got shows that the optimizer chose to use a hash join:

  |--Hash Match(...)
       |--Clustered Index Scan(OBJECT:(\[Customers\].\[PK_Customers\]...)
       |--Clustered Index Scan(OBJECT:(\[Orders\].\[PK_Orders\]...)

Note though that if you have more than one join in the query the OPTION
clause will impact all joins. SQL Server does support forcing a certain join
algorithm for a specific join by using a join hint (e.g., T1 INNER HASH
JOIN T2), but I haven’t yet figured out how (if at all possible) to restrict the
optimizer from using a certain join algorithm on a per join basis.


Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.