BorderLayoutBoxedLayoutOpenLayoutMaximum textMedium textSmall text


Register
Saturday, July 04, 2009
MyStreamMinimize
Print  

Why do I have to use a MERGE\HASH JOIN Hint?

Posted by Jason on Saturday, November 15, 2008 to SQL Server 2008, SQL Server 2005, query optimizer, SQL performance tuning
1314 Views | 4 Comments | Article Rating

Hash match operator iconI have run into this several times lately so I thought I would whip out a quick post.

The scenario

A query runs much faster with a hash join hint than a nested loop but that is what the optimizer is choosing.

High level background

Merge and hash are good when you are working with lots of rows because it can leverage sequential IO. If you do a nested loop join with lookups on 100-1000+ rows, the random IOs can be 10-1000% more expensive than a sequential scan.

Why?

So why does the optimizer choose a nested loop? I have seen it is because the optimizer had to make a best guess at how make rows were being returned. It most cases, the optimizer guesses 1 row will be returned so a nested loop is the best choice as far as the optimizer can tell.

Identifying the problem 

Identifying the problem is pretty easy. All you need to do is look at the query plan. It can be text, xml or graphical. Look actual and estimated rows returned. If the actual rows are 11teen,000,000 but the estimated rows is 1 then this may be the problem

Causes

I most often see this when large table variables, local variables or modified stored procedure parameters are involved. There can be other causes. See my favorite white paper.

Fix?

Easy. Don’t use table variables and local variables. If you have to modify sp parameters, pass them to a nested stored proc. Make sure stats are up to date. If the WHERE cause is complex, try putting the logic in a persisted computed column or indexed view. Another option is to use a covering index. Of course, the last resort is the JOIN hint.

Need more?

Want to get real deep on stuff like this? Check out Craig Freedman’s blog or his Inside SQL Server 2005 chapters.

email it! |   |   |   |  | 
Permalink     4 Comments  

Rate this Post:
COMMENTS:

posted @ Saturday, November 15, 2008 8:15 PM by DotNetKicks.com


So, instead of using local variables to avoid parameter sniffing, should we be using the WITH RECOMPILE hint in our SPROCs instead? This could explain why I always end up with nested loops in my execution plans when I should be seeing HASH or MERGE JOINs

I've always used local variables to get around parameter sniffing and if this is causing slow downs, I'll happily switch.

posted @ Saturday, November 15, 2008 8:57 PM by Jeremiah Peschka


That is kinda of a different problem especially with force parameterization on. Let say the first time the query\proc is compiled and it only has to get a few rows. The QO picks a nested loop and rightly so. However, the next time it runs it returns 10k rows. It is going to use the cached plan with the loop join.
Really the only options are WITH RECOMPILE or a covering index. I would go with a covering index if it is executed a lot and recompile if it is a huge report that only runs periodically. These are general statements so test each case if you can.
A third option that I do not like as much is OPTIMIZE FOR that would be useful if you returned lots of rows 99% of the time and were ok with suboptimal performance for the 1% of small result set.

posted @ Saturday, November 15, 2008 9:29 PM by JasonMassie


Just a thought for Jeremiah, but the OPTIMIZE FOR or HASH JOIN hint might also be appropriate (or at least the lesser of the evils from which you must choose) if the hash join would perform adequately even when the number of rows in the table *is* small. For example, suppose the hash join gives decent performance with a small @table, and optimal performance when the @table is large. Suppose further that the nested loop is optimal when @table is small but absolutely horrendous when @table is large. In that case, you could go with any of the methods to force the hash join as the lesser of two evils, taking "avoid the horrendous thing" to be your optimization goal.

Of course, using #tables instead of @tables can fix this too, as Jason mentions.

posted @ Sunday, November 16, 2008 10:52 PM by Chris


Name (required)

Email (required)

Website


Simple BBCode can be used like [url=http://example.com]Example[/url] and [B]

Copyright 2006 by Statistics IO, My SQL Server Blog