I have run into this several times lately so I thought I would whip out a quick post.
The scenario
A query runs much faster with a hash join hint than a nested loop but that is what the optimizer is choosing.
High level background
Merge and hash are good when you are working with lots of rows because it can leverage sequential IO. If you do a nested loop join with lookups on 100-1000+ rows, the random IOs can be 10-1000% more expensive than a sequential scan.
Why?
So why does the optimizer choose a nested loop? I have seen it is because the optimizer had to make a best guess at how make rows were being returned. It most cases, the optimizer guesses 1 row will be returned so a nested loop is the best choice as far as the optimizer can tell.
Identifying the problem
Identifying the problem is pretty easy. All you need to do is look at the query plan. It can be text, xml or graphical. Look actual and estimated rows returned. If the actual rows are 11teen,000,000 but the estimated rows is 1 then this may be the problem
Causes
I most often see this when large table variables, local variables or modified stored procedure parameters are involved. There can be other causes. See my favorite white paper.
Fix?
Easy. Don’t use table variables and local variables. If you have to modify sp parameters, pass them to a nested stored proc. Make sure stats are up to date. If the WHERE cause is complex, try putting the logic in a persisted computed column or indexed view. Another option is to use a covering index. Of course, the last resort is the JOIN hint.
Need more?
Want to get real deep on stuff like this? Check out Craig Freedman’s blog or his Inside SQL Server 2005 chapters.
posted @ Saturday, November 15, 2008 8:15 PM by DotNetKicks.com
posted @ Saturday, November 15, 2008 8:57 PM by Jeremiah Peschka
posted @ Saturday, November 15, 2008 9:29 PM by JasonMassie
posted @ Sunday, November 16, 2008 10:52 PM by Chris
Name (required)
Email (required)
Website
Notify me of followup comments via e-mail