Posted by
Jason Massie
on Monday, July 14, 2008 at 3:02 PM to
SQL Server 2008, SQL Server 2005, Procedure Cache, DMVs, Captain Varchar(MAX)
457 Views |
8 Comments |
I posted the first Captain VarChar(MAX) comic last Friday. It takes a swing at profiler. A lot of you are like “Duh!”. I have just done too many dba interviews where the candidate’s default answer is profiler. The interview usually gets tough or really short at that point. Just say no to SQL profiler.
I really have to stretch my imagination to come up with a situation where using SQL Profiler sounds like a good idea. For example, you quickly want to capture “ordered” events when repro’ing a problem in your *test* environment AND you filter it by your SPID. Any other time, you are wasting lots of time, cause performances overhead and probably missing what you are looking for while digging for the needle in the haystack. Also contrary to popular belief, running the GUI on another machine DOES NOT help much. For the rest of the post when I refer to a trace, I am referring to one captured with a server side trace unless I say “profiler” or “GUI”
I will admit that I have brought a production app to its knees with profiler. Picture this. You get a call while you are running profiler that the app is having issues. At first you think, “Hey, I have a trace running so at least I will be able to pinpoint the problem.” Then you see TRACE_WRITE wait types. Your stomach knots up because you realize that you just caused a production outage. Don’t be that guy(or gal).
SQL 2005+ just gives you too much info through the DMV’s. 90%+ of the time you can use the DMV’s instead of Profiler. The other 10% of the time, use a server side trace not the Profiler GUI. I have never seen a server side trace bring down an app. Even unfiltered traces on highly transactional systems where a gigabyte+ is collected every minute. I can see it causing an issue but the server would have to be pretty max’ed out to begin with.
Lastly, a trace only reports CPU, Reads, Duration etc when the query completes. Query completion for a misbehaving query could take 10x to 100x longer than normal. This makes traces unusable in a lot of cases when troubleshooting.
Let’s look at some scenario’s:
Troubleshooting a performance issue that is currently happening
A SQL trace in this situation is almost never needed on SQL 2005+. I troubleshoot like this:
1) See if\what hardware subsystem is pegged out with about 5 or 6 perfmon counters.
a) Verify with DMV data or wait stats if needed.
2) Identify the problematic queries with the sys.dm_exec_% DMV's
3) Fix.
Here is the query I use. In a nutshell, it grabs a snapshot, waits a second and does a diff so I can see what is currently slamming the server. Otherwise, the cumulative CPU and IO columns might throw me off. I also grab the query plan which is easy here but very expensive in a trace and hard to correlate with the SQL completed event.
select r.cpu_time
, r.logical_reads
, r.session_id
into #temp
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id --and s.last_request_start_time=r.start_time
where is_user_process = 1
waitfor delay '00:00:01'
select
substring(h.text, (r.statement_start_offset/2)+1 ,
((case r.statement_end_offset when -1 then datalength(h.text)
else r.statement_end_offset end - r.statement_start_offset)/2) + 1) as text
, r.cpu_time-t.cpu_time as CPUDiff
, r.logical_reads-t.logical_reads as ReadDiff
, p.query_plan
, r.wait_type
, r.wait_time
, r.last_wait_type
, r.wait_resource
, r.command
, r.database_id
, r.blocking_session_id
, r.granted_query_memory
, r.session_id
, r.reads
, r.writes
, r.row_count
, s.[host_name]
, s.program_name
, s.login_name
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id and s.last_request_start_time=r.start_time
full outer join #temp as t on t.session_id=s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) h
cross apply sys.dm_exec_query_plan(r.plan_handle) p
order by 3 desc
drop table #temp
Proactive performance tuning
Now this starts to get in to fuzzy territory. Technically, a SQL trace is more thorough than doing your analysis off of the procedure cache. However, I would say benefits outweigh the losses in most but not all cases.
Procedure Cache Pro’s
· No need to capture a trace
· No need to load a trace
· Aggregation already exists so the data can be smaller assuming that the application has good plan reuse.
· Easy access to XML Showplan
Procedure Cache Con’s
· Queries can be flushed until next execution for many reasons
· Some queries are not cached. These are usually queries that you don’t care about.
To replace SQL trace analysis with a DMV query, use something like this:
select total_worker_time/execution_count as AvgCPU
, total_elapsed_time/execution_count as AvgDuration
, (total_logical_reads+total_physical_reads)/execution_count as AvgReads
, execution_count
, substring(st.text, (qs.statement_start_offset/2)+1 ,
((case qs.statement_end_offset when -1 then datalength(st.text)
else qs.statement_end_offset end - qs.statement_start_offset)/2) + 1) as txt
, query_plan
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp
--where – filter as needed.
--group by – Consider striping literals and grouping by the SQL text.
--See Inside SQL Server 2005 for more info.
--Order by – Sort as needed.
Uncommon cases
There are some less common uses of SQL traces like the blocked process report, deadlock report, recompile tracing and many others. Make sure you review all of your options and don’t forget about the WMI Provider for Server Events. Use your best judgment and avoid the GUI like the plague.
Conclusion
To summarize very briefly: Never use the SQL Profiler GUI, know your tools, and pick the best one for the job.
Posted by
Jason Massie
on Thursday, June 26, 2008 at 5:15 PM to
SQL Server 2005, query optimizer, SQL performance tuning, tsql, DMVs
524 Views |
3 Comments |
So you are experiencing CXPACKET wait types? If you run a google search, you will quickly find out you are experiencing the dreaded SQL Server parallelism problem and you must reduce MAXDOP to 1. While that is possible, I say most likely not. I say that, most of the time, SQL is doing the best it can with what it has to work with. That is the query, the data and the schema.
Microsoft has come a long way since version <= 7.0 when it comes to parallelism. If your OLTP queries follow best practices and are well indexed, they probably will never generate a parallel plan. This is because they are fast and access a small amount of rows. If they are missing indexes or SQL overestimates cardinality, SQL might decide to do scans, sorts, hashes, spools etc. These iterators, among others, can go parallel to reduce execution time at the cost of system resources. These iterators are not bad and they do have their place. It just isn’t on OLTP type of queries most of the time.
So dropping the MAXDOP on an OLTP system to 1 probably won’t hurt much because most of the time there is an IO bottleneck. If the CXPACKET wait types are a symptoms of poor indexing and row count estimation, it won’t help either. Whole books have been written on indexing, query tuning and there is a nice whitepaper on stats best practices to avoid estimation problems so I am not going to go into that. I will give you some pointers on quickly identifying whether the CXPACKET is a symptom or the cause.
So your boss and boss's boss run over to your desk yelling about the customer complaints of slowness with SQL. You run this query or something similar:
select r.cpu_time
, r.logical_reads
, r.session_id
into #temp
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id --and s.last_request_start_time=r.start_time
where is_user_process = 1
waitfor delay '00:00:01'
select substring(h.text, (r.statement_start_offset/2)+1 , ((case r.statement_end_offset when -1
then datalength(h.text) else r.statement_end_offset end - r.statement_start_offset)/2) + 1) as text
, r.cpu_time-t.cpu_time as CPUDiff
, r.logical_reads-t.logical_reads as ReadDiff
, p.query_plan
, r.wait_type
, r.wait_time
, r.last_wait_type
, r.wait_resource
, r.command
, r.database_id
, r.blocking_session_id
, r.granted_query_memory
, r.session_id
, r.reads
, r.writes
, r.row_count
, s.[host_name]
, s.program_name
, s.login_name
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id and s.last_request_start_time=r.start_time
full outer join #temp as t on t.session_id=s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) h
cross apply sys.dm_exec_query_plan(r.plan_handle) p
order by 3 desc
drop table #temp
You notice that you have several rows with CXPACKET wait types. With this query, you click the xml show plan link and:
1. search the xml for missing indexes.
2. Save as a .sqlplan and reopen in SSMS
3. Compare estimated and actual rows in the iterators on the right side of the plan. Poor estimates may bubble to the left as well.
4. If there are no missing indexes, estimates and actual are fairly close, reducing MAXDOP may help if it is not a huge report or query.
5. If there are missing index or bad estimations, fix it! :)
Of course, these are not rules set in stone. Just a style in the art of database administration. Just don't blame it on a "parallelism bug" because it is a poor musician that blames his instrument.
One thing to note, if you do turn down MAXDOP server-wide, turn in on at the query level(enterprise edition) on your index operation because they are optimized for it.
Alter index all on tblBlah rebuild with (maxdop=32);
<