In part 1, I talk about what I consider feeble attempts at implementing a reporting server through log shipping, mirroring\snapshots and, to a lesser extent, replication. Unless you invest in a real ETL solution, I argued that it is better to run a mixed workload. I talked about the architectural advantages of running mixed workloads in part 2. In a nutshell, doubling the hardware and cutting the data in half.
In this final post, we will talk about new features in SQL Server 2008 and some features that have been around a while that can help with mixed workloads.There are also some bad practices that could be the right answer that we won’t talk about but let us mention triggers, table valued functions, 20 table outer joins, some correlated subqueries and table variables. These are options but usually not good ones. In the right circumstances, they could be right like an end of year report.
Here are the main tools in your arsenal:
Resource Governor
Chances are you do not want to limit throughput of your OLTP queries. The resource governor does not do a good job with these queries anyway because their duration is usually so short. However, let’s say you have reports that run by executives. You can put them in a workload group that gives them as much resources as possible without affecting OLTP traffic. You may also have a less important group of reports from the marketing or sales teams that you can limit further. One caveat to the resource governor is it cannot limit, disk IO so if that is your bottleneck, this will not help much.
Covering Filtered indexes
Filtered indexes are a great new feature in SQL Server 2008. When optimizing for reporting queries on your OLTP system, you are probably going to be touching a lot of rows so covering the query is important. For example, the order fulfillment team works off a report of unfulfilled orders that pulls in order data, customer data, shipping data etc. In this case, you would add covering filtering indexes on each of those tables. The filtered indexes reduce write overhead on your OLTP writes and reduces read overhead of your reporting.
Indexed Views
Indexed views take filtered indexed view a step further. You can create indexes on multiple tables. Think of it as denormalization alongside your OLTP optimized schema. In the previous order fulfillment example, we can basically persist that report and have it updated in real time. There is more overhead to your OLTP transactions so weigh the pro’s and con’s. Test if possible.Unfortunately, you cannot defer changes to your indexed views but I believe there is a feature request for this on Connect and I will tell you about a workaround shortly.
Partitioning and Compression
This is the dynamic duo when mixing workloads. Unfortunately, the nitty gritty details would require their own post. For example, one mixed workload may benefit from compression on the hottest partition while the older data should be uncompressed. However, another workload may benefit from the opposite. The key here is really understanding your workload, data and hardware limitations. Most importantly, plan then TEST, TEST, TEST! Once you partition, you loose online operations so if you do it wrong, you are stuck.
Persisted Computed Columns
This is an easy one. It is a simple trade off. Writes take a little more CPU and space in exchange for reduced CPU time when you report. Take your orders table, for example. You could calculate and save shipping costs when you insert the rows. If it adds a few milliseconds to the insert but shaves seconds off the hourly open orders report that the execs are looking at, it may be an easy decision.
Archival
This might not always be possible depending on your data. It may not be necessary if you have finely tuned indexes. However, it could make a night and day difference. If you need the data, UNION ALL’ing the production table with the archive table has little overhead. I do suggest you keep the archive database on the same server unless it will rarely be accessed. Trying to do this with linked servers is bad.
After hours denormalization
This is basically precreating reports during off hours. Think of it as indexed view with deferred updates. You can UNION with the OLTP tables if you need realtime data in your report. In an ideal world, touching less rows in the OLTP table and then UNIONing with the denormalized data will result in the best of both worlds if you need real time data.
The final word
As the concurrency and size of data scales, both a pseudo reporting database and a mixed work load scenario will not meet business requirements. A business requirement of real time data may dictate a mixed workload. There may be plenty of workloads where scaling out and scaling up both meet performance demands. I just wanted to play devil’s advocate and let you know there is another option when planning reporting.
SHOW VARIABLES – This is like sp_configure or sys.configurations
SHOW STATUS – This like sys.dm_os_%. Basically, all runtime counters for the database engine components.
SHOW PROCESSLIST – This is like sp_who2
SHOW TABLE STATUS – This is like sp_help
These can be filtered like this: SHOW VARIABLES LIKE ‘%innodb%’
Those are the big ones. The full list can be found here.
Microsoft has 10 “Express Database Schemas” that I ran across and I thought I would pass along. I am always on the look out for sample databases. Northwind will always have a special place in my heart like the girlfriend that cheated on me with my best friend. AdventureWorks is cool but the fact that every single feature in SQL Server 2008 is crammed into it makes it klunky. Since these are designed for different types of applications, it might be a good idea to drop these in your toolbox.
These are on the SQL Server Express page. Of course, there is no reason that these cannot be run on the full version of SQL.
Click here for any or all of them:
It appears they were designed by Barry Williams so credit is due where it is deserved.
About the Author
Barry Williams, founder and Principal Consultant with www.DatabaseAnswers.org, has been working with SQL Server since the mid-90's. Barry works as an independent consultant, trainer and writer and is a popular speaker on Enterprise Data Management at major Conferences. He can be reached at info@barryw.org.
I have been working on code to auto trace anyone who logs in under the SA role that is not a job or system process. It would be filtered by spid and auto stop when they logout. The only thing I could come up with is two WMI alerts, two stored procedures and two jobs. It is a craptastic monstrosity at best.
It would be easier if their was a logout DDL trigger but I think a trace definition as a property of a login or role would be a better solution. Think of it as the “big brother” checkbox for the JR DBA.
I have filed a connect here. Vote if you would find this useful.
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=404808
I have cleaned up and updated my unused index query. It returns a list of indexes ordered by ascending reads. If the instance has been up for a long time, it is probably safe to drop indexes with zero or close to zero reads. If the reads are low and the writes are high, this may help improve your transactions per second count. Otherwise, you are just cleaning up unused space. Use caution though. A missing index is worse than an unused index.
SELECT objectname=OBJECT_NAME(s.OBJECT_ID) , indexname=i.name, i.index_id , reads=user_seeks + user_scans + user_lookups , writes = user_updates , p.rowsFROM sys.dm_db_index_usage_stats s JOIN sys.indexes i ON i.index_id = s.index_id AND s.OBJECT_ID = i.OBJECT_ID JOIN sys.partitions p ON p.index_id = s.index_id AND s.OBJECT_ID = p.OBJECT_IDWHERE OBJECTPROPERTY(s.OBJECT_ID,'IsUserTable') = 1 AND s.database_id = DB_ID() AND i.type_desc = 'nonclustered'AND i.is_primary_key = 0AND i.is_unique_constraint = 0AND p.rows > 10000ORDER BY reads, rows DESC
I will also be putting this query up at the SSP WIKI.
It is fairly widely known that implicit conversions of data types can cause scans. However, you may not be able to control what goes in your database, you may have inherited an app and we just are not perfect.
On top of that some can be tricky. For example, did you know that scope_identity and @@identity both return numeric data types. An AVG of a SMALLINT column returns an INT.
Here is a simple repro which the same query does an index scan instead of an index seek
CREATE TABLE t1( id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, c1 VARCHAR(100) ) GOINSERT INTO t1(c1) SELECT 1GO 10000INSERT INTO t1(c1) SELECT 0GO 10CREATE INDEX ix ON t1 (c1) CREATE PROC p1@blah NVARCHAR(100) ASSELECT id FROM t1WHERE c1 = @blahEXEC p1 N'0'CREATE PROC p2 @blah VARCHAR(100) ASSELECT id FROM t1WHERE c1 = @blahEXEC p2 '0'DROP TABLE t1DROP PROC p1DROP PROC p2
I found a sample query from Umachandar Jayachandran and made some mod’s to it. It basically scans your procedure cache for index scans containing implicit conversions.
I have posted it at the SQLServerPedia Wiki here. It is the 2nd query listed.
Note: Implicit conversions may not always result in a scan.
BTW, did you know I am an editor for the TSQL code section over at http://sqlserverpedia.com/wiki/Transact_SQL_Code_Library. I would have announced it sooner but I hadn’t actual done anything yet with the job drama and holidays. Welp, the time has come. I have uploaded several DMV queries and I am about to go over the other submissions. On top of that, I will be participating in a web cast on it. We will get to that in a minute.
I was wondering how this whole wiki thing would work out. I mean every time I hear wiki, I think of Newcleus. Why not just post stuff here. Well, I will probably do that too when I run across something cool.
Here is what makes the wiki cool:
Here are the details about the webcast I mentioned:
ctrl+v via BrentO
On Thursday, December 11th at 8am Pacific, 11am Eastern, I’ll be doing a live video webcast with some of my fellow SQLServerPedia editors:Denny Cherry, Performance Tuning EditorGreg Low, Architecture & Configuration EditorJason Massie, T-SQL Code Library EditorWe’re going to talk about what we’re doing, why we’re doing it, and how you can help. I’ll even create a new Wiki article and edit an existing Wiki article in real time, in front of everybody, to show how it works and how easy it is. (Watch it not work, hahaha.)Sign up for the SQLServerPedia Webcast
On Thursday, December 11th at 8am Pacific, 11am Eastern, I’ll be doing a live video webcast with some of my fellow SQLServerPedia editors:
We’re going to talk about what we’re doing, why we’re doing it, and how you can help. I’ll even create a new Wiki article and edit an existing Wiki article in real time, in front of everybody, to show how it works and how easy it is. (Watch it not work, hahaha.)
Sign up for the SQLServerPedia Webcast
To quote Tim Ford, “I will be the dumbest guy in the room” :) It should be fun though!
Disclaimer:
Just because you can does not mean you should and even if you should, use with care. For example, you will have no friends after a day or so of updating your facebook status every time a tlog backup runs on your farm. There are also many security concerns to consider. That said, I can think of a few uses for SQL Server Web 2.0 mashups and I am not all that creative.
How it works
Sign up at http://ping.fm and then sign up at whatever services you want to update from SQL. Here is a list of what they currently support. From SQL, we could send a message to twitter or even better, yammer via database mail. Here is the obligatory, “Don’t send anything confidential over email”
Possible Uses
Augmenting your monitoring – The keyword is word is “augmenting” except for the smallest installations. For the crown jewel of the company, you might get a call from the 24/7 staff and an email. Why not a tweet too especially if it can add SMS functionality that you may not currently have?
Turn X into an RSS feed – Where X could be, all job executions, any SEV error messages, any trace event or DDL operations, or [your idea here]. You can funnel the RSS into the intranet, your reader or [insert creative idea here].
An insert trigger on your press release table – Blast it to the world. Some companies are already doing similar things http://dell.com/twitter.
Personal updates from the SQL Editor – I don’t know about you but I have spent weeks on end in tsql. Some deadlines don’t allow for time to come up for air. It would pretty cool in a geeky way to tell the world that I just replaced 500 lines of cursor code with a 10 liner or whatever without leaving SSMS.
Example Code:
sp_configure 'show advanced options', 1;GORECONFIGURE;GOsp_configure 'Database Mail XPs', 1;GORECONFIGUREGOEXECUTE msdb.dbo.sysmail_add_account_sp@account_name = 'Ping.fm demo', @description = 'Mail account', @email_address = 'jason@statisticsio.com', @replyto_address = 'jason@statisticsio.com', @display_name = 'Jason Massie', @mailserver_name = 'mail.statisticsio.com' ;-- Create a Database Mail profileEXECUTE msdb.dbo.sysmail_add_profile_sp@profile_name = 'MailAccount', @description = 'Profile used for administrative mail.' ;-- Add the account to the profileEXECUTE msdb.dbo.sysmail_add_profileaccount_sp@profile_name = 'MailAccount', @account_name = 'Ping.fm demo', @sequence_number =1 ;-- Grant access to the profile to all users in the msdb databaseEXECUTE msdb.dbo.sysmail_add_principalprofile_sp@profile_name = 'MailAccount', @principal_name = 'public', @is_default = 1 ;
-- here is the magic blasphemy. EXEC sp_send_dbmail @profile_name ='MailAccount', @recipients = 'YourUniqueEmail@ping.fm’, @subject = 'Blogged:', @body = 'This update came from SQL Server http://ping.fm/XiEkj'I just updated twitter, linkedin, facebook, plurk, rejaw from SQL!
EXEC sp_send_dbmail @profile_name ='MailAccount', @recipients = 'YourUniqueEmail@ping.fm’, @subject = 'Blogged:', @body = 'This update came from SQL Server http://ping.fm/XiEkj'
In closing
One more time, pay attention to security. There are a few commercial products on the horizon. http://ping.fm and most of these service also have API’s that may be a better alternative than database mail. Finally, database mail was meant for administrative alerts and I suggest you stick with that mindset..
I am in a what feels like a whirl wind tour of the globe so content may be light this week. Jetlagged in Denmark right now and should seriously be asleep. I have some interesting topics in the works but until then I have "suggested reading" at several places.
FriendFeed:
http://friendfeed.com/statisticsio
Google Reader Shared Items:
http://www.google.com/reader/shared/09956560379006770135
Facebook:
http://www.facebook.com/profile.php?id=1013730310&ref=profile
Google reader is probably the most SQL focused link. The others have google reader shares + stuff. If you are reading this, chances are you will find what I find cool at least slightly interesting as well.
This is a feature that I didn't even know existed until the recent SQL Server 2008 beta tests even though it was introduced in SQL Server 2005. So.... I wanted to pass it on.
Rob Farley:
OUTPUT clause - knowing what goes in, and what you've accidentally taken out.
Scott on Writing
OUTPUTing Data from the Just-Inserted, Updated, or Deleted Row(s)
Check it out and I believe there is room for improvement if there is demand so make requests at connect.
New in SQL Server 2008 is a server setting called “Optimize for adhoc workloads”. I was happy to see this. You should be too especially if you have ever had arm wrestle an app that causes a bloated proc cache on an x86 box. Ugghh… Adam blogged on it here, Bob blogged on it here and here is the documentation.
I was wondering how this setting would play with the forced parameterization database setting. It looks like forced parameterization trumps the new server wide setting. If my simple tests below are right, it could present an interesting problem. What if your app generates a lot of totally unique adhoc queries that have no benefit from parameterization and you have queries that would benefit from parameterization? Well, the answer is still the same. Write stored procedures. :)
In the real world, sometimes you get a pig and all you can do is put lipstick on it. If you have to choose, “Optimize for Adhoc” would most benefit memory\IO bound servers while “forced parameterization” could help both the CPU and memory usage. However, that is a very general statement. Other things should be considered so it will depend.
--So what happens when you turn on "Optimize for adhoc" and forced parameterization exec sp_configure 'show advanced options', 1reconfigure with overridegoexec sp_configure 'optimize', 1reconfigure with override alter database master set parameterization forceddbcc freeproccache --Let's run a query will not generate a trivial plan.select name, object_id, create_datefrom sys.all_objectswhere object_id = 3 and create_date = '2008-07-09 16:19:59.943'go --The CacheObjType is a compiled plan and 57344 bytesselect p.cacheobjtype, p.size_in_bytes , s.*from sys.dm_exec_query_stats sjoin sys.dm_exec_cached_plans pon s.plan_handle = p.plan_handlecross apply sys.dm_exec_sql_text(sql_handle) where text like '%select name , object_id , create_date from%' --Now, let’s turn off forced parameterizationalter database master set parameterization simpledbcc freeproccache select name, object_id, create_datefrom sys.all_objectswhere object_id = 3 and create_date = '2008-07-09 16:19:59.943'go --The CacheObjType is a Compiled Plan Stub and 320 bytesselect p.cacheobjtype, p.size_in_bytes , s.*from sys.dm_exec_query_stats sjoin sys.dm_exec_cached_plans pon s.plan_handle = p.plan_handlecross apply sys.dm_exec_sql_text(sql_handle) where text like '%select name , object_id , create_date from%'
--So what happens when you turn on "Optimize for adhoc" and forced parameterization exec sp_configure 'show advanced options', 1reconfigure with overridegoexec sp_configure 'optimize', 1reconfigure with override alter database master set parameterization forceddbcc freeproccache
--Let's run a query will not generate a trivial plan.select name, object_id, create_datefrom sys.all_objectswhere object_id = 3 and create_date = '2008-07-09 16:19:59.943'go
--The CacheObjType is a compiled plan and 57344 bytesselect p.cacheobjtype, p.size_in_bytes , s.*from sys.dm_exec_query_stats sjoin sys.dm_exec_cached_plans pon s.plan_handle = p.plan_handlecross apply sys.dm_exec_sql_text(sql_handle) where text like '%select name , object_id , create_date from%'
--Now, let’s turn off forced parameterization
alter database master set parameterization simpledbcc freeproccache
select name, object_id, create_datefrom sys.all_objectswhere object_id = 3 and create_date = '2008-07-09 16:19:59.943'go
--The CacheObjType is a Compiled Plan Stub and 320 bytesselect p.cacheobjtype, p.size_in_bytes , s.*from sys.dm_exec_query_stats sjoin sys.dm_exec_cached_plans pon s.plan_handle = p.plan_handlecross apply sys.dm_exec_sql_text(sql_handle) where text like '%select name , object_id , create_date from%'
Here is a quick query I wrote today. It is the first time I had to go to this DMV so I thought I would share. It would be useful when planning for consolidation and troubleshooting a bunch of apps that have been consolidated or are hosted in a shared environment. Once you find the database, you can break it down by object and index with Tom Davidson's query.
select db_name(database_id) as dbName, count(*)*8/1024 as BufferPoolMB
from sys.dm_os_buffer_descriptors
group by db_name(database_id)
order by 2 desc
In the first part, we looked at the a very simple example with a single table SELECT using covering indexes. This one is a little more complex. We are still covering but we are joining two tables, ordering and grouping. Filtered indexes do not support joins so we have to create two indexes. In this test, the indexed view wins but filtered indexes come a respectable 2nd place compared to normal indexes.
The DDL(2.5M in tblitem and 25M in tblitemdetails)
CREATE TABLE [dbo].[tblItem](
[Itemid] [int] IDENTITY(1,1) NOT NULL,
[ItemName] [varchar](100) NULL,
[Datecreated] [datetime] NULL,
[Closed] [bit] NULL,
PRIMARY KEY CLUSTERED
(
[Itemid] ASC
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblItemDetails](
[ItemDetailsID] [int] IDENTITY(1,1) NOT NULL,
[ItemID] [int] NULL,
[Color] [varchar](20) NULL,
[Size] [varchar](20) NULL,
[Flavor] [varchar](20) NULL,
[ItemDetailsID] ASC
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[tblItemDetails] WITH CHECK ADDCONSTRAINT [fk01] FOREIGN KEY([ItemID])
REFERENCES [dbo].[tblItem] ([Itemid])
ALTER TABLE [dbo].[tblItemDetails] CHECK CONSTRAINT [fk01]
--Create normal and filtered indexes
create index ix01 on tblitem(Datecreated, ItemName)
create index ix02 on tblitem(Datecreated, ItemName) where Datecreated > '2007-08-22' and itemid > 0 and itemid < 50
create index ix01 on tblitemdetails(itemid, color)
create index ix02 on tblitemdetails(itemid, color) where itemid > 0 and itemid < 50
--Create indexed views
alter view ivw01
with schemabinding
as
select a.ItemID, b.Color, COUNT_big(*) as cnt
from dbo.tblItem a join dbo.tblItemDetails b on a.Itemid= b.ItemID
where Datecreated > convert(datetime, '2007-08-22', 101) and a.itemid > 0 and a.itemid < 50
group by a.ItemID, b.Color
create unique clustered index ix01 on ivw01(itemid)
create index ix02 on ivw01(itemid, color)
Test Queries
set statistics io on
--Using a normal covering index
from dbo.tblItem a with (index=ix01) join dbo.tblItemDetails b with (index=ix01) on a.Itemid= b.ItemID
order by cnt
--using a filtered index
from dbo.tblItem a with (index=ix02) join dbo.tblItemDetails b with (index=ix02) on a.Itemid= b.ItemID
--using an indexed view
select * from ivw01
IO numbers:
Normal covering indexes
(8 row(s) affected)Table 'tblItemDetails'. Scan count 8, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'tblItem'. Scan count 1, logical reads 991, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Filtered covering indexes
(8 row(s) affected)Table 'tblItem'. Scan count 1, logical reads 99, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'tblItemDetails'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Indexed View
(8 row(s) affected)Table 'ivw01'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Note: By increasing the result set 100 fold(itemid > 0 and itemid < 5000), we see very similar results except with merge joins.
Query Plans:
The query plans are different. Most notably being the scans for the filtered index and indexed view vs. the normal index. This is fine though. They as just a fraction of the size of the normal index.
Conclusion
The mere fact that the indexed view runs with 50 times less IO overhead does not make it the automatic choice. What if you are on standard edition? Filtered indexes work on standard edition. Also we filtered on a date column. What if you need a real time rolling total? You have to use a normal index. Know you tools and then test, test and test some more.
This might be HAWTALTA for some of you. I just thought of doing this today even though I have been using xml showplan since I was knee high to a junebug. You can make query results clickable just by casting them as XML. For example:
select cast('select 1' as xml)
By clicking the resulting "select 1", a new query editor window opens with "select 1" ready to execute. This would be useful for scripts that generate scripts. Here is a practical example. (Forgive the formatting. I cannot get too wide in the post.)
SELECT sys.objects.name
, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) as Impact
, cast('CREATE NONCLUSTERED INDEX ~NewNameHere~ ON '
+ sys.objects.name + ' ( ' + mid.equality_columns +
CASE WHEN mid.inequality_columns IS NULL THEN ''
ELSE CASE WHEN mid.equality_columns IS NULL
THEN '' ELSE ',' END + mid.inequality_columns END + ' ) '
+ CASE WHEN mid.included_columns IS NULL
THEN '' ELSE 'INCLUDE (' + mid.included_columns + ')'
END + ';' as xml) AS CreateIndexStatement
, mid.equality_columns
, mid.inequality_columns
, mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
ON mig.index_handle = mid.index_handle
INNER JOIN sys.objects WITH (nolock)
ON mid.object_id = sys.objects.object_id
WHERE (migs.group_handle IN
(SELECT TOP (500) group_handle
FROM sys.dm_db_missing_index_group_stats WITH (nolock)
ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC))
ORDER BY 2 DESC
Here are what some of the results look like.
Update : You also have to associate the XML file extension with the SQL editor to be executable.
So you are experiencing CXPACKET wait types? If you run a google search, you will quickly find out you are experiencing the dreaded SQL Server parallelism problem and you must reduce MAXDOP to 1. While that is possible, I say most likely not. I say that, most of the time, SQL is doing the best it can with what it has to work with. That is the query, the data and the schema.
Microsoft has come a long way since version <= 7.0 when it comes to parallelism. If your OLTP queries follow best practices and are well indexed, they probably will never generate a parallel plan. This is because they are fast and access a small amount of rows. If they are missing indexes or SQL overestimates cardinality, SQL might decide to do scans, sorts, hashes, spools etc. These iterators, among others, can go parallel to reduce execution time at the cost of system resources. These iterators are not bad and they do have their place. It just isn’t on OLTP type of queries most of the time.
So dropping the MAXDOP on an OLTP system to 1 probably won’t hurt much because most of the time there is an IO bottleneck. If the CXPACKET wait types are a symptoms of poor indexing and row count estimation, it won’t help either. Whole books have been written on indexing, query tuning and there is a nice whitepaper on stats best practices to avoid estimation problems so I am not going to go into that. I will give you some pointers on quickly identifying whether the CXPACKET is a symptom or the cause.
So your boss and boss's boss run over to your desk yelling about the customer complaints of slowness with SQL. You run this query or something similar:
select r.cpu_time
, r.logical_reads
, r.session_id
into #temp
from sys.dm_exec_sessions as s
inner join sys.dm_exec_requests as r
on s.session_id =r.session_id --and s.last_request_start_time=r.start_time
where is_user_process = 1
waitfor delay '00:00:01'
select substring(h.text, (r.statement_start_offset/2)+1 , ((case r.statement_end_offset when -1
then datalength(h.text) else r.statement_end_offset end - r.statement_start_offset)/2) + 1) as text
, r.cpu_time-t.cpu_time as CPUDiff
, r.logical_reads-t.logical_reads as ReadDiff
, p.query_plan
, r.wait_type
, r.wait_time
, r.last_wait_type
, r.wait_resource
, r.command
, r.database_id
, r.blocking_session_id
, r.granted_query_memory
, r.reads
, r.writes
, r.row_count
, s.[host_name]
, s.program_name
, s.login_name
on s.session_id =r.session_id and s.last_request_start_time=r.start_time
full outer join #temp as t on t.session_id=s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) h
cross apply sys.dm_exec_query_plan(r.plan_handle) p
order by 3 desc
drop table #temp
You notice that you have several rows with CXPACKET wait types. With this query, you click the xml show plan link and: 1. search the xml for missing indexes. 2. Save as a .sqlplan and reopen in SSMS3. Compare estimated and actual rows in the iterators on the right side of the plan. Poor estimates may bubble to the left as well.4. If there are no missing indexes, estimates and actual are fairly close, reducing MAXDOP may help if it is not a huge report or query.
5. If there are missing index or bad estimations, fix it! :)
Of course, these are not rules set in stone. Just a style in the art of database administration. Just don't blame it on a "parallelism bug" because it is a poor musician that blames his instrument.
One thing to note, if you do turn down MAXDOP server-wide, turn in on at the query level(enterprise edition) on your index operation because they are optimized for it.Alter index all on tblBlah rebuild with (maxdop=32);
I am going to give a talk on the data collector in SQL Server 2008 at this month's meeting.
Topic:Data Collections in SQL Server 2008
Date:Thursday, June 19th, 2008
Overview:
Data collections are a new feature in SQL Server 2008. They allow you to collect perfmon, trace or query data to a central database. We will start off by setting up and configuring Data Collections. We will look at the default collections and historic reports in Management Studio. We will cover setting custom collections and providing a UI with SSRS.
Directions etc can be found here.
I am pumped up about TechEd. Bill Gates's last technical keynote. He has to have a bomb to drop... The launch of Windows\VS\SQL Server 2008. All of the speakers are reaching deep into their bag of tricks and bringing new hardcore sessions. The universal party. The blogger's lounge. Twitter, FriendFeed, MSDN, and Technet are all buzzing with posts.
I found out a couple of days ago that a speaker would not be able to make it due to a family emergency. I offered to present his session on DMV's so I will be presenting these sessions:
Using Dynamic Management Views to Improve Your Development
Dynamic Management Views were added to SQL Server 2005 and have been enhanced in SQL Server 2008. While they provide great functionality and usefulness, it appears they are not widely understood or implemented as yet. This session provides an overview of the Dynamic Management Objects available (both Views and Functions) and describes those considered the most useful. It shows how they can be utilized in monitoring, troubleshooting, and ongoing management of SQL Server systems and in gaining an understanding of SQL Server’s operation. The session also illustrates how they can be used to improve the database development experience and how they can be used in custom reports within SQL Server Management Studio.
Database Platform300BRKJason Massie
Windows Management Instrumentation (WMI) for the Command Line DBA
This is a demo filled session on how a command line DBA can unleash the power of WMI. WMI allows the DBA access to information that was previously hard to get or unavailable like the application event logs, OS perfmon counters, and OS events. This session covers WQL queries from SSIS, accessing OS performance counters from SQL, automated reactions to WMI events, WMI from Windows PowerShell and much more.
Database Platform400BRKJason Massie
Troubleshooting Query Plans Gone Wrong
Have you ever flushed the procedure cache to resolve a performance problem? Have you ever had to use a join or index hint when you should not have to? Have you ever updated statistics to fix a performance problem and wondered why it worked? In this session, we get deep into how the query optimizer decides how to execute a SQL statement. We look at common problems that cause the optimizer to choose the wrong access paths like underestimation. We go over a detailed demo on parameter sniffing, problems with local variables and statistics. We also look at solutions to these problems in several areas including good design at the application, schema, and query levels, proactive maintenance and reactive fixes. This session applies to Microsoft SQL Server 2000 through 2008 but we pay special attention to new features in SQL Server 2008 like plan freezing, new hints, filter indexes, and statistics among others.
I'll be there Monday through Saturday. Hope to see you there!
I tried to think of a clever way to way tie this in with SQL but I got nothing. I have a SQL 2005 upgrade Sunday night so I left early and got to the music room while the house was empty. This was the result. Its not perfect but the music is varied and quality from labels like Delsin, Downlow, and Clone. Artists like Mariel Ito, Convextion, Metro Area, Glenn Underground, Dan Curtin, DBX, Art of Noise, and Kraftwerk. Berlin to Detroit. 1984 to present. It has a high nerd content. 74min 320kbs Enjoy.
We regret to inform you of a passing in the SQL community. Read more here.
Rick Heiges has a post on my favorite new feature in SQL Server 2008, filtered indexes.
So does Decipherinfosys along with a good description on the difference of indexes and statistics if you need some background.
SQLBlogcasts has gotten an upgrade and Tony posts some great stats. Congrats!
Not SQL per say but the Hyper-V release candidate has been released. Speaking of Hyper-V, Sriram posts his slide decks on virtualizing SQL. Part 1 and Part 2.
This is a great starting point for SQL Server 2008 as is this. These come by way of the MSDN\Technet update blog.
The SQL Server 2005 sp3 debate continues.
The SQL ISV team posts a performance improving cursor rewrite sample. However, it is not ANSI compliant :) which is odd since most ISV tsql code needs to be portable.
Paul Nielson will be releasing a DVD.
While we are at it, check out the new SQL Server social network.
alt head: Got ADD?
Technorati Tags: SQL Server,SQL Server 2008,SQL Server 2005,SQL Server 2000,tsql,cursors,virtualization,filtered indexes,social networking
There are some new SQL Server 2008 virtual labs.
Choose a title below to begin your evaluation process.
Also just by evaluating, you are entered to win an HP home server box.
My jaw literally dropped when I saw it. If this works as advertised, it has the potential of changing everything. The days of over indexing will be over. Dynamic indexing off of the missing and unused index DMV's could be possible especially with added support. I also think this will better accomplish what people tried to do with partitioning for performance reasons in SQL Server 2005.
It will be really interesting to see how it gets applied in production and where this leads in the next versions. Maybe an autoindex checkbox will replace the DBA :)
This is a small and quick example. We will have to see how it scales to larger environments.
use adventureworks
--Let's nullify a random 400
update Production.WorkOrder
set EndDate = null
where WorkOrderID in (select top 400 WorkOrderID from Production.WorkOrder where enddate is not null)
--Let's give a random 200 a recent date to mimic prod data
update top (200) Production.WorkOrder
set duedate = getdate()-10
where enddate is null
--Base query --This is query type that should be simple yet common --Let's get open WO's that will be due soon
select p.Name, wo.OrderQty, wo.DueDate
from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID
where wo.EndDate is null and wo.DueDate >= '2007-11-24'
--CI Scan with loop join
--Table 'WorkOrder'. Scan count 1, logical reads 530, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.--Now let's create a covering index
create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)
--Run the base query
set statistics io on select p.Name, wo.OrderQty, wo.DueDate from Production.WorkOrder wo JOIN Production.Product p on wo.ProductID=p.ProductID where wo.EndDate is null and wo.DueDate >= '2007-11-24'
--Does NCI seek
--Table 'WorkOrder'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Clean up
drop index [Production].[WorkOrder].[ix01]
--Create Filter index
create index ix01 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty) where EndDate is null and DueDate >= '2007-11-24'
--Run base query
set statistics io on select p.Name, wo.OrderQty, wo.DueDate
--It uses it. We have a real small sample but it performs better
--Table 'WorkOrder'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --Now let's look at size --Create unfiltered index for comparison
create index ix02 on Production.WorkOrder(EndDate,DueDate, ProductID) include (OrderQty)
--note the new syntax
declare @dbid int = db_id()
declare @objid int = object_id('[Production].[WorkOrder]')
select * from sys.dm_db_index_physical_stats(@dbid, @objid, null, null, 'detailed') ps join sys.indexes i
on ps.object_id=i.object_id and ps.index_id=i.index_id
and i.name in ('ix01', 'ix02') and i.type_desc='NONCLUSTERED'
--We are looking at 2 page vs. 301 pages
That is a huge difference in size. The major point is not the fact that we retrieve the same while taking up so much less space on disk but so much less space in memory as well. We are going to get into this much more!
Technorati Tags: SQL Server 2008,Filtered indexes,CTP6
Have you every had a query that should totally be using an existing index and doing a seek but it keeps doing a clustered index scan? In SQL Server 2008, you can force it to do a seek. That may or may not be a good thing. Let's take a look at an AdventureWorks example.
--logical reads 1246
select ProductID, OrderQty from sales.SalesOrderDetail
where ProductID > 776
This query generates this plan.
Now let's add the hint.
--logical reads 271264
select ProductID, OrderQty from sales.SalesOrderDetail with (forceseek)
This query generates this plan. The second query plan contains the index seek but logical reads are 20 times more. This is because it has to do lookup for OrderQty. Obviously using the hint would be a bad thing. So when would be a good time to use this hint? Let's look at this query create proc p1 @p int as select c1, c2 from t1 where c3 = @p t1 has 1 million rows 99% of c3 =1 .5% of c3 = 2 .5% of c3 = 3 99% of the queries run with @p = 2 or 3 The problem is if the query compiles with c3 = 1. The optimizer will choose a CI scan. In SQL 2005, we could recompile each time at the statement level or we could use an optimize for hint. Now we have another tool in the bag of tricks.Use with caution.
In SQL 2000 I wanted to find unused indexes but found it to be difficult. I had to capture a huge trace that fully represents the workload and “trust” that the ITW knows what it’s talking about. You could also use the scan started trace event filtered by dbid, objectid, indexid. This procedure was very tedious.
In SS2005, a quick query of the dynamic management views lets you know which indexes are not being used. You can do this in a few minutes what had previously taken days or weeks. This alone is a very powerful feature.
I am proposing that you take it a step further. We are going to use a little logic and the missing index DMV’s to combine indexes and remove indexes that are still used but redundant. The steps in this process would look like this:
1. Remove unused indexes with the unused index script
2. Get your list of tables to analyze.
3. Remove redundant but used indexes.
4. Revue missing index DMV’s for mistaken index drops.
5. Combine indexes that where it is logical to do so.
6. Revue missing index DMV’s for mistaken index drops.
This methodology is most effective and viable when these conditions are met:
· The server has been online and thus collecting stats for a long time.
· The server is not pushing a hardware bottleneck. If so, this should be done during maint window.
· The server is enterprise edition and the tables allow online operations (i.e. no LOB data or partitions).
What are the benefits of removing unused indexes?
· Reduced writes during updates
· Reduced space usage
· Reduced backup\restore space and time
· Reduced index maintenance time
What are the benefits of removing unused indexes, used but redundant indexes and combining indexes?
· Less memory footprint for the same amount of data
· A different angle to index tuning
· Indexes are more likely to be covering
Now, let’s get down to business…
Remove Unused Indexes
This is pretty straightforward and relatively safe. You want to make sure that your server has been up long enough to get good index usage. This includes infrequent operations like month end reporting, etc.
--Unused indexes
declare @dbid int
select @dbid = db_id()
select object_name(s.object_id) as ObjName
, i.name as IndName
, i.index_id
, user_seeks + user_scans + user_lookups as reads
, user_updates as writes
, sum(p.rows) as rows
from sys.dm_db_index_usage_stats s join sys.indexes i on s.object_id = i.object_id and i.index_id = s.index_id
join sys.partitions p on s.object_id = p.object_id and p.index_id= s.index_id
where objectproperty(s.object_id,'IsUserTable') = 1 and s.index_id> 0 and s.database_id = @dbid
group by object_name(s.object_id), i.name, i.index_id, user_seeks+ user_scans + user_lookups, user_updates
order by reads, writes desc
You can most likely drop any indexes with zero or close to zero reads. The more rows, the more space you will reclaim. The more writes, the better write performance you get by removing them. If the index is not being read but writes are minimal and rows are low, there is little benefit of dropping it. Keep in mind that not having an index that you need is a lot worse than having an index you don’t need in most situations. I suggest starting off with a conservative approach.
Now that we have gotten rid of the low hanging fruit, we can get deeper. I get a list of tables by size and work my way down.
select object_name(object_id), max(rows)
from sys.partitions
group by object_name(object_id)
After running that query we see that tEvent is the largest table and should provide nice gains. This is where we start. Let’s use this schema for our example.
create table tEvent
(EventID int primary key clustered,
EventType int,
EventName varchar(100),
EventDetailID int,
CustomerID int,
CompanyID int,
DateOpen datetime,
DateClose datetime)
create index ind1 on tEvents(EventDetailID);
create index ind2 on tEvents(CustomerID);
create index ind3 on tEvents(EventDetailID, CustomerID, CompanyID);
create index ind4 on tEvents(CompanyID, DateClose, EventType);
create index ind5 on tEvents(CustomerID, DateClose, EventType);
Removing used but redundant indexes
If we do a sp_helpindex, we can see that ind3 should satisfy queries currently using ind1 and ind2. The keyword is “should”. It’s a fairly safe bet so we drop ind1 and ind2. Now we should monitor the missing index DMV to see if there is any negative impact. We should also see the read count substantially increase by using the unused index query filtered by tEvent. Here is the missing index monitoring query:
--Missing indexes
INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
INNER JOIN sys.objects WITH (nolock) ON mid.object_id = sys.objects.object_id
WHERE (migs.group_handleIN
SELECT TOP (5000) group_handle
ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC)
)
and objectproperty(sys.objects.object_id, 'isusertable')=1and name = 'tEvent'
ORDER BY 2 DESC , 3 desc
As long as the indexes do not come back up in this report, we should be ok. Now we are servicing the same queries with 2 less indexes. Those are index pages that are no longer taking up buffer pool space!
Combining indexes
The same concept can be applied to the following scenario but a little more “feel” and understanding of how the app accesses the data is needed.
We know a former DBA added these. It is logical for us, based on our knowledge of the app and the cardinality of the data, to try to replace these indexes with this one.
create index ind6 on tEvents(CustomerID, CompanyID, DateClose, EventType) with (online=on, maxdop=8);
This index should satisfy all queries using both indexes. Again, we want go back to the missing index report to see if SQL thinks it needs one of those indexes.
Now work your way down your list of tables by row count. I was able to reduce the size of a 200GB database by 15% in addition to removing the unused indexes. That’s a big gain! The only problem I ran into was changing the name of an index that had a hint. There were a few indexes that I had to add back but since my approach was conservative it was nothing drastic and completely online.
Let me know if you have any other tips or questions regarding this topic.
Here are the demo files from my presentation at the North Texas SQL Server Users Group. It was a pretty fun experience. I was kind of hestitant to go with the topic because I thought it may be a little dry however I think it turned out pretty interesting.
Maybe I am a little biased but I think North Texas sets the bar high when it comes to knowledgable DBA's. We have a lot of of Microsoft people, MVP's and authors to spread the knowledge around. It was an honor to present and I look forward to attending the user group more often.
Download demo scripts.
Itzik Ben-Gan describes a method of analyzing trace files in his book "Inside Microsoft SQL Server 2005: T-SQL Querying" He provides a couple functions for stripping the literals from queries in the TextData column. If you do not use one of these functions or one of your own similar tools, you are not being very effective in your trace analyisis. The first one is from Microsoft PSS and can be downloaded here. The second one is a CLR function that uses regex. It performs much better. His book is worth it just for this function.
Here is the concept: Tuning single queries can be insane ineffective for multiple reasons. When facing a performance problem, you should be looking at query patterns that way to can get the biggest bang in the shortest amount of time.
Here is a simple trace analysis query using the PSS function.
-- Generate pattern and order by highest CPU
-- This is long running so you may want to dump it into a table.
--http://statisticsio.com/files/patterns/fn_tsqlsig.sql
SELECT dbo.fn_SQLSigTSQL(textdata, 4000)
, SUM(CPU)
, COUNT(*)
FROM dbo.trace_table
GROUP BY dbo.fn_SQLSigTSQL(textdata, 4000)
order by CPU DESC
Check out Itzak's book for more info on trace analysis. In this series on blog's, we will look at applying this concept to data in the DMV.