This site is maintained by Jason Massie. He has 10 years of experience as a DBA and has specialized in performance tuning for the last five. He was recognized by Microsoft as a SQL Server MVP. Jason has spoken at the Professional Association for SQL Server (PASS) conference, the North Texas SQL Server Users Group, SQL Connections and TechED. He has worked at Terremark (formerly Data Return) for a decade.
You can contact him at jason@statisticsio.com , MSN IM jason_massie@hotmail.com or 469.569.5965
Jason has the following certifications:
SQL Server News & Information tsql, performance tuning, industry trends, & bad jokes
I have ported my blog to WordPress. It is simply the best blogging platform there is. This site will stay up for historical purposes, but all new content can be found here starting today: http://jasonmassie.com
If you read via RSS, there is no need to do anything.
Tim Ford(WWW, Twitter) tagged me. This meme is all about what you would do if you were trapped on a deserted island with nothing but a laptop and aircard. Here is an excerpt:
So You’re On A Deserted Island With WiFi and you’re still on the clock at work. Okay, so not a very good situational exercise here, but let’s roll with it; we’ll call it a virtual deserted island. Perhaps what I should simply ask is if you had a month without any walk-up work, no projects due, no performance issues that require you to devote time from anything other than a wishlist of items you’ve been wanting to get accomplished at work but keep getting pulled away from I ask this question: what would be the top items that would get your attention?
Well, first I would have to set up camp, kill The Others, and salvage the wreckage of Oceanic Flight 815. After that, I would ferment coconut juice so I could have drinks on the beach. Priorities, right?
So after a couple of months of drinking, surfing(waves and pr0n), fishing, and basically being an all around beachbum I would do this:
Study up for the Oracle Certified Professional exams.
I should be doing this now but I have trouble finding time and motivation. I have the books, study material, and an Oracle lab environment in the cloud. I have been pondering why I find it easier to put my head down and learn the new versions of SQL Server but procrastinate on Oracle. I have come to the conclusion that it is because I do not use books with SQL Server. I usually learn all of the new engine features before any books are written. I guess I just need to get more hands on with Oracle.
Design a set of default policies for new SQL Server 2008 installations.
In SQL Server 2008, we have policy based management. I want a set of default policies that guide people into best practices. For example, preventing a new database from being created with 1MB autogrow. The policies should mostly cover server and database settings. Schema oriented policies could vary between dev teams and should be handled by them.
Powershell as my default administration method.
I got on a PowerShell kick for a while but I lost interest. I need to give up the GUI and do all administration through PowerShell. That is how I learned TSQL. Today, I use the GUI when it is a couple of clicks versus lines of TSQL, but I could do it in TSQL if I needed to. If I give up all MMC snap-ins and other GUIs, I might spend a little extra time with PowerShell at first but it should pay big dividends in the long run.
The Others:
Brent Ozar(Twitter)
Grant Fritchey(Twitter)
Thomas LaRock(Twitter)
I am tagging the following:
SQLChicken
SQLFool
Jeremiah Peschka
I am presenting at the June DBA SIG. This will be the topic:
Highly Available SQL Server Upgrade Assault Tactics
In this debriefing, we will look at various methods to minimize downtime during major version upgrades. This will include upgrading from SQL Server 2000 to SQL Server 2005 and SQL 2008 as well as SQL Server 2005 to SQL Server 2008. We will cover key planning and testing skills that can cut substantial time off the upgrade and minimize problems after the upgrade. We will also cover the technologies that you can add to your arsenal when planning an HA upgrade assault. All hands on deck as we do battle against upgrade downtime.
The meeting will be June 24th at noon Eastern. More info can be found here soon. However, set an outlook reminder now. Is it set? No? Well, set it now!
I wrote about the Nehalems a while back. I was about to write about AMD’s comeback, but their new chips are due in 2010 while Intel will one-up them again before the end of this year. This is after Intel, it is rumored, delayed the Nehalem-EX just because they can. How is that for innovation? However, what are you going to do? An eight-core, four-socket box with DDR3 and 96MB of cache will smoke an eight-socket AMD HP DL785 box. Things will get fun in the database area.
The Nehalem-EX Advantage
I didn’t say it. :) However, I must say it is interesting prose. You can visit the site here and read up on her, Visual Studio Guy, Windows Master and the other members of “The Source Fource”. Might I add that Capt. Varchar made it to 2nd base but she won’t call him back.
Oh well, it is all her loss.
A keen motorbike enthusiast, SQL Server Gal spends her free time and energy looking after her beloved Harley Davidson, called Data Drive. She spends Sunday afternoons challenging the speed barrier and enjoying the natural beauty of the countryside. But with her feisty, alternative take on the world, SQL Server Gal is a biker chick with a difference. Her vast knowledge and language skills mean that she has an answer to every question. And despite her glitzy life as a member of the Source Fource, she still uses her extraordinary memory skills to help others realize their ambitions.
Read more.
Control Z
This post is inspired by the #famouslastwords thread on twitter today. If you are doing a major, minor or any change on a production system, have a rollback plan. This will probably go beyond restoring from backup especially if the data is large. From someone who has put in 24 hour+ shifts, plan and test as much as possible.
I have long hated maintenance plans. At first, it was because of the cryptic error messages when they fail. Then DBAs before me clued me in to the fact that they clean up files whether or not those files have made it to tape. The cleanup process is oblivious to the tape backup. This is really important. It can be catastrophic for this to happen. How many days do you keep on disk? One, two, three, seven days? Do you manage the tape backup as well? What if you have to restore from further back than that, but the Windows\backup admin was out on vacation while the tape backup was failing? I submit to you that it is better to fill up the backup and log drives than to delete non-archived backups.
There are several ways you can guarantee backups make it to tape before you delete them. First, you could check the archive bit.
For example:
del /a-a *.trn *.bak *.dff
Of course, you may wish to do it from powershell, vbscript or xp_cmdshell so you can only delete files older than a certain date. You may want additional logic if you need to keep a weekly full, a nightly diff and 24 hours of tlogs ON DISK.
Most backup software also allows you to run a post job script where you could clean up backups. I can think of horrific scenarios where it would still delete the “money bags” so I would stick with the windows file system attribute. If there is a bug there, it will be SEV A and lot of people will run into it.
I have now been on both ends of the process: submitting abstracts for conferences and selecting them in this year’s SQL PASS process.
First, I would like to applaud PASS for taking a huge step forward in two areas. The first is marketing. They have grown the submissions, sessions and attendees at an exponential pace. I guess they will beat TECHED this year. They also introduced a social element to the process. Once you submitted an abstract, it was open for others to view. This creates a crowdsourcing element. It makes the community step it up a notch. The result is a better conference for the participants due to competition.
Here are a few tips from my experience on both sides of the aisle.
Anyway, I would be happy to review your abstract in the future and provide constructive criticism. Feel free to drop me a note.
I am not NULL!
Adapted from OfficeOFFline.
I have been pinned down with employee reviews and SQL PASS abstracts. On top of that, I have been learning Linux and Oracle instead of working with SQL. It is fun. I am learning 30 things a day but it is either level 100 or 500 for this crowd. I will put out an Oracle cheat sheet for SQL DBAs soon though.
You can peep my life stream if you want to keep up.
Until then I will keep failing…
It means I have been getting my a$$ handed to me at work. It is a good thing. I like action but I like blogging too. Anyway I have a few topics queued up but until then I will post a few links.
The company I work for was named Service Provider of the Year by VMware.
SQLChicken(WWW, RSS, Twitter) posted on ESX 4.0 aka vSphere from a DBA’s perspective.
The MySQL founder talks about the Oracle\SUN acquisition.
Tony Bain’s Twitter feed has some interesting links\commentary regarding Oracle and MySQL.
There is also an article on what the Oracle\SUN deal means to MSFT. I don’t agree 100% though.
Last week, I wrote an editorial that I tried to keep fact based and my opinion out of. Wait, is that still an editorial? Anyway, I was happy to see this post by Bill Graziano(RSS, Twitter) today. BTW, have you subscribed to the SQLPASS blog?
I was concerned that the BI track would be taking away from the other tracks even though it didn’t look to be justified based on the survey. My concerns have been alleviated.
Here are a few quotes:
Again the full post can be found here.
Have you ever had to connect to SQL Server in single user mode, only to have the application beat you to it no matter how fast you try? I have actually had to unplug NICs and have smart hands do it. Well, there is a soon-to-be-documented extension of -m.
Basically, you can specify an application, like -m"sqlcmd". This means only a single instance of the SQLCMD application can connect. Just start SQL with -m"sqlcmd", unbreak what you just broke and then restart SQL normally. :) Nice, huh?
Keep in mind that there are ways around this so don’t use it for security.
Here is the top 10 via Google searches in the last month:
cxpacket: 94
70-432: 35
"sql server 2005" + fix + "indexed view": 27
sql quiz: 26
"sql server 2005" cpu wmi: 22
statisticsio: 18
jason massie: 14
cool people on twitter: 13
cxpacket sql server 2005
exam 70-432: 12
Here are some more interesting hits:
coolest people to follow on twitter
bacon bitsandbytes
"linked server" & "tsql" & "1000 rows"
how much does twitter spend on server
godaddy sql will not work
sql funny statement
jason massie know now
wtf is sql server
I will be speaking at the Greater Fort Worth SQL Server Users Group this Wednesday. The talk will be on Troubleshooting with the DMVs. If you are a DFW SQL’er, come on out. Directions and more info can be found here.
If you don’t come, I am sending my boy after you with a blow torch and a pair of pliers.
Paul Randal has a survey on his blog, and I would be interested in seeing as large a sample as possible. I think this is important so I am going to post the link for you. Your vote counts. Yes, we can!
http://www.sqlskills.com/BLOGS/PAUL/post/Weekly-survey-does-size-really-matter-or-is-it-what-you-do-with-it.aspx
I come from the school of thought that unless you have an enterprise SAN that exceeds your IO requirements, or you are intimately familiar with the IO patterns of your app, you should use one disk group for burstable performance and possibly a second disk group for backups.
You might be a geek if…
I stole this from the #youmightbeageek conversation going around twitter yesterday. Funny stuff. The original post is here.
I hope I do not offend my BI brethren. I am just stating the facts from the survey and the fact that the abstract submission deadline has been extended because the BI conference has been canceled.
My request is that the number of sessions per track reflect the survey results. I hope that the BI track is not inflated to the point that it cuts out sessions from other tracks.
Here are some facts from the survey, which can be found here.
Based on the survey, the DBA and Database Dev tracks should have the most sessions. The Professional Development and BI tracks should have close to an equal number. In the past, it seems like the BI sessions doubled or tripled the Professional Development sessions.
I am trying to keep opinion out of this post so I will leave it at this. Am I misinterpreting the numbers? What are your thoughts?
Disclosure: I am team lead on the DBA abstract selection team and I have submitted a single ProDev abstract.
In part 1, I talked about what I consider feeble attempts at implementing a reporting server through log shipping, mirroring\snapshots and, to a lesser extent, replication. I argued that unless you invest in a real ETL solution, it is better to run a mixed workload. I talked about the architectural advantages of running mixed workloads in part 2. In a nutshell, you double the hardware and cut the data in half.
In this final post, we will talk about new features in SQL Server 2008, and some features that have been around a while, that can help with mixed workloads. There are also some usually bad practices that we won’t cover in detail but that deserve a mention: triggers, table valued functions, 20 table outer joins, some correlated subqueries and table variables. These are options but usually not good ones. In the right circumstances, like an end of year report, they could be the right answer.
Here are the main tools in your arsenal:
Resource Governor
Chances are you do not want to limit the throughput of your OLTP queries. The resource governor does not do a good job with these queries anyway because their duration is usually so short. However, let’s say you have reports that are run by executives. You can put them in a workload group that gives them as many resources as possible without affecting OLTP traffic. You may also have a less important group of reports from the marketing or sales teams that you can limit further. One caveat is that the resource governor cannot limit disk IO, so if that is your bottleneck, this will not help much.
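Here is a rough sketch of how the two report groups described above could be set up. The pool, group and login names (ReportingPool, ExecReports, MarketingReports, exec_reports, marketing_reports) are made up for illustration, and the 40% cap is just a placeholder you would tune through testing. OLTP logins fall through to the default group and are not limited.

```sql
USE master;
GO
-- Hypothetical pool for all reporting work. MAX_CPU_PERCENT is only
-- enforced when there is CPU contention on the server.
CREATE RESOURCE POOL ReportingPool
WITH (MAX_CPU_PERCENT = 40);
GO
-- Executive reports: favored within the reporting pool.
CREATE WORKLOAD GROUP ExecReports
WITH (IMPORTANCE = HIGH)
USING ReportingPool;
GO
-- Marketing/sales reports: lower importance and limited parallelism.
CREATE WORKLOAD GROUP MarketingReports
WITH (IMPORTANCE = LOW, MAX_DOP = 2)
USING ReportingPool;
GO
-- Classifier function: must live in master and be schema bound.
-- The login names are assumptions; classify however fits your shop.
CREATE FUNCTION dbo.fn_ReportClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname;
    SET @group = CASE SUSER_SNAME()
                     WHEN N'exec_reports'      THEN N'ExecReports'
                     WHEN N'marketing_reports' THEN N'MarketingReports'
                     ELSE N'default'  -- OLTP traffic stays in the default group
                 END;
    RETURN @group;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_ReportClassifier);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
```

Classification happens at login time, so existing sessions keep their old group until they reconnect.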
Covering Filtered indexes
Filtered indexes are a great new feature in SQL Server 2008. When optimizing for reporting queries on your OLTP system, you are probably going to be touching a lot of rows, so covering the query is important. For example, the order fulfillment team works off a report of unfulfilled orders that pulls in order data, customer data, shipping data etc. In this case, you would add covering filtered indexes on each of those tables. The filtered indexes reduce the write overhead on your OLTP writes and reduce the read overhead of your reporting.
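As a sketch of the unfulfilled orders example, a covering filtered index might look something like this. The table and column names (dbo.Orders, FulfilledDate and so on) are hypothetical, so adjust them to your schema.

```sql
-- Only rows for unfulfilled orders are stored in the index, so writes to
-- already fulfilled orders do not have to maintain it.
CREATE NONCLUSTERED INDEX IX_Orders_Unfulfilled
ON dbo.Orders (OrderDate)
INCLUDE (CustomerID, ShipMethodID, TotalDue)  -- covers the report's columns
WHERE FulfilledDate IS NULL;
```

The report query needs a matching predicate (WHERE FulfilledDate IS NULL) for the optimizer to consider the index.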
Indexed Views
Indexed views take filtered indexes a step further. You can create an index that spans multiple tables. Think of it as denormalization alongside your OLTP optimized schema. In the previous order fulfillment example, we can basically persist that report and have it updated in real time. There is more overhead to your OLTP transactions so weigh the pros and cons. Test if possible. Unfortunately, you cannot defer changes to your indexed views but I believe there is a feature request for this on Connect and I will tell you about a workaround shortly.
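A minimal sketch of persisting the order fulfillment report as an indexed view could look like the following. The tables and columns are hypothetical, and it assumes OrderID is unique in the view's result, which the unique clustered index requires.

```sql
-- Indexed views require SCHEMABINDING, two-part object names and inner
-- joins only, plus the usual SET options (ANSI_NULLS, QUOTED_IDENTIFIER,
-- etc.) ON when the view and index are created.
CREATE VIEW dbo.vUnfulfilledOrders
WITH SCHEMABINDING
AS
SELECT o.OrderID,
       o.OrderDate,
       o.TotalDue,
       c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID
WHERE o.FulfilledDate IS NULL;
GO
-- The unique clustered index is what actually persists the view's rows.
CREATE UNIQUE CLUSTERED INDEX CIX_vUnfulfilledOrders
ON dbo.vUnfulfilledOrders (OrderID);
GO
-- On editions other than Enterprise, query the view with NOEXPAND so the
-- optimizer uses the view's index.
SELECT OrderID, OrderDate, TotalDue, CustomerName
FROM dbo.vUnfulfilledOrders WITH (NOEXPAND);
```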
Partitioning and Compression
This is the dynamic duo when mixing workloads. Unfortunately, the nitty gritty details would require their own post. For example, one mixed workload may benefit from compression on the hottest partition while the older data should be uncompressed. However, another workload may benefit from the opposite. The key here is really understanding your workload, data and hardware limitations. Most importantly, plan then TEST, TEST, TEST! Once you partition, you lose online operations so if you do it wrong, you are stuck.
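One way to start the testing is to estimate the savings for a single partition before compressing it. This is only a sketch: dbo.Orders and partition number 12 are placeholders, and in SQL Server 2008 both partitioning and data compression require Enterprise Edition.

```sql
-- Estimate PAGE compression savings for one partition of a hypothetical table.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'Orders',
     @index_id         = NULL,   -- NULL = all indexes on the table
     @partition_number = 12,     -- placeholder partition number
     @data_compression = N'PAGE';
GO
-- If the estimate and your workload tests look good, compress just that partition.
ALTER TABLE dbo.Orders
REBUILD PARTITION = 12
WITH (DATA_COMPRESSION = PAGE);
GO
```

The estimate only covers space. The CPU cost on reads and writes still has to be measured against your own workload.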
Persisted Computed Columns
This is an easy one. It is a simple trade off. Writes take a little more CPU and space in exchange for reduced CPU time when you report. Take your orders table, for example. You could calculate and save shipping costs when you insert the rows. If it adds a few milliseconds to the insert but shaves seconds off the hourly open orders report that the execs are looking at, it may be an easy decision.
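As a rough illustration, the shipping cost could be stored as a persisted computed column. The columns ShipWeight and ShipRatePerPound are made up for the example and are assumed to be decimal columns on a hypothetical dbo.Orders table.

```sql
-- The expression is evaluated and stored at write time, so the report
-- reads the saved value instead of recalculating it for every row.
ALTER TABLE dbo.Orders
ADD ShippingCost AS (ShipWeight * ShipRatePerPound) PERSISTED;
```

The expression has to be deterministic to be persisted, so a calculation that depends on something like GETDATE() would not qualify.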
Archival
This might not always be possible depending on your data. It may not be necessary if you have finely tuned indexes. However, it could make a night and day difference. If you need the data, UNION ALL’ing the production table with the archive table has little overhead. I do suggest you keep the archive database on the same server unless it will rarely be accessed. Trying to do this with linked servers is bad.
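A sketch of pulling production and archived rows together, assuming a hypothetical SalesArchive database on the same instance with a matching Orders table:

```sql
DECLARE @from datetime;
SET @from = '20080101';  -- placeholder report start date

-- Current rows from the production database...
SELECT o.OrderID, o.OrderDate, o.TotalDue
FROM Sales.dbo.Orders AS o
WHERE o.OrderDate >= @from
UNION ALL
-- ...plus older rows from the archive database on the same server.
SELECT a.OrderID, a.OrderDate, a.TotalDue
FROM SalesArchive.dbo.Orders AS a
WHERE a.OrderDate >= @from;
```

UNION ALL is used rather than UNION because it skips the duplicate elimination step, which is safe as long as a row lives in only one of the two tables.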
After hours denormalization
This is basically precreating reports during off hours. Think of it as an indexed view with deferred updates. You can UNION with the OLTP tables if you need real time data in your report. In an ideal world, touching fewer rows in the OLTP table and then UNIONing with the denormalized data gives you the best of both worlds.
The final word
As concurrency and data size scale, eventually neither a pseudo reporting database nor a mixed workload will meet business requirements. A business requirement of real time data may dictate a mixed workload. There may be plenty of workloads where scaling out and scaling up both meet performance demands. I just wanted to play devil’s advocate and let you know there is another option when planning reporting.
Intel released the Nehalem processor family: the Intel® Xeon® Processor 5500 series. This could be the nail in the coffin for AMD. I hope not. Without competition, Intel can rest on its laurels.
Let’s look at the goodies. On Glenn Berry’s blog, he points out the SQL specific benchmarks, which come from the Anandtech review. This is what caught my eye from that review.
The memory controller has up to three channels. A dual CPU configuration has access to 35GB/s of memory bandwidth (measured with stream) if you use DDR3-1333. The latest dual Opteron achieves 19.4GB/s with DDR2-800
Think about it. If you have a SQL box with 32GB of RAM and a VLDB, you could theoretically churn the buffer pool once every second (32GB at 35GB/s takes a little under a second). Of course, you will probably hit a disk bottleneck first. In addition to the proc specific improvements, DDR3 with NUMA support in a server is a huge leap.
Today, HP also introduced the DL3X0 G6. Here is a link to the DL 360 G6 specs. I am speculating but I bet it would beat a dual socket 6 core DL 580 G5. Most definitely on IO bound workloads like a database server. Hopefully, the DL580 G6’s are coming soon. Maybe an 8 socket DL 780 G6. :)
Mix that with VMWare ESX 4.0 that is in RC and virtualization of the database server may have come of age.
The Opteron 1up’ed Intel in 2004. Now the ball is back in AMD’s court. I am rooting for you!
The T’aint
If you don’t get it, bless your heart. If you do, don’t cry foul because you do not learn this phrase in Sunday School. Besides, it is ALWAYS the network!
I am presenting at the Ft. Worth SQL Server Users Group in April. Details forthcoming but it is basically going to be on running mixed workloads(OLTP and DSS) on the same server. This is part two of what I will be pulling into the presentation. Click here for part one.
In this part, we will look at the typical hardware configurations for the methods covered in part one (log shipping, mirroring, replication etc.) and at what a better configuration would be, especially if you leverage some new features in SQL Server 2008 to run mixed workloads.
Unless you invest the time to create a real reporting solution with an ETL, you end up with the following solutions based off of the methods described in part one. I see it all the time. On top of that, I see reporting queries still running on the production server because there is a need for real time data.
So what do we get with this solution? Most reporting queries are offloaded from production. However, there are a lot of con’s. The schema usually is not optimized for reporting. There is overhead in getting the data to the reporting server. The data is stored twice on disk and more importantly, memory. Finally, resource utilization is usually lopsided. For example, first thing in the morning the reporting server may be hammered while production traffic is just ramping up. During peak production traffic, the reporting server can be underutilized.
I submit to you that combining reporting and production is a better configuration if you do not invest in an ETL solution that creates a real reporting database. In the next post, we will talk about features to optimize this configuration, but let’s talk about what we gain just by using this architecture.
Note: I use direct attached storage(DAS) in these examples because that is where the biggest gains are to be had. However, the same benefits apply if you are on an enterprise level SAN with some caveats.
Note 2: This series is generalized and your mileage may vary based on your particular environment, business requirements and workload.
In part three, we will talk about features in SQL Server 2008 that will help optimize a mixed workload on a single instance.
I am presenting at the Ft. Worth SQL Server Users Group in April. Details forthcoming but it is basically going to be on running mixed workloads(OLTP and DSS) on the same server. This is part one of what I will be pulling into the presentation.
What is NOT a Reporting Server
Log Shipping
A log shipped copy is not a reporting database. It is the same database that should be optimized for OLTP. You have no control to add supporting indexes. No denormalization. No persisted computed columns. No indexed views. Disconnects can happen midquery. More hardware. However, this is often the easiest solution.
Database Mirroring with Snapshots
This configuration suffers all the limitations of logshipping. However, you must run Enterprise Edition. You can get around the disconnects with creative coding.
Nightly Backups\Restore
Just like log shipping, but the data is further behind, which may be OK based on business requirements. You can get around the limitations of log shipping, like indexing. However, it is not practical for a VLDB.
Snapshot Replication
This is ok for smaller databases plus you can filter tables and columns if they are not needed. You can get around some of the limitations of log shipping and mirroring but data is stale.
Others
Offline the database, robocopy, attach. DTS\SSIS the whole db. SAN Replication. Transactional replication with no reporting modifications.
The problem
You double your hardware and storage with no real reporting gains in most scenarios. This might be acceptable if the reporting environment doubles as DR. However, there are better solutions.
What is next?
Moving forward, we will talk about doubling the hardware on OLTP and using SQL 2008 features to run reporting and OLTP on the same server.
*Warning* Only use if your array controller has a battery backed cache. *Warning*
The settings are “Enable write caching on the disk” and “Enable advanced performance”. You can access these through Device Manager on the properties of the disk. These settings mostly apply to direct attached storage and are unavailable for most enterprise SAN LUNs that I have seen.
While we are at it, if your RAID controller cache has a read\write ratio, it is a good idea to set it to 0% read\100% write as long as you do not have a memory bottleneck. SQL uses RAM as its read buffer.
So is this a silver bullet for performance? Definitely not, especially if you are not hitting a disk write bottleneck. However, every little bit helps and if it knocks 5-10% off of your 3 hour long full backup to disk, that is a win!
Happy Friday!
The default setting is the wrong setting for SQL Server. However, unless this has caused you a problem or you are thorough to the point of OCD, the correct setting may not be in place on your server.
Unless you are fighting a memory bottleneck, it probably won’t affect you too much but it is hard to give SQL too much memory.
The setting is “Maximize Data Throughput for Network Applications” and it is on by default. It sounds like a good thing. To the contrary, here is documentation from MSDN.
http://msdn.microsoft.com/en-us/library/ms178067.aspx
Maximize Data Throughput for Network Applications
To optimize system memory use for SQL Server, you should limit the amount of memory that is used by the system for file caching. To limit the file system cache, make sure that Maximize data throughput for file sharing is not selected. You can specify the smallest file system cache by selecting Minimize memory used or Balance.
To check the current setting on your operating system
1. Click Start, then click Control Panel, double-click Network Connections, and then double-click Local Area Connection.
2. On the General tab, click Properties, select File and Printer Sharing Microsoft Networks, and then click Properties.
3. If Maximize data throughput for network applications is selected, choose any other option, click OK, and then close the rest of the dialog boxes.
Happy Tweaking.
Some of these are confusing so I thought I would write a blog on it. The post only refers to the SQLOLEDB and SQLNCLI providers.
Here are your options:
So, in layman’s terms, let’s talk about each:
Dynamic Parameters: If you have a lot of adhoc queries against linked servers this may be a good option to turn on along with forced parameterization. Search your proc cache for queries containing the linked server name with single use counts.
Nested Queries: I would say that depends on the business requirements.
Level Zero Only: Leave default for SQL Server.
Allow InProcess: This is on by default with the native client provider but off by default with the SQLOLEDB provider. I might try changing to in process with OLEDB, especially if context switches were high and % kernel time was high.
Non Transacted Updates: I would really use caution with this. It is like a NOLOCK hint. It may be ok for the app but unintended consequences could happen. Besides, maintaining the transaction is probably a small part of the duration when you are hitting the network.
IndexAsAccessPath: I made the mistake of turning this on once. It sounds like a good thing. To quote MSDN, “If True, the OLE DB provider indexes are used to fetch data. If False (default), the SQL Server indexes are used to fetch data.” Leave it false!
Disable Adhoc Access: This depends. I would normally set up a linked server for administrative tasks and a lot of work would be adhoc but the linked server would be locked down.
Supports “Like” operator: This also depends on business requirements but you are probably going to get crappy performance.
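For the proc cache search mentioned under Dynamic Parameters, a query along these lines would be one way to do it. MyLinkedServer is a placeholder for your linked server name, and the query needs VIEW SERVER STATE permission.

```sql
-- Find single-use adhoc plans whose text references the linked server.
SELECT TOP (50)
       cp.usecounts,
       cp.size_in_bytes,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = N'Adhoc'
  AND cp.usecounts = 1
  AND st.text LIKE N'%MyLinkedServer%'  -- placeholder linked server name
ORDER BY cp.size_in_bytes DESC;
```

A long list of near identical statements that differ only in their literal values is a hint that parameterization could help.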
Hold up cowboy
I would not run out and start changing stuff. Number one, avoid distributed queries in OLTP apps to begin with. Number Two, make one change at a time. Number three, if you change something and performance degrades you will probably see remote scan vs. remote queries in query plans.
The high end applications for mission critical business are mostly powered by EMC. Other companies like 3PAR, NetApp and HP are biting into their market dominance. However, EMC has a bigger problem looming.
I have written about Solid State Drives(SSD) before but the Intel business class drives are the only ones that come close to being production ready due to random write speed. The other problems of price and capacity keep them out of reach for most companies. On the consumer side, the high end Seagate conventional drives beat most of the SSD drives on write performance.
The stepping stone
Tiered storage will probably be how SSDs make their first appearance in production apps. Maybe SATA shelves for backups, SCSI or fiber channel drives for 80-90% of the data and SSD for the really hot data.
The game changer
FusionIO’s latest press release changed all of that. Imagine putting the capacity and performance of those 5 cabinets shown above in a shelf with 1% of the power requirements. Note: The TB+ capacity cards are coming Q2 2009. They currently support up to 640GB. Here are some numbers on a 4 card setup:
Performance for multiple ioDrive Duos scales linearly, allowing any enterprise to scale performance to six gigabytes per-second (Gbytes/sec) of read bandwidth and over 500,000 read IOPS by using just four ioDrive Duos.
Here are single card numbers:
• Sustained read bandwidth: 1500 MB/sec (32k packet size)
• Sustained write bandwidth: 1400 MB/sec (32k packet size)
• Read IOPS: 186,000 (4k packet size)
• Write IOPS: 167,000 (4k packet size)
• Latency < 50 µsec
What is missing?
A chassis. Sure, you could put 4 or 6 of these in a server depending on how many slots you have but you are going to saturate the PCI bus before they are fully utilized. Before that happens, you will probably hit a CPU or memory bottleneck. The next logical step is a shelf that could handle 14-20 of these cards with plenty of fiber channel ports to hook up at least several servers. Will an 8GB fabric be fast enough? Until something like this happens, tiered storage is probably the way to go. I am sure they know this and are working on it. After all, The Woz joined the company.
Conclusion
I wish I could buy stock in this company.
SDS news has been flying all around the intertubes. I am going to try to summarize the links in this post. There is official “[This document supports a preliminary release of a software product that may be changed substantially prior to final commercial release. This document is provided for informational purposes only.]” on MSDN here.
Simon Sabin talks about it here.
The SDS team posts Q and A.
Oh never mind, just go here. Mike Amundsen has links to everything SDS in his feed.
Quick and to the point. This is a satirical and risqué blog on technology\gadget\Microsoft\RDBMS. Heck, anything is fair game. Some have already been offended but the great thing about the internet is you can click “back” and never return. To quote Brent, “Some people like Bill Cosby, some prefer Eddie Murphy, and a few like both.” You like it or you don’t; we are just going to do our own thing.
Remember, everything we post is NSFW in some corporate environments, but a lot of it will fly in laid-back shops. All posts are meant to be humorous though they may be crude or juvenile. Some funnies just outright fail.
The chefs include:
Mystery man extraordinaire: Wilbur Applewood, aka BaconBitsNBytes on Twitter
Brent Ozar, aka BrentO on Twitter
Jason Massie, aka StatisticsIO on Twitter
Tim Ford, aka SQLAgentMan on Twitter
Tom LaRock, aka SQLBatman on Twitter
Brent and SQLBatman have both posted better rundowns.
So if you are feeling adventurous, check it out (Post RSS, Comment RSS, Subscribe by email, Twitter).
Technical posts on SQL Server and related topics will resume here.
SQL Data Services will be a relational database that has most of the features of SQL Server.
To quote David Campbell:
Tables?...Check
Stored Procedures?...Check
Triggers?...Check
Views?...Check
Indexes?...Check
Visual Studio Compatibility?...Check
ADO.Net Compatibility?...Check
ODBC Compatibility?...Check
Read about it here.
Up late making some changes on a large SQL 2000 publication and hoping like hell that I do not have to push the full snapshot.
Anyway, just a link post but Dave Robinson dropped a real teaser on the forthcoming “rewrite” of SDS here. As always, Mary Jo Foley has the rumors. Pay attention to Mix which is #mix09 on twitter.
What happens in Seattle doesn’t stay in Seattle. I could not take pictures during the day because <nda> and <nda> cannot be announced yet. However, database diapers are fair game.
Anyway, here is part 1. Video next.
Click.
Here is the link to the original post if it doesn’t syndicate right.
I finally signed up for Xbox Live. My gamer tag is Captain Varchar. Right now, I am just playing Left 4 Dead. It is hard as hell, but let's kill some zombies! Next week, it will be Halo Wars after I get back from the MVP Summit.
Captain Varchar
Here is a way off topic post. The music in this mix of house and techno is definitely an acquired taste.
Ninja Fog
The cloud has been all the buzz this week. Paul makes a prediction, Denis talks about the challenges, Steve chimes in and The Register reports that Microsoft plans to release a full-featured SQL Server to the cloud. I have talked about this before but I got nothing but ninja fog today.
This comic was adapted from OfficeOFFline.
I have been mining publicly available numbers out of Google Reader for the last couple of weeks for frequently published SQL blogs. I am going to present the numbers but let me make some points.
Wait stats analysis is a great skill to have in your arsenal. There are lots of tools out there already. I also suggest you read Tom Davidson's whitepaper on it if you need background on this topic. There was a niche missing in my toolbox. You see, these stats are cumulative since the instance started or since you last cleared them with DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR).
Whatever sys.dm_os_wait_stats reports as the highest waiter may have happened last week. If you want to know what is hammering the server now, you can look at several different DMVs like sys.dm_exec_requests, but that data is transient. My rough code (all my code is rough) below will tell you what the server has been waiting on during the last second. It is definitely a firefighting "WTF is going on right now" query. It would also complement this query when fighting fires.
Read the whitepaper for a better explanation but, basically, there are two buckets of waits. Signal waits are time spent waiting for CPU: the task is ready to run and is sitting in the queue for a scheduler. The others are resource waits, which means SQL Server is waiting on other things like locks, latches, log writes, memory, etc. Again, the whitepaper does a great job correlating these sometimes cryptic names with resources.
Lastly, analyzing wait types goes hand in hand with perfmon. For example, say you have both high CPU and high IO in perfmon. This script can help you see what SQL Server is actually waiting on the most so you can tackle that problem first.
-- Snapshot the cumulative counters
SELECT wait_type
     , signal_wait_time_ms AS 'CPU'
     , wait_time_ms - signal_wait_time_ms AS 'Resource'
INTO #temp
FROM sys.dm_os_wait_stats
WHERE wait_time_ms <> 0

-- Let one second of activity accumulate
WAITFOR DELAY '00:00:01'

-- Report what changed during that second, largest CPU (signal) delta first
SELECT a.wait_type
     , a.signal_wait_time_ms - b.[CPU] AS CPUDiff
     , (a.wait_time_ms - a.signal_wait_time_ms) - b.[Resource] AS ResourceDiff
FROM sys.dm_os_wait_stats a
JOIN #temp b ON a.wait_type = b.wait_type
ORDER BY 2 DESC

DROP TABLE #temp
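As a companion to the two buckets described above, here is a minimal sketch that shows how the cumulative wait time splits between signal (CPU) waits and resource waits. It only reads sys.dm_os_wait_stats; the column aliases are made up for illustration, and the result covers everything since the instance started or the stats were last cleared, not just the last second.

```sql
-- Percentage of total wait time spent waiting for CPU (signal)
-- versus waiting on resources. NULLIF guards against a divide by
-- zero right after the stats have been cleared.
SELECT
    CAST(100.0 * SUM(signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2)) AS signal_wait_pct,
    CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2)) AS resource_wait_pct
FROM sys.dm_os_wait_stats;
```

A reasonable way to use it is as a first look: if the signal share seems high, dig into CPU; otherwise, use the one-second delta query to see which resource waits are piling up right now.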
I am just saying... I read the comments and I thought I should post. If he followed everyone on this list and this list with a real client he would see the value. NNTP = Dead except for those who do not understand search engine syntax. Blogs are not status updates any more.
So I challenge Adam, and you if you are not on Twitter, to follow these people for a month with a real client and post your status.
Make sure you get me @statisticsio.
6 degrees of Virtualization
You might think this is funny (or not) but this will be happening sooner or later.
This past weekend I asked the community (or at least my Twitter feed) what they thought about virtualized database servers. Lots of people use it in Dev\Test\QA. Some in prod. Some thought it was pointless. Before I go over my lessons learned, let me address the “pointless” point.
It could be argued that it adds another unneeded level since you could do this with instances. For that matter, you can use one instance and permissions. This is true. If you are consolidating just a few instances, the cost and overhead probably doesn’t justify virtualization. However, there are other benefits like VMotion or adding capacity horizontally. You can’t easily move two instances off an over utilized box. With ESX, you can do this online while transactions are in flight. Added redundancy is also a benefit.
So here are my lessons learned:
Beware of CPU bound workloads
Most database workloads are IO bound, even if it is only logical IO. However, if your database is really small but you do a lot of complex calculations, business logic or string manipulation, your performance may suffer more than you would expect.
Use x64
This is just a vendor recommendation that suggests a 10% gain.
Don’t trust the CPU counters
Another thing I learned from the whitepaper. The CPU ready counter in Virtual Center is very useful.
Set a memory reservation
I would at least reserve half of the memory given to the guest especially if the host is busy. ESX has a balloon driver that will take memory from guests if it needs it and it thinks the guest is idle. This usually is not good for database servers.
The Microsoft Windows Server 2003 Scalable Networking Pack is evil
This is enabled by default with Windows Server 2003 SP2 and it doesn’t play well with SQL Server, VMs and especially SQL Server on VMs. See this post.
You have to sector align TWICE
First you have to sector align the VMFS and then align again at the OS level. Here are VMware’s recommendations. Based on my own testing, I agree with Kendal’s recent finding of 128k offsets and a 64k NT allocation unit.
If you have a lot of SQL VM’s on a host, see if EE makes sense
With Enterprise Edition, you only have to license the host. If you use Standard Edition, you must license each guest. Even if no other EE features are needed, “lock pages in memory” alone may be enough to justify it. In addition to other problems it solves, you can enable large page support.
Bad code usually runs worse on a VM
Man, the weather has been nice. How about them Cowboys!
Anyway, do you have any lessons learned running SQL on VMware?
I asked 1200 tweeple if and why they are still running SQL Server 2000. The results are interesting. It appears vendors *are* evil.
statisticsio: How many people are still running SQL2000? Why? When are you going to SQL2008? If not, why? No comments from the Oracle\MySQL pnut gallery.
Trying out a new tool from Chris Pirillo called Twickie. It basically lets you blog one of your tweets and all of the replies. It missed about half of the replies but it is a nice idea and hopefully it gets better.
I posed this question to my Twitter followers on a late Saturday afternoon. I was happy with the results. Keep in mind that most of my followers are Microsoft-centric DBAs, developers, admins and general geeks. I will try this again with a different question Monday when my tweeps are paying more attention. Only the geekiest of geeks are perusing Twitter (or blogging) on Valentine's Day. See below ;)
Credit to Scoble for the idea.
I will post tips and tricks I have learned running SQL Server on ESX tomorrow.
statisticsio: Hey you. Yes, you. Are you virtualizing your db server? In prod? How big is it? Pitfalls? Winfalls? ROI? Performance? Pro's\Con's?
edq @statisticsio we have several prod vms all have few simultaneous users LT 40ish so far so good mirroring on two vms difrnt farms works good2 Sat, Feb 14 17:40:00 from TwitterBerry
UndertheFold @statisticsio just inherited 72 servers almost all VM on ESX clusters including DW and some VLBs, seems to be working ok a lot are sql 2000 Sat, Feb 14 17:15:22 from TwitterBerry
mike_walsh @statisticsio biggest db is smaller, 75gb. Have worked with 150gb on windows virtual server (not hyper v) i/o reaked. Frustrations. Sat, Feb 14 16:29:28 from TwitterBerry
mike_walsh @statisticsio roi-rackspace/power/happier server team.. Perf-has been alright for med and dev/qa workloads. Nothing huge on vm yet. Sat, Feb 14 16:27:49 from TwitterBerry
Jorriss @statisticsio Virtual in Dev, Test, and Stage. Prod? No way. Never, never, never. Sat, Feb 14 16:28:27 from twhirl
mrdenny @statisticsio planning to VM smaller prod database. All dev and qa are vm. No issues yet. Sat, Feb 14 16:17:05 from TinyTwitter
jlshultz @statisticsio yes we r. Not big # users, but Oracle DB & Oracle Apps. Been vm ~3yrs, its been great. Time sync been only issue. Sat, Feb 14 16:16:01 from TwitterBerry
update:
crisatunity @statisticsio the problem I see with db on vm is that all of the db products are geared in very specific architecture not to be on vm. about 2 hours ago from web in reply to statisticsio
jmkehayias @statisticsio almost all of our production servers are VM based for DR scenarios. I have physical servers for dev. How backwards is that about 3 hours ago from TwitterGadget in reply to statisticsio
Efelito @statisticsio and as @mrdenny said, all dev, qa, and uat are very happy in VMs. about 3 hours ago from web
Efelito @statisticsio works well for us on instances with low memory requirements SCOM, VCenter, SharePoint, and SSRS Have more detail if you want about 3 hours ago from web
SQLDBA @statisticsio virtualizing, yes...but not in prod, only in dev & stage. Very easy to spin up new servers for developers. about 4 hours ago from twitterrific in reply to statisticsio
idolan @statisticsio Ours has been virtual about 10 mos. Suspicious there may be a performance hit but maintenance advantages are a big win. about 4 hours ago from web in reply to statisticsio
Tony Bain (RSS, Twitter) has written a very nice article comparing relational databases to “cloud” databases. Not only is it well written, it has a HUGE audience at ReadWriteWeb. Props Tony. Suggested reading for Database Professionals.
The full article can be found here. Here is one thing you should see even if you do not follow the link.
There is a new meme started by Mike Walsh (Twitter, blog). He tagged Brent Ozar (RSS, Twitter) who tagged me, among others.
Here is the basis of the meme, to quote Mike.
When I wrote about empirical evidence and learning through trying (instead of asking only), I got thinking about things I wish I knew when I was a Junior DBA that I know now.
So here is what I know now that I wish I knew then (and usually learned the hard way).
Microsoft Project is your friend.
I have written about this before but it is worth revisiting. I started out making big production changes to mission critical systems with nothing but a task list in my head. I evolved to Notepad and then Excel. My success % improved with each jump. Now, I can floor my boss and customers with downtime estimates accurate to the minute. On top of that, I can establish doable timelines and get more resources if my timeline does not meet expectations.
You can be your worst enemy.
Ego can make a brilliant employee a liability. It manifests in several ways (at least for me).
Life is so much better when you are modest rather than smug.
If the hole is round, a square peg may not be the best fit.
I have officially become platform agnostic. SQL Server will always be my first love and what I am best at but there are other products out there. Not that I know everything there is about SQL but I don’t learn 10 new things about it every day like I used to. As a n00b, I learn 30 new things a day about MySQL\Oracle. There are valid reasons to go MySQL or Oracle over SQL Server. That is just the way it is. Imagine rewriting the DAL for Wiki or WordPress just because you had to run it on SQL Server. If you drink that much koolaid, more power to you. I think knowing the features and limitations of other platforms helps me as a SQL Server DBA as well.
The GUI is not your friend.
I used to be an Enterprise Manager DBA. When I learned how to admin from TSQL, that is where the Senior DBA level skills came in. I still use the GUI if it is a click or two vs. several lines of code but I know how to write it and, if need be, automate it. If you can’t, learn.
Know X as well as or better than the subject matter experts
Where X is technology that interacts with the database: the OS, hardware, SAN, network, and application code. Of course, this is not always feasible. I have never jumped on a switch to prove it is not a SQL Server problem but I have gotten pretty close. Once when all fingers pointed at SQL Server, I had them check the switch for errors and sure enough the firewall was set to 100/half duplex. If nothing else, learn the hardware and OS inside and out.
Next Victims
Jonathan Kehayias(RSS, Twitter)
Jason Strate(RSS, Twitter)
Rob Boek(RSS, Twitter)
Grant Fritchey (@gfritchey) posed this question to me on Twitter after my post on the key lookup threshold.
@statisticsio Interesting post. Do you think that threshold is dependent on the data involved? Or maybe on the size of the key? 8:28 AM Feb 5th from TwitterGadget in reply to statisticsio
I kind of assumed so which is why I did char(1000). However, I have just tested with char(1). The numbers are interesting. Please refer to the original post for repro code.
The threshold is crossed much earlier because the scan is smaller. This is a small table especially with CHAR(1) so take this test with a grain of salt. Run your own tests when you are working with large production data.
I would also venture to guess that the threshold gets lower as the complexity of the query increases, especially when grouping and outer joining.
A covering index is probably the best solution in most cases unless you need to touch all rows, and even then it might be better.
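As a minimal sketch of the covering index idea, the script below builds a hypothetical temp table shaped like the repro table from the original post and adds an index that carries every column the query needs. The table name, index name and row counts are made up for illustration.

```sql
-- Hypothetical demo table, shaped like the repro table in the original post.
CREATE TABLE #cover_demo (
    id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    c1 CHAR(1000) DEFAULT ('blah'),
    c2 INT
);
GO
INSERT INTO #cover_demo (c2) VALUES (1);
GO 1000
INSERT INTO #cover_demo (c2) VALUES (250);
GO 250

-- Key on the search column; INCLUDE stores c1 at the leaf level.
-- The clustered key (id) is already carried in every nonclustered index row.
CREATE INDEX ix_cover ON #cover_demo (c2) INCLUDE (c1);

SET STATISTICS IO ON;

-- Every column in the select list is available from ix_cover,
-- so the plan does not need a key lookup into the clustered index.
SELECT id, c1, c2
FROM #cover_demo
WHERE c2 = 250;

DROP TABLE #cover_demo;
```

The trade-off is size: including a wide column like CHAR(1000) makes the index nearly as large as the table, so this fits best when the query only needs a few narrow columns.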
Gail talks about bookmark lookups…. err.. key lookups in this post. So are they good or bad? Well, like many things in SQL, it depends. The main factor is the number of rows returned. A few rows are fine but the cost rises sharply with larger result sets. There comes a point where the threshold is crossed and a scan is more efficient. This is because a scan leverages sequential IO while a lookup does random IO.
Here are the results of the code at the end of the post.
As you can see, at 250 rows, we have crossed the threshold and it is cheaper to do a scan. If you are passing in a literal, the optimizer can detect this and switch to a scan. If it is a stored proc or parameterized SQL, a plan is cached the first time it is run. Problems happen when the result size varies greatly depending on the parameter. There are ways around this, all with their pros and cons. Here are some:
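As a sketch of what such workarounds can look like, here are two common query hints, which are not necessarily the options the author had in mind. The table and procedure names are hypothetical.

```sql
-- Hypothetical table and procedures, for illustration only.
CREATE TABLE dbo.LookupDemo (
    id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    c1 CHAR(1000) DEFAULT ('blah'),
    c2 INT
);
CREATE INDEX ix_c2 ON dbo.LookupDemo (c2);
GO

-- Option 1: compile a fresh plan for this statement on every execution,
-- so each parameter value gets a plan built for it.
CREATE PROCEDURE dbo.GetByC2_Recompile @c2 INT
AS
    SELECT id, c1, c2
    FROM dbo.LookupDemo
    WHERE c2 = @c2
    OPTION (RECOMPILE);
GO

-- Option 2: always build the plan as if a chosen representative value
-- had been passed, regardless of the first caller's value.
CREATE PROCEDURE dbo.GetByC2_OptimizeFor @c2 INT
AS
    SELECT id, c1, c2
    FROM dbo.LookupDemo
    WHERE c2 = @c2
    OPTION (OPTIMIZE FOR (@c2 = 1000));
GO

EXEC dbo.GetByC2_Recompile @c2 = 1;
EXEC dbo.GetByC2_OptimizeFor @c2 = 1;
```

RECOMPILE pays compile cost on every call in exchange for a plan that fits each value. OPTIMIZE FOR keeps one stable cached plan, which can be the wrong shape for values far from the one chosen.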
Here is the sample code that can repro these numbers on SQL Server 2008.
CREATE TABLE #temp(
    id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    c1 CHAR(1000) DEFAULT('blah'),
    c2 INT)
INSERT INTO #temp(c2) VALUES(1)
GO
INSERT INTO #temp(c2) VALUES(5)
GO 5
INSERT INTO #temp(c2) VALUES(10)
GO 10
INSERT INTO #temp(c2) VALUES(25)
GO 25
INSERT INTO #temp(c2) VALUES(50)
GO 50
INSERT INTO #temp(c2) VALUES(100)
GO 100
INSERT INTO #temp(c2) VALUES(250)
GO 250
INSERT INTO #temp(c2) VALUES(500)
GO 500
INSERT INTO #temp(c2) VALUES(1000)
GO 1000
INSERT INTO #temp(c2) VALUES(1000)
GO 1000
CREATE INDEX ix ON #temp(c2)

--The baseline
SET STATISTICS io ON
SELECT * FROM #temp WITH (INDEX=1) WHERE c2 = 1

--1 row returned
SELECT * FROM #temp WHERE c2 = 1

--5 rows returned
SELECT * FROM #temp WHERE c2 = 5

--10 rows returned
SELECT * FROM #temp WHERE c2 = 10

--25 rows returned
SELECT * FROM #temp WHERE c2 = 25

--50 rows returned
SELECT * FROM #temp WHERE c2 = 50

--100 rows returned
SELECT * FROM #temp WHERE c2 = 100

--250 rows returned
--Must begin using hints because the optimizer can tell that a scan is better
SELECT * FROM #temp WITH (INDEX=ix, forceseek) WHERE c2 = 250

--1000 rows returned
--Must begin using hints because the optimizer can tell that a scan is better
SELECT * FROM #temp WITH (INDEX=ix, forceseek) WHERE c2 = 1000
SHOW VARIABLES – This is like sp_configure or sys.configurations
SHOW STATUS – This is like sys.dm_os_%. Basically, all runtime counters for the database engine components.
SHOW PROCESSLIST – This is like sp_who2
SHOW TABLE STATUS – This is like sp_help
These can be filtered like this: SHOW VARIABLES LIKE '%innodb%'
Those are the big ones. The full list can be found here.
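For SQL Server folks reading the list from the other direction, here is a rough sketch of the T-SQL side of each comparison. The LIKE filters are arbitrary examples, and the pairings are loose analogies rather than exact equivalents.

```sql
-- SHOW VARIABLES LIKE '%innodb%'  ->  filter instance settings by name
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name LIKE '%memory%';

-- SHOW STATUS  ->  runtime counters exposed by the engine
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE '%Page life%';

-- SHOW PROCESSLIST  ->  current sessions and what they are doing
EXEC sp_who2;

-- SHOW TABLE STATUS  ->  with no arguments, sp_help lists the objects
-- in the current database; pass a table name for column and index details
EXEC sp_help;
```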
The core of a relational database is normalization. The reduction of data duplication is what it is all about. Less data means less IO. SDS removes database design from the equation. This is why sparse columns and filtered indexes were implemented in SQL Server 2008. Here are a few posts on how this is handled in SDS.
The End of of JOINs?
The End of JOINs Part 2?
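For readers who have not used the two SQL Server 2008 features mentioned above, here is a minimal sketch of what they look like. The table, column and index names are hypothetical, and this shows the on-premises syntax only, not how SDS handles it.

```sql
-- Hypothetical table: most products never get a discontinue date.
CREATE TABLE dbo.ProductDemo (
    product_id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    name VARCHAR(100) NOT NULL,
    discontinued_on DATETIME SPARSE NULL  -- NULLs take no storage in a sparse column
);

-- Filtered index: only the rows that actually have a date are indexed.
CREATE NONCLUSTERED INDEX ix_product_discontinued
ON dbo.ProductDemo (discontinued_on)
WHERE discontinued_on IS NOT NULL;
```

Both features target mostly-empty attributes: the sparse column saves space on the NULLs, and the filtered index stays small because it skips them.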
Yah yah, it doesn’t apply to your job, business, sector. Adam may call me out on FUD again. I hope it is true since this is hopeware so far.
However, I want to point out one company that has found a non-relational cloud database to meet its business requirements. They state their limitations and it makes sense as to why they took this route. If you are not up to speed, SimpleDB is Amazon’s cloud db. Bigtable is Google’s cloud db. You are here because you already know about MSFT.
Glue chooses SimpleDB.
This is what we need to keep an eye on.
The Senior DBA
I am astounded by how often I see a problem that I know nothing about but fix it after finding the answer in a search. I get more high fives for being a good search engine user than a good DBA. On the flipside, I learned this by asking dumb questions and getting sent back a google search link.
The comic was adapted from OfficeOffline.
Microsoft has 10 “Express Database Schemas” that I ran across and I thought I would pass along. I am always on the lookout for sample databases. Northwind will always have a special place in my heart like the girlfriend that cheated on me with my best friend. AdventureWorks is cool but the fact that every single feature in SQL Server 2008 is crammed into it makes it clunky. Since these are designed for different types of applications, it might be a good idea to drop these in your toolbox.
These are on the SQL Server Express page. Of course, there is no reason that these cannot be run on the full version of SQL.
Click here for any or all of them:
It appears they were designed by Barry Williams so credit is due where it is deserved.
About the Author
Barry Williams, founder and Principal Consultant with www.DatabaseAnswers.org, has been working with SQL Server since the mid-90's. Barry works as an independent consultant, trainer and writer and is a popular speaker on Enterprise Data Management at major Conferences. He can be reached at info@barryw.org.