In part 1, I talk about what I consider feeble attempts at implementing a reporting server through log shipping, mirroring\snapshots and, to a lesser extent, replication. Unless you invest in a real ETL solution, I argued that it is better to run a mixed workload. I talked about the architectural advantages of running mixed workloads in part 2. In a nutshell, doubling the hardware and cutting the data in half.
In this final post, we will talk about new features in SQL Server 2008 and some features that have been around a while that can help with mixed workloads.There are also some bad practices that could be the right answer that we won’t talk about but let us mention triggers, table valued functions, 20 table outer joins, some correlated subqueries and table variables. These are options but usually not good ones. In the right circumstances, they could be right like an end of year report.
Here are the main tools in your arsenal:
Resource Governor
Chances are you do not want to limit throughput of your OLTP queries. The resource governor does not do a good job with these queries anyway because their duration is usually so short. However, let’s say you have reports that run by executives. You can put them in a workload group that gives them as much resources as possible without affecting OLTP traffic. You may also have a less important group of reports from the marketing or sales teams that you can limit further. One caveat to the resource governor is it cannot limit, disk IO so if that is your bottleneck, this will not help much.
Covering Filtered indexes
Filtered indexes are a great new feature in SQL Server 2008. When optimizing for reporting queries on your OLTP system, you are probably going to be touching a lot of rows so covering the query is important. For example, the order fulfillment team works off a report of unfulfilled orders that pulls in order data, customer data, shipping data etc. In this case, you would add covering filtering indexes on each of those tables. The filtered indexes reduce write overhead on your OLTP writes and reduces read overhead of your reporting.
Indexed Views
Indexed views take filtered indexed view a step further. You can create indexes on multiple tables. Think of it as denormalization alongside your OLTP optimized schema. In the previous order fulfillment example, we can basically persist that report and have it updated in real time. There is more overhead to your OLTP transactions so weigh the pro’s and con’s. Test if possible.Unfortunately, you cannot defer changes to your indexed views but I believe there is a feature request for this on Connect and I will tell you about a workaround shortly.
Partitioning and Compression
This is the dynamic duo when mixing workloads. Unfortunately, the nitty gritty details would require their own post. For example, one mixed workload may benefit from compression on the hottest partition while the older data should be uncompressed. However, another workload may benefit from the opposite. The key here is really understanding your workload, data and hardware limitations. Most importantly, plan then TEST, TEST, TEST! Once you partition, you loose online operations so if you do it wrong, you are stuck.
Persisted Computed Columns
This is an easy one. It is a simple trade off. Writes take a little more CPU and space in exchange for reduced CPU time when you report. Take your orders table, for example. You could calculate and save shipping costs when you insert the rows. If it adds a few milliseconds to the insert but shaves seconds off the hourly open orders report that the execs are looking at, it may be an easy decision.
Archival
This might not always be possible depending on your data. It may not be necessary if you have finely tuned indexes. However, it could make a night and day difference. If you need the data, UNION ALL’ing the production table with the archive table has little overhead. I do suggest you keep the archive database on the same server unless it will rarely be accessed. Trying to do this with linked servers is bad.
After hours denormalization
This is basically precreating reports during off hours. Think of it as indexed view with deferred updates. You can UNION with the OLTP tables if you need realtime data in your report. In an ideal world, touching less rows in the OLTP table and then UNIONing with the denormalized data will result in the best of both worlds if you need real time data.
The final word
As the concurrency and size of data scales, both a pseudo reporting database and a mixed work load scenario will not meet business requirements. A business requirement of real time data may dictate a mixed workload. There may be plenty of workloads where scaling out and scaling up both meet performance demands. I just wanted to play devil’s advocate and let you know there is another option when planning reporting.
Here is a quick query I wrote today. It is the first time I had to go to this DMV so I thought I would share. It would be useful when planning for consolidation and troubleshooting a bunch of apps that have been consolidated or are hosted in a shared environment. Once you find the database, you can break it down by object and index with Tom Davidson's query.
select db_name(database_id) as dbName, count(*)*8/1024 as BufferPoolMB
from sys.dm_os_buffer_descriptors
group by db_name(database_id)
order by 2 desc
Just a quick note on this topic since it really is not a best practice and I am sure no one does it. ;)
Business requirements > best practices?Even if you have a reporting server dedicated for ...err... reports, the business might dictate real time data in those reports. That may or may no be possible on a reporting server but that is another post. Most of the time, the reporting server is not real time. At the same time, the OLTP data is not optimized reporting and if it is, transactional performance decreases. The solution usually dictates some reporting functionality being added to your production OLTP server. Maybe not at a DSS level but an adhoc query built by a web page. A company's order history, an store inventory, or even an advanced search web page. You know the one:
select * from widgets
where color is null and size is null and flavor is null........ and (title like '%ice cream%' or description like '%ice cream%' or keywords like '%ice cream%')
Is that really a reporting database?
So you have a copy of your OLTP database either through mirroring\snapshots, logshipping or replication that you call a reporting database. What does that buy you? Well, we you are offloading reporting right. Kinda. You separate your reporting queries and your OLTP queries but here are some downsides:
So what is the solution?Â
The real solution is to design an incremental ETL process that loads to a report optimized database. If a closer to real time data is required, triggers, modified replication or possibly asynchronous triggers might be the way to go. I am not going to pretend to know how to architect a solution like that. However, it is safe to say that a nightly or real time solution will require some serious dev work. This includes creating the ETL process and rewriting the application to use the report optimized database.
Paint yourself into a corner?
Say you have a home grown database that has gone from megabytes to 100's of gigabytes. Separating reporting functions to a new database is going to take time. However, SQL Server 2008 provides new features that scream consolidation. Not only instance consolidation but functionality consolidation.
The other assumption of consolidation is larger hardware. If you are not spending money on a copy of OLTP reporting server, you can get a larger OLTP box.
Warning: this is forward looking since I don't have any production SQL Server 2008 servers yet. :)
Conclusion
Not to sound like a Microsoft fan boy but SQL Server 2008 Enterprise Edition provides a lot of benefits for mixed workload boxes. However, the real point of the post is that making a copy of the OLTP database for reporting may not provide gains one would expect.
I have half written post called "The Rebirth of the DBA". I am not sure if it is going to make the cut. There is not enough time before I go on vacation and there is some good community discussion both in the comments and in the blogosphere(Brent O, Grant Fritchy, Pythian).
I will summarize really quick. I was halfway playing devil's advocate. While it is possible that they could write some really amazing software that could script the DBA, it is unlikely that they can totally do it. The DBA will still be around. The numbers may be less. Maybe substantially. The roles may change. Maybe drastically. Hell, your SQL Server 2015 VM may have a "cloud partition" in a physical table. There are other factors in the mix like consolidation, virtualization and LINQ\ORM that are also going to have to have an effect. It is all speculation but the only constant is change.
It seems like just yesterday Compaq Proliant 8500’s were the bomb with 8 – 550 mhz processors. 4.4 ghz of processing madness! That was even before hyperthreading and multi-cores. Since then, the big 3 server hardware vendors eliminated 8 way machines in their commodity server lines. You still had 8 way options but there were more cost effective configurations because multi-core processors removed the need for 8 way boxes in most cases.
That is until now as HP releases the dl785 g5. Eight sockets capable of running quad-core 2.3GHz AMD Opterons. That is a combined speed of 73.6GHz. They comes with 8GB of RAM but they support 256GB of RAM(512GB when 8GB dimm's become available). The servers, themselves, are going to be relatively cheap compared to the high end SAN's and large amount of memory needed to get the throughput high enough tax the processor sub system. Without a large spindle count\cache and amount of RAM, the system will have an IO bottleneck long before the processors in most cases. Of course, some applications have special needs. :)
So when will these come into play? I think the biggest use of this box will be for consolidation particularly on SQL Server 2008. Imagine taking 20 or 40 instances on different OS’s, hardware, storage etc and making it one(or even 2 or 3) SQL Server 2008 instance. The environment would be so much easier to manage. The SQL Server 2008 resource governor was made for consolidation. Some of the new features in SQL Server 2008 are going to be CPU hungry like the spatial data, partition parallelism improvements and transparent data encryption. The data and backup compression features push both ways by lowering IO and increasing CPU with the idea of decreasing execution time. Even if you go to SQL 2005, it would be a nice upgrade for a consolidation box.
Of course, you have to worry about having all of your eggs in one basket but that is another post.
Introducing the dl785 g5: