Do you have four processors, 4GB of memory, and RAID 5 in a single internal array? This week is for you. Many SQL Server professionals reading this commentary have I/O problems that they don't know about. For space reasons, I don't typically write in depth about bits and bytes here, and it will be difficult to discuss I/O problems in detail, but I'll do my best to enlighten you given the space that I have.
First, let me grossly simplify the definition of a database-system hardware bottleneck: It's the hardware layer (typically I/O, CPU, memory, or network) that has the greatest limiting effect on server throughput. According to this definition, all systems have a bottleneck because at any point in time, some component of your hardware system is the slowest. It might not be terribly slow in relation to your workload, but there's always a weakest link in the chain. In a perfect world the hardware components would be nicely balanced so that you don't have some fast components matched to comparatively slow components; such an unbalanced setup wastes money because you don't get the full benefit of the fast components. Unfortunately, hardware with mismatched performance characteristics is all too common, and I/O subsystems typically draw the short straw.
I frequently see customers hinder SQL Server performance by using a few—maybe 4 or 5—physical disks on a run-of-the-mill internal RAID array on a box that has four (or maybe 8) fast CPUs and lots of memory. In this scenario, the customer will most likely create an I/O bottleneck because the performance characteristics of the disks can't keep up with the throughput capabilities of the faster CPU and memory. These customers might have been better off spending more money on I/O and less on other parts of the system to ensure a more balanced set of hardware.
Many customers compound the problem by using a single RAID 5 array and placing their data and logs on that single array. It's a well-recognized best practice to separate data and logs to ensure maximum throughput. I'm not saying that RAID 5 is a bad solution—sometimes it's the most cost-effective way to meet your I/O needs when you have a fixed amount of money. However, contrary to popular belief, RAID 5 isn't the most efficient choice for either reads or writes. RAID 10 will invariably be faster and RAID 1 is often better. But RAID 10 costs a lot more than RAID 5, so many customers waste hardware money by matching excessive CPU and memory resources to an underpowered I/O subsystem and that's a shame. As I said, you often misuse money by under-spending on I/O and overspending on other areas of the system. Seek Zen like balance in the distribution of your hardware resources.
If you answered yes to my opening question about having four processors and 4GB of memory running on a single RAID 5 array, then you've probably introduced an I/O bottleneck into your system. Don't panic yet. In many cases, the server—even with the mismatched I/O-is still powerful enough in relation to the workload that no user-perceived performance problems exist. But I/O problems are the most common hardware problems that I solve.
I started this commentary by saying that it would be difficult to cover I/O problems in this space, and that's true. In some ways I've probably raised more questions than I've answered. I'm not going to attempt to provide a comprehensive list of resources to peruse or methodologies to consider. Spend five minutes Googling and you'll uncover weeks' worth of reading material to help you better understand the complex issues behind building an optimized I/O subsystem and measuring if you have an I/O bottleneck. I encourage you to seek information about DBCC SQLPERF(waitstats) and the fn_virtualfilestats() function, which are my favorite ways to find out whether a customer has an I/O problem. Become one with your hardware resources, seek out the bottlenecks, and start your homework.