Beyond a certain point, increasing SQL Server's performance is very difficult--and expensive. You get bottlenecks that you can't get rid of no matter what you do. SQL Server bottlenecks exist in many places both inside and outside the physical box. Inside, SQL Server can choke on disk I/O, CPU resources, memory, and network performance (or other interrupt-driven peripherals). Outside, SQL Server can suffer from client performance (workstation horsepower), overall network capacity, software efficiency, and the ever-present user element, dragusperformancedownium. Filtering particles of this annoying element out of your system is every administrator's nightmare. But you can compensate for bottlenecks by tuning your server to its maximum potential. (For tuning tips, see "More Easy SQL Server Performance Tips," page 88.)
In the Windows NT Magazine Lab, I tweaked and tuned my way to some performance numbers that show the differences--or lack thereof--between the Pentium and Pentium Pro microprocessors and Windows NT Server 3.51 and NT Server 4.0 running SQL Server 6.5 on single, dual, and quad-processor systems. For information about the systems I tested, see "Compaq ProLiant 5000," and "NEC ProServa SH." For information about my testbed, see "The Testbed."
Graphs 1A through 8B show two tests on the different systems. Graphs 1A through 4B show performance for single-read activity, and graphs 5A through 8B show mixed read/write activity. The results were surprising:
- You get a performance boost with faster CPUs, but only under heavy load.
- More cache is better. However, 256KB of Level 2 cache on a Pentium Pro is not enough; 512KB is better.
- SQL Server 6.5 scales evenly to four-CPU configurations. The transactions per second (TPS) graphs, Graphs 1A through 8A, show that with more CPUs, you can generally handle more transactions.
- NT Server 4.0 is not faster than NT 3.51 under this type of application load.
- Pentium Pro scales better than Pentium, and response times are faster. But the reported enhancements in NT Server 4.0 for Pentium Pro optimization don't show themselves under SQL Server. On both Pentium and Pentium Pro, NT 3.51 and NT 4.0 perform virtually the same, as TPS graphs 1A through 8A show.
- Five or six drives for a data volume is enough. Beyond that number, you won't improve performance. The biggest difference comes from separating all the types of I/O activity onto different physical drives.
More cache is better. In a test described in "Diamond Flower Doubleshot" (February 1996), Andy Smith found that a dual-Pentium server with a shared 512KB cache (the DFI Doubleshot) performed much worse than a dual-processor server with independent cache modules (the ProLiant 4500). The latter is a more expensive solution, but it significantly boosts performance. Now you have to choose how much cache per CPU you need.
My tests for this month's issue showed that a Pentium Pro (the NEC ProServa SH) with the on-chip 512KB Level 2 cache did not produce a significantly faster response time on a higher transaction count than the Pentium Pro with the 256KB on-chip cache under heavy load at the same clock speed. For example, Graphs 8A and 8B show no marked improvement over Graphs 7A and 7B under heavy load. However, the design of the NEC may have been the limiting factor: For example, the ProLiant 5000 with the 512KB Level 2 cache showed a significant difference in scaleability from the NEC ProServa SH, as you can see by comparing Graph 6A to Graph 8A. The Compaq ProLiant 4500--a Pentium system with 2MB cache modules for each CPU--showed lower TPS and higher average response time (ART) than the Pentium Pros, but it still performed well in these tests. As Screen 1 shows, the 4500's overall CPU utilization for single read and mixed read/write activity was higher than the ProLiant 5000. However, although the ProLiant 4500 did not have as high a TPS value or as low a response time, it handled the load without breaking down.
You really see a difference at four CPUs: The ProLiant Pentium Pro with the 512KB Level 2 cache wins by a landslide, with the best TPS and response times. The 256KB Pentium Pro has a real problem in two- and four-CPU configurations under heavy load, as graphs 3 and 7 show. CPU utilization is 100 percent at 300 users on all systems we tested, but the 256KB Pentium Pro system falls down on the job. The right side of Screen 2 shows that the 256KB Pentium Pro simply can't handle the load. Toward the end of each test run under heavy load, the system basically stopped processing requests.
Intel is aware of this problem and recommends that you use 256KB Pentium Pros only in single- and dual-processor servers and workstations. Many vendors, such as NEC and Compaq, have responded by ceasing sales of symmetric multiprocessing (SMP) enterprise servers with more than two CPUs with 256KB cache. The price differential to get 512KB is more than $2000 per CPU, but the performance problems you face otherwise make increased cache essential.
For your data volumes, five or six drives are the effective limit for improving performance. Compaq has found that more than six drives in a RAID 0 stripe set do not improve performance. The limitations on SCSI (six devices on one standard SCSI channel) and performance hits for duplexing multiple controllers mean that adding drives to a data volume won't help.
My tests confirmed this result: For my 500MB data set, I saw no system enhancement going from two, to four, to five drives for the data volume on the Compaq ProLiant 4500. Screen 3 shows %Disk Time maxed out during the mixed read/write tests. You can see that disk usage on the Compaq ProLiant 5000 was high, although I/Os per second (not shown) were within an acceptable level (less than 20 per second, well within the tolerance of 80 I/Os per second for a system with a hardware RAID controller).
If you need to increase capacity, look at controllers with multiple channels on which you can create large arrays, use multiple controllers, or use larger drives such as 4.3GB Fast and Wide drives or the new 9GB ones. To improve performance, you can use faster drives and faster controllers, but remember that disk I/O is only one component of overall system performance.
Microsoft made significant claims about the performance enhancements in Windows NT 4.0 over NT 3.51, including better SMP scaleability, optimization for Pentium Pro, and overall speed improvements. Pre-release and beta information claimed that NT 4.0 would be as much as two times faster than NT 3.51.
I found that with SQL Server 6.5, NT 4.0 offers only slightly better performance than NT 3.51, on either Pentium or Pentium Pro, as TPS graphs 1A through 8A show. Is this lack of improvement a limitation of NT or the application running on top of it? The answer is both.
Windows NT 4.0 has a smaller footprint and better network throughput than NT 3.51. But additional features, such as the new GUI and extra services (new network protocols, etc.) negate these advantages.
The OS can go only as fast as the application it runs, and even Microsoft has found that SQL Server doesn't perform better on NT 4.0 than on NT 3.51. If an application isn't programmed to use the new features and enhancements in NT 4.0, the OS isn't going to run better. Perhaps SQL Server 7.0 or another database program will respond differently--we'll just have to wait and see.
What about NT 4.0's optimization for Pentium Pro? This feature means that NT 4.0 runs faster than NT 3.51 on a Pentium Pro than on the Pentium. But I didn't get that result. The improvement I saw was only about 5 percent to 10 percent, and even that improvement wasn't consistent. On the Pentium Pro with the 256KB Level 2 cache, NT 4.0 responded with lower TPS and higher ART at high loads than did NT 3.51, as graphs 3A and 7A and 3B and 7B show. In screens 2 and 4, you can see that the NEC ProServa SH with 256KB of Level 2 cache performed much better under NT 3.51 than under NT 4.0, as evidenced by the smooth scaling of CPU usage. Upgrading to 512KB of cache smoothed performance under both NT 3.51 and NT 4.0. You get more features with NT 4.0, but not better performance.
Pentium Pro offers about a 60 percent capacity improvement over the Pentium. Under mixed read/write activity, the ProLiant Pentium Pro ran at 33 TPS, and the ProLiant Pentium stopped at about 21 TPS. The differences were about the same for single-read activity.
Again, CPU utilization shows the differences: The Pentium worked harder with less performance than the Pentium Pro, as shown in screens 1 and 5. As you can see in Screen 5, the ProLiant 5000 scaled evenly from low to high user loads, and overall performance didn't degrade at the high end, although response times rose. Machine slowdown under heavy load is expected, but the computer shouldn't choke. The Pentium Pro system operated at much lower response times, even under very heavy load. The cache in the Pentium Pro makes a difference to TPS values, but not so much to response time.
Many systems will perform the same under light load, because the system is not fully stressed. So when you look for differences between architectures or systems from different vendors, you need to fully stress each subsystem. For example, if you want to see how much better one CPU will perform than another, you have to run it at nearly 100 percent CPU usage, or you won't see any difference in performance. You can see this difference in my TPS graphs. For 150 or fewer users, all systems perform similarly, so a faster CPU doesn't buy you performance improvement. Under light load, upgrading CPUs from Pentium to Pentium Pro will give you slightly lower response times, but you won't get a huge jump in performance. Under heavy load, you can add capacity and significantly improve performance by upgrading the CPUs.
The good news is that SQL Server 6.5 on both Windows NT 3.51 and NT 4.0 scales, and unlike Exchange Server 4.0 (described in "Optimizing Exchange to Scale on NT," November 1996), SQL Server scales consistently from one to four CPUs. Throwing more CPUs at a SQL system is a good investment for handling more transactions and reducing response times.
Contrary to what you'd expect, scaleability is the same for both NT 3.51 and NT 4.0. And, let me repeat that the 256KB Pentium Pro has problems at four-CPU configurations.
Under a Microscope
What does all this information mean to you, your upgrading and purchasing decisions, and your users? Well, how much money do you have?
Upgrading database servers is a complicated issue. With some thoughtful analysis, you can cut through the hype and mumbo jumbo to find what's right for you. Here are some things to keep in mind.
My tests are by no means the be-all and end-all of the SQL world, but they point to some interesting issues. If you are not already bottlenecking on CPU or memory resources, disk I/O, or network throughput, how do you improve database server performance? You can redesign and optimize your SQL applications (better code, more efficient procedures) as I describe in "More Easy SQL Server Tuning Tips," you can tune SQL somewhat, or you can throw more memory at the application. But you'll reach a point after which you can't improve performance. Only buying more stuff for your server will make a difference. So, what do you buy?
In midrange environments (about 300 users) with moderate data sets (5GB to 20GB), extra CPUs are a good investment, as is memory. Get plenty of memory so that SQL has enough to operate in, maybe even enough to run with TempDB fully in RAM, and enough so that NT does not do any paging. Use Perfmon to watch SQL and system counters, and tune appropriately.
Upgrading to Windows NT 4.0 will buy you features but not performance. However, upgrading won't hurt you, either. You need to weigh the cost of new licensing fees against the easier and more efficient administration of NT 4.0.
Do you need to replace your existing Pentium servers with Pentium Pro systems? This decision depends on how much money you've got. If your Perfmon analyses don't show that you are already bottlenecking on CPU resources, adding faster CPUs will give you only an incremental performance improvement. Weigh this option against the cost of replacing an entire server. If you have a heavy load, Pentium Pros will probably improve response times and give you more capacity, depending, of course, on your transaction mix. Compare graph pairs 1 and 5 to 2 and 6 as an example of how a move to Pentium Pros can improve performance. A mostly disk-intensive environment still won't get a big boost from more CPU resource. Be aware of the cache issues described earlier: 200MHz Pentium Pros with 256KB cache modules drag in SMP systems.
The decision to add drives is a little easier because you can do so without replacing your whole system. A new, faster disk or RAID controller will help, especially if you use NT's software-based RAID capabilities or have an old server. Many new systems ship with Fast and Wide SCSI controllers. Improving disk I/O is generally a great way to improve system performance inexpensively, but only to a point (six drives for one data volume). Try breaking up SQL and system operations across multiple drives and controllers or SCSI channels.
Beyond upgrading CPUs, you can change the brand of your server. Compare graph pairs 2 and 6 to 4 and 8. My tests show you'll find performance differences between platforms from certain vendors using the same types of hardware (CPUs, disk controllers, etc.). The scope of the differences depends on your load and the specific design and architectures of the machines. CPU-to-memory bus width, bus design, and many other factors influence system performance. You'll also find differences in upgradeability, feature sets, service and support, and price. The choice depends on your needs.
Capacity vs. Raw Performance or Speed
When analyzing your system's performance or the new system you want to buy, be clear about what you're looking for. Capacity addresses how many users you can support or how many transactions per second your server can offer. Performance or speed, in my mind, represents how fast your server processes requests and returns results (i.e., average response time): What is your processing power, and what overhead is available?
You, of course, want both capacity and speed. But you generally can't have your cake and eat it, too. You must consider the tradeoffs between how much work your system can do and how fast your system can do that work.
Graphs 1 through 8 show that, with certain system types and configurations (and, depending on load and transaction types), you can have low response times with steadily increasing TPS values. You can throw more work at the system and not take a performance hit.
You reach a point where the TPS values level off and the response times start rising markedly, that point is where your capacity runs out. If you're lucky, your system won't start losing transactions at that point; it will just get slower. Now is when you need to add CPUs, memory, or other resources according to where you find the first performance bottlenecks.
You can squeeze more performance out of your database server with some intelligent tuning, system configuration, and upgrades. First, look at your setup to determine whether you have the optimal configuration. Do you have enough network bandwidth (do you need to add a segment to the server?) and enough memory allocated to Windows NT and SQL Server (is it balanced?), and are the disks laid out properly? Next, have you done all you can through software to fully optimize your system: Have you tuned SQL parameters and NT and written efficient SQL code?
If you've done all these things, look at upgrading the hardware: Add disks, add memory, add CPUs, or replace the server with a faster architecture (new generation Intel chips or RISC). Always look for bottlenecks in your whole system, on the client side and server side, before buying new stuff. Be sure the components you buy will actually improve system performance; if you aren't already maxing out a given resource, upgrading it won't help much.
Corrections to this Article:
- The "Disks" subsection of "Microsoft SQL Server 6.5 Scaleability," by Joel Sloss said that "more than six drives in a RAID O stripe set do not improve performance." The correct limit is seven.