Bare-Metal Tuning

If you spend any time on the SQL Server Magazine discussion forums or Microsoft SQL Server newsgroups, you'll see a lot of questions about hardware setup and configuration. In fact, many of the performance problems that IT professionals encounter result from simple ignorance of good hardware design. I've spent many years as a DBA and developer, but it wasn't until I was specifically tasked with configuring enterprise-class servers that I learned to resolve many of the key difficulties in building well-tuned SQL Servers from the bare metal on up.

I frequently see IT professionals choose "throw more hardware at it" as their first tuning option; unfortunately, this is often an ineffective way to spend your company's money. I'd like to share several techniques you can use to build a server for peak performance and give you some specific recommendations for the CPU, memory, and disk I/O. Then, I'll give you several recommendations about how to prioritize your hardware options for the best application performance.

CPUs

Testing is crucial to understanding how your server will handle workload. I teamed up with Jim Drover, a performance and scalability expert, to conduct a series of SQL Server OLTP benchmarking tests. After much testing and retesting, we determined that every 10 percent increase in CPU speed generally yields a 6.5 percent boost in SQL Server's performance. Thus, a 3.3GHz CPU will perform about 6.5 percent better than a 2.9GHz CPU. This finding was consistent across different types of CPUs.

Many servers today can support multiple CPUs by using symmetric multi-processor (SMP) architecture. However, SMP servers don't scale linearly. In other words, adding one CPU to a server doesn't yield a 100 percent increase in performance. Instead, each extra CPU adds processing power on a declining curve, starting at about a 40 percent boost for the first CPU and declining to about an average 16 percent increase for every added CPU after the third on an OLTP application.

Motherboard technology has also seen advances over the past several years. However, our testing shows that stepping up to the most advanced front-side bus bandwidth will only improve performance between 3 and 5 percent over the commodity motherboard.

CPU cache. There are three types of memory cache available for CPUs. L1 cache is fixed at 16KB for data and 16KB for instructions, and is on-die, meaning it resides on the CPU chip. L2 cache is off-die, meaning not on the CPU, but is integrated into the processor cartridge and allows memory fetches at the full operating speed of the processor core. L3 cache is also off-die; it's located immediately adjacent to the CPU chip but isn't integrated with the CPU chip. Although this tiny difference in location might seem insignificant, the distance the electrons have to travel for a round-trip memory fetch is an order of magnitude greater in L3 cache than in L2 cache. Thus, L3 cache tends to be much slower, but it can contain a much larger amount of memory because it isn't part of the CPU chip.

Our testing showed that L2 cache improved performance 11 percent going from 512KB to 1MB and an additional 18.5 percent going from 1MB to 2MB. So, a CPU with a 2MB L2 cache offers 31.5 percent better performance than a CPU with 512KB of L2 cache. Most CPUs currently offer an optional 4MB L3 cache. In our testing, an extra 4MB of L3 cache improved SQL Server performance by 39 percent.

One of the most common questions users ask is, "Should I buy a cheaper CPU with a smaller L2 cache or a more expensive CPU with a larger L2 cache?" Complicating this decision is the fact that faster chips usually have smaller L2 caches than slower chips with a larger L2 cache. Here are some basic rules for selecting cache:

When running one or two CPUs, go with the fastest CPU available even if you have to sacrifice L2 cache size. After that, if you have a choice of L2 cache size, always get the largest you can.
When running four or more CPUs, go with the CPUs that have the largest L2 cache, even though their speed might not be as great as that of CPUs with the smaller L2 cache. In SMP configurations of four or more CPUs, SQL Server runs better with larger L2 caches.

Hyperthreaded CPUs. An intriguing feature in Intel chips is hyperthreading, which lets one chip process more than one instruction per cycle, in effect creating a second virtual CPU. With hyperthreading enabled, Windows and SQL Server see one physical CPU as two logical CPUs. Intel conservatively estimates that hyperthreading improves performance by 10 to 30 percent, but our testing showed much better performance. With hyperthreading enabled, our tests showed an average 35 percent increase in SQL Server performance at a 1000-user OLTP load and a 46 percent increase at a 500-user OLTP load. This performance boost is great, considering you'd need a non-hyperthreading CPU that's more than 1GHz faster to achieve the same performance increase.

It's important to remember that hyperthreaded CPUs share L2 and L3 cache. Consequently, applications that would benefit from a big CPU cache would likely get less of a performance benefit (although they would still get some) from hyperthreaded CPUs.

Memory

Windows and SQL Server are well known for consuming as much memory as you can throw at them, but is more memory always better? In most circumstances, yes. But you need to consider some caveats to this rule:

Specifying minimum and maximum RAM rather than letting SQL Server dynamically allocate RAM can boost performance on systems dedicated to SQL Server, because paging is reduced.
Parallel queries consume more memory than non-parallel queries. It's a good idea to disable the sp_configure setting Maximum Degrees of Parallelism (Max DOP) on your SQL Server unless you've specifically tested and verified a performance improvement by enabling Max DOP.
Address Windowing Extensions (AWE) usually doesn't work well with other applications and will attempt to consume all the resources it can access, so enable AWE only on dedicated SQL Server boxes.

SQL Server 2000 Enterprise Edition allows up to 64GB of RAM when you're using AWE, but SQL Server can use the memory over 4GB only for data cache. Therefore, you can still hit memory bottlenecks on procedure cache. (Contrast that with 64-bit computing, in which the entire memory space is available for any SQL Server use, such as procedure cache.) Don't forget that other versions of SQL Server support different amounts of memory—for example, the maximum amount of memory SQL Server Standard Edition supports is 2GB.

DISK I/O

In order to properly configure and tune disk I/O, you must understand the fundamental types of redundant array of inexpensive disks (RAID), their benefits, and their drawbacks. The main benefit of RAID is availability; when one disk in a RAID array fails, other disks in the array enable the server to keep running without faltering. The second benefit of using RAID is improved performance. Since RAID arrays comprise multiple disks, you have many disk heads performing reads and writes simultaneously, thus speeding up disk I/O.

The most common types of RAID used in a SQL Server database are RAID5, RAID1, and RAID10. In addition, many enterprises are now deploying storage area networks (SANs), so we evaluated each of those as well.

RAID5. Under RAID5, each write records a data block on a data disk and a parity block on another disk in the array. For reads, RAID5 reads the data block directly and checks the parity block on reads. RAID5 requires at least three drives to implement, as Figure 1 shows. Benefits of RAID5 include one of the highest read–data transaction rates (although only a medium write-data transaction rate). RAID5's low ratio of parity disks to data disks means high efficiency, and the large number of disks gives a good aggregate transfer rate.

RAID5 has some drawbacks as well. Disk failures degrade performance moderately, but the volume can keep running. RAID5 also has the most complex controller design and is harder to rebuild in the event of a disk failure than RAID1. In addition, the individual block-data transfer rate is the same as a single disk.

RAID1. In almost every case, a RAID1 solution is superior to RAID5. To achieve the highest performance with RAID1, you need a good controller card that can perform two concurrent separate reads or writes per mirrored pair. As Figure 2 shows, RAID1 requires at least two drives to implement.

RAID1 has some great benefits. For example, it can sustain one write or two simultaneous reads per mirrored pair. That's twice the read transaction rate and the same write transaction rate as a single disk; furthermore, the transfer rate per block is equal to that of a single disk. Also, RAID1 offers 100 percent data redundancy, so you don't need to rebuild in case of a disk failure, just copy to the replacement disk. Furthermore, RAID1 has the simplest RAID storage subsystem design and can sustain multiple simultaneous drive failures (i.e., one drive failure per mirrored pair).

Unfortunately, RAID1 has a 100 percent disk overhead, since every read or write must be duplicated. And the cost of RAID1 is high because you need twice as many disks.

RAID10. As Figure 3 shows, RAID10, also known as RAID1+0, offers all of the benefits of mirroring (RAID1) and striping (RAID0) with few of the drawbacks. It's the highest performing and most reliable RAID, but it's also the most expensive. RAID 10 requires a minimum of four drives to implement. On the positive side, you implement RAID10 as a striped array (RAID0) whose segments are mirrored arrays (RAID1). RAID10 has the same fault tolerance and the same overhead for fault tolerance as RAID1, and RAID10 has higher I/O rates than RAID1. However, RAID10 is expensive because you must start with at least four disks before you can build out an RAID10 array. Sustained performance may decline because disk heads need to remain on parallel tracks, and RAID10's high cost limits its scalability.

SAN. SANs are rapidly becoming the disk array of choice for many enterprises. In essence, a SAN is a single-purpose server loaded with dozens or even hundreds of disks. SANs have the speed of multi-disk arrays similar to RAID but offer a great deal of added flexibility and scalability. The cost of SANs has come way down in the last year, with basic models in the $20,000 range, so simple SANs may now be within reach for many small businesses.

Testing proves out that high-end SANs outperform RAID arrays. SANs have a couple of other advantages. For example, SQL Servers can have more than two fiber connections to a SAN, so you can increase bandwidth to the SAN when necessary. Another big value point for SANs is efficient disk usage. In a well-tuned RAID configuration, you might need a terabyte of disk to support a 100GB database so you'll have enough heads reading and writing data. SANs give systems only what they need and don't over-allocate disk space the way that RAID does. In addition, you can use a SAN to support many servers simultaneously, as Figure 4 shows, and dual-path and multi-path SANs match the performance of RAID1 and RAID10 volumes.

Drawbacks of using a SAN include the expense and difficulty of setup. In addition, a properly sized and configured SAN requires you to measure bandwidth and I/O requirements of all systems that simultaneously use the SAN.

Isolating SQL Server Database Objects

You can eliminate or alleviate disk I/O bottlenecks by separating database objects and moving them to separate RAID volumes. Our testing, although not of a sample size large enough to be scientifically valid, did reveal some trends. We saw a greater than 10 percent increase in disk I/O performance by isolating user database files and transaction log files. We saw a disk I/O performance boost of 5 to 9 percent when isolating tempdb. And we saw less than a 5 percent improvement in disk I/O performance when isolating SQL Server executables, Windows executables, and Pagefile.sys.

Note that these improvements in disk I/O performance are cumulative. So, you could gain a sizable performance improvement by partitioning a variety of database objects across multiple RAID volumes.

Disk I/O Lessons Learned

Conventional wisdom says that RAID5 is inferior to RAID1 and RAID10. Our testing verified that RAID1 and RAID10 had very similar performance, with both performing 319 percent as well as RAID5. SAN performance, in terms of the total number of transactions processed per second, was nearly identical to that of RAID1 and RAID10. However, it's worth noting that SAN transaction times (the time it took for a single transaction to complete) were almost twice as fast and that disk I/O queue lengths were shorter compared with RAID1 or RAID10, perhaps due to better disk I/O caching. In a sense, our testing showed that SANs help alleviate latency on disk I/O compared to RAID.

Another important lesson is that more disk drives (aka spindles) equal better performance. In our tests, RAID5 performance improved by 1.4 percent for each added spindle beyond the baseline configuration. RAID1 and RAID10 performance improved about 5 percent for each added spindle beyond the baseline configuration. Here are some disk I/O rules we learned from testing:

Add spindles to boost read performance.
Add controllers to boost write performance.
Disk write cache gives a big performance boost, but it carries risks. Some disk write caches require either an internal battery or an uninterrupted power supply (UPS) on the server to prevent data loss in the event of a catastrophic failure. In SQL Server terminology, this is called "guaranteed writes."
Don't forget that disk and SAN cache never outperform memory cache. RAM is also cheaper than disk cache, so invest in RAM first.
Splitting the data file onto two or more RAID arrays can boost disk I/O writes in extreme I/O bottlenecks.
As expected, faster disks improved transactions per second and dropped wait times.

Finally, don't forget about the "working disk" conundrum. In the working disk situation, you'll hear many people say that RAID10 is faster than RAID5. This is certainly true, but the assumption behind this statement is that performance is better "for the same number of working disks." However, you usually don't get four times the money to buy the same number of working disk drives when choosing RAID10 over RAID5.

Think it through, and remember that RAID5 may be your real best option unless the purse strings are quite flexible. For a handy checklist of how to prioritize your hardware shopping, see the sidebar, "Recommendations."

How We Tested

The tests we performed were based on the transactions and database schema used for the TPC-C benchmark (http://www.tpc.org) and scaled from 500 up to 1000 concurrent users. (Note that TPC-C is an OLTP test, so the configurations and recommendations are best for OLTP environments and might not be best for OLAP environments.) We built a low-end commodity server to act as our baseline server configuration. We then changed various elements of the bare metal server such as CPU, memory, and disk I/O of the baseline server to determine how much of an improvement, by percentage, you could expect compared to the baseline.

Comments

Plain text