A task that’s frequently skipped when setting up a new SQL Server or application is proper capacity planning. With storage costs falling every quarter, it’s easy to tell your storage administrator that you need 500GB of storage. But what have you really asked for? Will 500GB of storage last six months, three years, or 10 years? Without the answer to this question, you might not be asking for the right thing. Also, you need to consider the speed of your storage. Some storage is more expensive than other storage. Although your organization might not charge back when storage is requested, as a member of the IT staff it’s your responsibility to ensure the most effective use of the department’s resources. Let’s look at how to determine how much storage you need, how to configure your disks, and the types of storage available.
The easiest part of capacity planning is determining how much storage space your database requires. You need to figure out the total number of bytes that you'll use over n months. Although calculating this number might appear to be a daunting task, when you break it down table by table it's not as scary. First, add up the size of all the static lookup tables. The size of these tables won't change from month to month, so once you have that value you can simply add it on at the end. For the tables that grow every day, month, or year, you need a few key pieces of information. You need to know the average number of rows added per cycle. A cycle could be a day, a month, or a year, depending on how often the data comes into the system. If it's a real-time system, such as an order entry, ticketing, or network security system, daily is probably the best assumption. You also need to know the average length of each row; the developers can probably help you answer this question. You'll also want to find out your organization's record retention policy. Do you plan to keep data for a month, a year, or forever? Finally, you need one static number: the amount of space available for row data on each data page, which is 8060 bytes.
From here, some basic arithmetic will give you the total number of bytes. For this example, we'll assume a daily cycle of 20,000 rows, an average row length of 187 bytes, and a corporate record retention period of three years. The first calculation you need to do is figure out how many rows fit into each data page. This calculation is important because SQL Server can't split a row between two data pages. Take the number of bytes per page and divide it by the number of bytes per row (8060/187), which gives you 43.101 rows per data page, as shown in Table 1.
You'll want to round this number down because a partial row can't be stored on a page, which gives you 43 rows per data page. Then take the number of rows per (daily) cycle and divide it by the number of rows per data page (20000/43), which gives you 465.12 as the number of data pages you need for each day's data, as shown in Table 2.
Because you can’t have a part of a data page, round this number up to 466 data pages. Although each data page can hold only 8060 bytes, the data page itself is actually 8KB in size. Now take 466 data pages and multiply that by 8KB to get the total amount of data added to the table each day, which is 3728 KB or 3.64MB, as shown in Table 3.
If you stretch this number out to a month (30.5 days on average), you get 111.02MB per month. Now multiply that number by the retention period (36 months) to get 3996.72MB or 3.9GB over three years, as Table 4 shows.
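The whole calculation can be sketched in a few lines of Python. This is just the article's arithmetic restated (20,000 rows per day, 187 bytes per row, 36-month retention), rounding at each step the same way the tables do:

```python
import math

BYTES_PER_PAGE = 8060  # space available for row data on a SQL Server data page
PAGE_SIZE_KB = 8       # each page occupies a full 8KB on disk

rows_per_day = 20_000    # average rows added per daily cycle
avg_row_bytes = 187      # average row length
retention_months = 36    # corporate retention period

# A row can't span data pages, so round down.
rows_per_page = BYTES_PER_PAGE // avg_row_bytes          # 43
# A partial page still occupies a whole page, so round up.
pages_per_day = math.ceil(rows_per_day / rows_per_page)  # 466
kb_per_day = pages_per_day * PAGE_SIZE_KB                # 3728 KB
mb_per_day = round(kb_per_day / 1024, 2)                 # 3.64 MB
mb_per_month = round(mb_per_day * 30.5, 2)               # 111.02 MB
mb_total = round(mb_per_month * retention_months, 2)     # 3996.72 MB, about 3.9GB

print(mb_total)
```

Run this once per growing table, then add the static lookup tables on top to get the database total.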
After you’ve determined how much space each table requires, add the numbers up to get the total space needed for the database over your retention period. If your retention period includes moving data to an archive database for reporting purposes, your actual data sizes will be smaller because some data is being removed from the tables. After figuring out how much space your data requires, you then need to think about the speed and redundancy of the disks that will be hosting your data.
Configuring Your RAID
On paper, everyone says to use RAID 10 for your database. It has the fastest write response and there’s no parity calculation overhead. However, when people typically talk about storage they forget one important thing: cost. Although RAID 10 will get you the best performance, it’s an extremely expensive RAID level to use. RAID 10 takes twice as many disks to configure, and those disks aren’t free. RAID 10 should be used in some cases, but deploying it to all your databases by default is typically an extremely inefficient use of your storage resources. The majority of databases that are configured for RAID 10 storage would probably work just fine on RAID 5 storage.
It’s important to understand the difference between RAID 5 and RAID 10 before deciding which RAID level to use. RAID 5 is also called a stripe set with parity. As the data is written to the disk, a parity calculation is done on each data page. This parity calculation is written to the disk with the data page and used by the array in the event of a disk failure to let the array rebuild the data when the failed disk is replaced. Some SAN vendors will place all the parity information on a single disk, and some will spread the parity information across all disks in the RAID array. Although both techniques will give you the same level of protection, having the parity spread across all the disks in the array will increase the speed of the array because the additional disk can be used for reads and writes.
RAID 10 is also called a mirrored stripe. The number of disks in the array is split in half, with each half forming a single stripe. These two stripes are then mirrored against each other. In the event of a disk failure, the stripe that contained that disk is unusable until the disk is replaced and the data is mirrored from the undamaged stripe to the repaired stripe. Because there's no parity to calculate for each write, writes can happen faster; however, because only half the disks in the RAID 10 array hold unique data, twice as many disks are needed to get the same amount of capacity. Figure 1 shows the two different ways that RAID 5 arrays can be laid out, as well as how a RAID 10 array is laid out on the physical disks.
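The cost gap between the two levels is easy to quantify. Here's a quick sketch (the disk counts and sizes are illustrative, not from the article) of usable capacity under each level:

```python
def usable_capacity(disks: int, disk_size_gb: float, raid_level: int) -> float:
    """Usable space in a RAID group: RAID 5 gives up one disk's worth
    of space to parity; RAID 10 gives up half the disks to mirroring."""
    if raid_level == 5:
        return (disks - 1) * disk_size_gb
    if raid_level == 10:
        if disks % 2:
            raise ValueError("RAID 10 requires an even number of disks")
        return (disks // 2) * disk_size_gb
    raise ValueError("only RAID 5 and RAID 10 are modeled here")

# Eight 300GB disks: RAID 5 yields 2100GB usable, RAID 10 only 1200GB.
print(usable_capacity(8, 300, 5))   # 2100
print(usable_capacity(8, 300, 10))  # 1200
```

Put another way, to present the same 2100GB as RAID 10 you'd need fourteen disks instead of eight, which is the inefficiency the article warns about when RAID 10 is used as a blanket default.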
Tuning the Cache
One thing to remember when working in a SAN environment is that the SAN has cache, and a lot of it. That cache is set up to absorb massive amounts of writes, meaning that when SQL Server flushes to disk it's not actually writing to the disks. Instead, the SAN holds that write in its cache to be flushed to the actual disks at a later time. Because of this caching, the RAID level is largely invisible to SQL Server during writes, because SQL Server isn't actually writing to the disks but rather to the cache.
By default, most SAN systems come with the cache split 50 percent read cache and 50 percent write cache. Although this setup works great for data warehouse systems, it's not so good for OLTP databases. SANs use read-ahead to try to predict what information the host system is going to ask for and read it into the read buffer before the host actually requests it. That way, the data has already been read from disk and the SAN can serve it straight from the read cache. However, in an OLTP workload (e.g., databases, Exchange, file servers) the parts of the disk that need to be read are rarely sequential; they're usually scattered randomly across the drive, so the read-ahead might cause more load than the benefit it provides. You might be better served by reducing the ratio of read cache to write cache from 50/50 to 30/70 or 20/80. Reducing the ratio lets SQL Server write more data to the cache, which is where it really needs the speed.
This cache change also strengthens the case for RAID 5. Because SQL Server will actually be talking to the disks only when it's reading from them, you want as many disks as possible servicing those reads. With RAID 10, only half the disks are used for reading data; with RAID 5, all of the disks (or all but one, depending on how the SAN vendor handles RAID 5) are used.
In addition to tuning your cache, there are some things you can do at the LUN and disk level to increase performance. You might want to experiment with disabling the read cache on the LUN itself, which prevents the SAN from caching any reads from that LUN. The reasoning is that SQL Server already caches a great deal of data in its own buffer cache in the server's RAM, so it rarely needs to go to disk for data in the first place; when it does, the data it needs is unlikely to be waiting in the SAN's cache. With read cache enabled on a database LUN, the SAN attempts to read ahead during SQL Server's reads so that the requested data is already in cache and SQL Server doesn't have to wait on a call to the physical disk. But because of the random nature of OLTP I/O, the odds of the SAN pulling the correct data into cache are almost nil. In OLTP systems where the disks are already strained, having the read cache enabled might slightly hurt performance; in systems where the disks aren't strained, leaving it enabled shouldn't negatively affect performance.
At the disk level, an important yet often overlooked part of storage tuning is correctly aligning the disk partition. When the specifications for the master boot record (MBR) were written, storage was pricey, so the MBR was set to occupy the first 63 blocks of the disk, which puts the first data in the 64th block. This setup is problematic because disks like to do everything in 64-block chunks. In applications such as Microsoft Exchange or Oracle ASM, where all disk activity is done in 8KB (8-block) reads and writes, the effect isn't as pronounced, but SQL Server does all its disk access in 64KB (64-block) reads and writes. Because the data is offset by one block, the net effect is that every read and write operation requires twice as many physical operations against the disk.
This problem can be fixed fairly easily if the partition is created by using the DISKPART.EXE command-line tool included in Windows Server 2008 and Windows Server 2003. After the LUN has been presented to the server, log on to the server and launch diskpart.exe from a command-line window. Then run the command
SELECT DISK 3
where 3 is the disk number you just added to the server. Then run the command
CREATE PARTITION PRIMARY ALIGN=64
This command creates a primary partition on the disk aligned on a 64KB boundary instead of at the default 63-block offset. Do this for each LUN that has been presented to the server.
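The arithmetic behind alignment is easy to check. Here's a hypothetical helper (not part of any Microsoft tool; on a live server you'd read the partition's starting offset from a utility such as wmic) that tests whether an offset falls on a 64KB boundary:

```python
def is_aligned(starting_offset_bytes: int, boundary_kb: int = 64) -> bool:
    """True when the partition's first byte falls on a 64KB boundary."""
    return starting_offset_bytes % (boundary_kb * 1024) == 0

# A default MBR partition starts after 63 sectors (63 * 512 = 32,256 bytes).
print(is_aligned(63 * 512))   # False: the classic misaligned layout
# A partition created with ALIGN=64 starts at 65,536 bytes.
print(is_aligned(64 * 1024))  # True
```

Any partition whose starting offset isn't a multiple of the 64KB chunk size forces each 64KB SQL Server read or write to straddle two chunks, which is the doubling effect described above.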
If you have LUNs that aren't correctly aligned, the change isn't as easy to make because there's no tool to correct the alignment after the partition has been created. The easiest method is to back up your database and delete the database from the server. Then remove the partition, recreate it with the correct alignment, and restore the database to the disk. Some SAN vendors have client-side migration tools that can perform this process online, and third-party software is available to do the same. However, don't use a back-end SAN migration tool, because it will copy all the data as-is, including the misaligned partition information.
Shared Everything vs. Shared Nothing
There are two basic techniques for setting up storage: shared everything and shared nothing. With the shared everything technique, all disks in the SAN are used for every LUN. Although this technique looks great on paper, it has some drawbacks. If any one disk in the array begins to slow down, every LUN on the array is affected. The shared everything approach is good if you don't have a full-time SAN administrator, because there isn't much to do besides creating LUNs and assigning them to servers.
With the shared nothing approach (more accurately called the shared some approach), LUNs are created in smaller RAID groups throughout the SAN. Small numbers of physical disks are grouped into RAID groups of various configurations and sizes, and the LUNs are placed within these RAID groups. The good thing about this configuration is that if a specific LUN or disk starts to cause a slowdown, the only LUNs affected are the ones within that LUN's RAID group. The downside is that this system requires more hands-on configuration and more knowledge of the I/O levels you need to support. The upside is that if a LUN needs more I/O, you can move it to another RAID group, or even have it span more than one RAID group, to give it more I/O throughput.
SAN vs. DAS
Today, the SAN is king in the world of storage. A SAN is very fast and usually made up of hundreds of Fibre Channel drives, all connected to the server via high-end fiber cables. However, SAN solutions are very expensive, often costing hundreds of thousands of dollars just to get started. This brings us to direct attached storage (DAS). DAS is a great solution if your company doesn't need a large-scale SAN. DAS offers pretty much the same capabilities as a SAN, just at a smaller scale and usually with less cache. DAS devices hold anywhere from 2 to 45 SCSI, SAS, or SATA drives, with anywhere from a few megabytes to a gigabyte of cache.
All the same configuration options are usually available with DAS as with SAN, including read cache and write cache balancing. Just like SAN drives, DAS drives still need to be aligned (as do all RAID arrays). Unlike a SAN, a DAS usually won’t let you control cache settings on a disk-by-disk level. Also, DAS systems are connected to a specific server, so you can’t have more than one server connected to a DAS, as you can with a SAN.
The biggest drawbacks to using DAS for your databases are the lack of growth potential, because there’s a physical limit on the number of disks you can put into a DAS, and the fact that you can’t pool your resources between multiple servers with DAS devices. Another problem you might run into with DAS devices is that you can’t spread a single RAID array beyond the number of disks in a single drive shelf. Most DAS drive shelves hold 15 drives, and each drive shelf is connected to a single RAID controller. Because the RAID controllers don’t talk to each other, you can’t RAID between drive shelves. However, these drawbacks are offset by the very attractive entry cost. DAS storage is inexpensive when compared to the costs of purchasing a SAN solution.
iSCSI and the Database
A newer storage technology that's just breaking into the database world is iSCSI, which uses your existing Ethernet network to connect the database server to the storage. Many SAN vendors now include iSCSI support in their SAN systems. The upside to iSCSI is that you get SAN-quality storage without the additional cost of the Fibre Channel equipment. The downside is that all your storage traffic now has to compete for bandwidth with your regular Ethernet traffic. If you have a slow or extremely busy Ethernet network, iSCSI probably isn't the solution for you. Another potential problem is that because your storage traffic travels over the TCP network, you have to deal with TCP timeout settings. If a path to your storage array fails for some reason, Fibre Channel will time out very quickly and retry along another path; the timeout for TCP packets is much higher, and that delay will show up in your database as blocking or as processes timing out.
To address network bandwidth issues, network engineers use a technique called VLANing to separate and isolate network traffic. VLANing ensures that traffic on your iSCSI network is isolated from other parts of your network. This isolation is a logical separation of network segments into virtual LANs (VLANs) at layer 2 of the Open Systems Interconnection (OSI) reference model. A VLAN is a logical domain within the network switch that lets only traffic tagged for that VLAN onto that segment of the switch. Traffic within a VLAN, including broadcast traffic, stays isolated within that VLAN; any traffic destined for another VLAN must be routed. This helps prevent non-iSCSI network traffic from affecting your iSCSI traffic, and vice versa. VLANing separates traffic only at layer 2. To further isolate traffic for applications that require layer 3 switching or routing, VLANs can be associated with a routed subnet that lets traffic cross logical domains (VLANs) to communicate with other VLANs. Layer 3 routing can also be configured with an access control list (ACL) to allow or prevent specific subnets from communicating with each other. Other techniques at layers 4 and 5 are also available, but they're beyond the scope of this article.
Get the Best Performance Out of Your SQL Server
Properly setting up and configuring the physical disks is essential to getting the best possible performance out of your server. This is especially true with SQL Server (or any database server) because database servers read and write data to and from the disk in much higher quantities and at much higher speeds than other servers in the enterprise. Therefore, you want to make sure that every issue is addressed when configuring your storage, especially when you consider how much time it takes to make these changes after implementation.