Disk input/output (I/O) performance is the single biggest constraint on database performance in the SharePoint world. Running SharePoint on an underpowered or poorly designed SQL Server environment will slow performance throughout the farm.
Because it's equally important to make sure SQL Server databases are kept highly available, some level of disk redundancy is usually factored into physical designs for database servers. Disk redundancy is often classified by RAID levels. After I explain how the various RAID levels can help performance and availability, I'll share some tips on how to design your disk system for maximum performance at the database level.
RAID technologies predate PCs and modern x86 server designs. Three of the most common levels are RAID 1, RAID 5, and RAID 10, so let's take a closer look at them.
RAID 1. In a RAID 1 design, you have mirrored pairs of disks. If one disk fails, the other has a full copy of all the data and goes on operating. RAID 1 is a high-performance technology but requires purchasing 2GB of raw disk space for every 1GB of data because you need two sets of identical data. So, although RAID 1 might initially seem like an inexpensive solution because there are only two disks involved, it can be an expensive one if you have a lot of data to store.
RAID 5. In this RAID level, you use groups of disks to create a large synthetic unit. For example, five 150GB disks could be combined to create one hybrid 600GB storage pool.
At this point, you might thinking, "Wait a minute, that's not correct!" Although I'm not great at math, I know that 150GB ´ 5 should equal 750GB, not 600GB. However, in RAID 5, one disk is dedicated to calculating a checksum, or parity bit, every time data is written to the group. That way, if any of the single disks fail, the data can be redistributed to other disks by reversing the parity calculations. As a result, RAID 5 is highly efficient at optimizing the volume of data stored. It's also great at handling high-volume read operations -- more disks working together means more systems to read bits from a disk platter and fewer chances for bottlenecks.
Write operations are a different story. Because every write operation requires a calculation, high-volume write operations are slower than they would be in a traditional disk system. However, for SharePoint, that's not always the worst thing, because SharePoint is a read-heavy database consumer.
RAID 10. Some people make a distinction between RAID 1+0 (stripes across mirrors) and RAID 0+1 (mirrors of stripes). This debate doesn't really matter in SharePoint designs. What does matter is that out of all the RAID levels, RAID 10 offers the highest availability and highest performance possible. However, RAID 10 requires purchasing extra disks for parity and mirroring -- more than 2GB of raw disk space for every 1GB of data, effectively more than doubling storage costs.
SQL Server databases are the largest consumers of disk space in SharePoint. Thus, designing your disk system for maximum performance at the database level is more crucial than for any other level. Here are some general design tips.
Don't virtualize SQL Server. If possible, don't virtualize SQL Server because it's already an integration platform. If you have to virtualize SQL Server, try to limit the virtualization to test or development systems. Avoid using it for production systems because it raises the bar in disk engineering to get good performance. SQL Server databases stored in virtualized disk files are inherently slow compared with dedicated physical disks.
Use multiple logical drive letters. It's usually a good idea to break up SQL Server databases into multiple logical drive letters because database files, transaction logs, backup files, and temporary databases (tempdb) benefit from having multiple independent sets of disk spindles. Adding more spindles spreads the load across multiple parallel operations when data is being written to the database.
If you're going to use a SAN or virtualized environment, make sure you understand where those logical drive letters will be mapped. For example, if the D, E, and F drives will all point back to separate LUNs on your SAN, but those LUNs will be part of the same storage group and same set of physical disks, splitting those files into multiple drive letters will add complexity without significant performance gains.
Use RAID 10 judiciously. RAID 10 is great, but you might not be able to justify it for all applications. For example, it might be overkill for backup files. Balancing disk performance and cost is a reasonable trade-off. One possible design is to use:
- RAID 1 on boot disks
- RAID 5 on data disks
- RAID 10 on log disks
- No RAID or RAID 5 on backup disks
Break large content databases into multiple database files. If you have large content databases, you can engineer better performance by breaking each large database into multiple database files. Each database file should be on a separate disk.
Presize SQL Server databases. SQL Server databases can be set to automatically grow as needed, but this can lead to massive file fragmentation. Presizing the databases to a sufficient size at the outset helps ensure contiguous file allocations. Note that SQL Server's tempdb database is heavily used by SharePoint, so you should presize it to about 20 percent of the size of the single largest content database.
If you want to use automatic database growth settings instead of presizing your databases, you should set the databases to grow in 50MB to 100MB clumps and not by percentage. Setting a 100GB database to grow in 10 percent increments means the database essentially stops to add 10GB or more on each increment. Using a small clump size will lead to more frequent, but smoother, steady state (i.e., continuous) growth.
Design a High-Performance Database Environment
If you use RAID and follow the general design tips, you can design a high-performance database environment that's also highly available. This is essential to a smoothly running SharePoint system.