Changes in hard disk drive technology in recent years have sped up server drive access. Serial ATA (SATA) drives have become the new standard for workstations, and Serial Attached SCSI (SAS) is the best choice for server-class storage. Server storage designs include locally attached storage, disk subsystem, Network Attached Storage (NAS), or Storage Area Network (SAN). A SAN is the best choice for high availability situations.
People often wonder about server storage options. What are the differences between different storage types? What solution is best for a particular situation? You have to sort through a lot of acronyms with server storage: U320 SCSI, SATA, SAS, NAS, iSCSI, SAN . . . the list goes on and on. In this article, I'll demystify server storage options and help you determine which solution is best for different situations.
Hard Disk Drive Technology
There have been quite a few changes in hard disk drive technology in recent years, focused primarily on speeding up the slowest part of any server: drive access. Here's a summary of the different drive technologies and where they might be used.
Parallel ATA drives. PATA drives, also known as ATA, ATAPI, or Integrated Drive Electronics (IDE) drives, are the type of drive that was generally used in workstations until recently. The fastest PATA drives (ATA/133) have a maximum transfer rate of 133MBps. Most workstations still include an IDE controller on the motherboard to interface with PATA CD-ROM and DVD drives. You can typically have two devices per controller channel, although some IDE RAID controllers let you connect more drives to a channel. If your workstation is more than three years old, it probably contains a PATA drive.
Serial ATA drives. SATA drives have become the new standard for workstations and low-end servers. The biggest improvement over PATA drives is the transfer rate: older SATA drives have a maximum transfer rate of 1.5Gbps (roughly 150MBps of data), and newer SATA drives have a maximum of 3Gbps (roughly 300MBps). In addition, whereas PATA drives share a bus, each SATA drive has a dedicated channel to the controller. This design improves performance because drives don't compete for the same communication channel.
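SATA (like SAS) is an 8b/10b-encoded serial link, so each byte of data travels as 10 bits on the wire. The minimal sketch below converts a link's line rate into its approximate maximum data rate under that assumption. It ignores command and framing overhead, so real-world throughput is lower; the function name is illustrative only.

```python
def usable_mb_per_s(line_rate_gbps: float) -> float:
    """Approximate data rate (MBps) of an 8b/10b-encoded serial link.

    With 8b/10b encoding every 8-bit byte is sent as a 10-bit symbol,
    so one byte of data costs 10 bits of line rate. Protocol overhead
    is ignored.
    """
    bits_per_second = line_rate_gbps * 1_000_000_000
    bytes_per_second = bits_per_second / 10  # 10 line bits per data byte
    return bytes_per_second / 1_000_000  # decimal megabytes


if __name__ == "__main__":
    for name, rate in [("SATA 1.5Gbps", 1.5), ("SATA 3Gbps", 3.0)]:
        print(f"{name}: ~{usable_mb_per_s(rate):.0f}MBps")
```

This is why a "1.5Gbps" SATA link carries roughly 150MBps of data rather than 1.5 gigabytes per second.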
If you plan to use SATA drives in a server, check the duty cycle that was used to calculate the drive's mean time between failures (MTBF). A SATA drive's MTBF might be rated at a duty cycle appropriate for workstations (perhaps 20 to 30 percent) rather than for true server-class use; MTBF for server drives is often calculated at a duty cycle of around 80 to 90 percent. If you use a workstation-rated SATA drive in a server, you'll probably see a high drive failure rate. On the other hand, SATA drive density is quite good; 500GB drives are readily available. SATA drives are a good fit for nearline or archive applications, where large amounts of data must be readily accessible but the highest performance isn't necessary.
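To see why the rated duty cycle matters, the sketch below converts an MTBF into an annualized failure rate using the standard constant-failure-rate (exponential) model, 1 - exp(-t / MTBF). The derating step is an assumption made purely for illustration: it scales MTBF linearly with duty cycle, which is a simplification and not a vendor formula. The 600,000-hour MTBF is a hypothetical figure.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours for a server that runs 24x7


def derated_mtbf(rated_mtbf_hours: float, rated_duty_pct: int, actual_duty_pct: int) -> float:
    """Scale an MTBF rated at one duty cycle to a heavier one.

    ASSUMPTION: failure rate grows linearly with duty cycle. This is a
    simplification for illustration only.
    """
    return rated_mtbf_hours * rated_duty_pct / actual_duty_pct


def annualized_failure_rate(mtbf_hours: float, hours: float = HOURS_PER_YEAR) -> float:
    """Probability that a drive fails within `hours`, assuming a constant
    failure rate (exponential model): 1 - exp(-t / MTBF)."""
    return 1 - math.exp(-hours / mtbf_hours)


if __name__ == "__main__":
    rated = 600_000  # hypothetical MTBF rated at a 30% (workstation) duty cycle
    in_server = derated_mtbf(rated, rated_duty_pct=30, actual_duty_pct=90)
    print(f"AFR at rated duty cycle: {annualized_failure_rate(rated):.2%}")
    print(f"AFR at 90% duty cycle:  {annualized_failure_rate(in_server):.2%}")
```

Under this simple model, running a drive at three times its rated duty cycle roughly triples its yearly failure probability.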
Ultra320 SCSI drives. Ultra320 SCSI drives (or, sometimes, U320 SCSI) were the standard for servers and other high-end storage until a few years ago. As the name implies, Ultra320 SCSI has a maximum transfer rate of 320MBps. You can typically connect 14 devices to each SCSI bus. Ultra320 SCSI uses a shared bus, so the chance of SCSI bus contention increases with each additional drive you add to the SCSI channel. The largest readily available Ultra320 SCSI drives are 300GB.
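The difference between a shared bus and dedicated channels can be made concrete with a worst-case estimate in which every drive transfers at the same time. The sketch below assumes the bus bandwidth is split evenly among active drives and ignores protocol overhead and controller limits; it is a rough illustration, not a performance model.

```python
def bandwidth_per_drive(link_mb_s: float, active_drives: int, shared_bus: bool) -> float:
    """Worst-case bandwidth each drive gets when all drives transfer at once.

    On a shared bus (Ultra320 SCSI, PATA) active drives split the bus.
    On point-to-point links (SATA, SAS) each drive keeps its own link.
    Protocol overhead and controller limits are ignored.
    """
    if active_drives < 1:
        raise ValueError("need at least one active drive")
    return link_mb_s / active_drives if shared_bus else link_mb_s


if __name__ == "__main__":
    for n in (1, 4, 14):
        share = bandwidth_per_drive(320, n, shared_bus=True)
        print(f"{n:2d} drives on one U320 bus: ~{share:.1f}MBps each")
```

With 14 drives streaming at once, each gets only about 23MBps of the 320MBps bus, which is the contention the text describes.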
Serial Attached SCSI. SAS drives are replacing Ultra320 SCSI in the server-class storage market. SAS drives have a transfer rate of 3Gbps, and most drive manufacturers plan to release 6Gbps SAS drives in the future. SAS drives are designed for heavily used servers, so their MTBF is calculated at a high duty cycle. As with SATA, each drive has a dedicated communication channel, which eliminates shared-bus contention. Although SAS drive performance is significantly better than that of Ultra320 SCSI, drive density isn't very good. The largest drive you can get in the 2.5" form factor is 146GB; the 3.5" form factor gets you up to 300GB. This limitation usually isn't an issue if you're building a high-performance disk array, because adding drives improves the array's performance. But if you have a lot of data to store, 146GB drives might not be adequate.
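The capacity trade-off can be estimated by counting drives. The minimal sketch below assumes a RAID 10 array (every data drive has a mirror partner) and ignores formatting overhead and spare drives; the 2TB requirement is hypothetical.

```python
import math


def drives_for_raid10(target_gb: float, drive_gb: float) -> int:
    """Drives needed for a RAID 10 array with at least `target_gb` usable.

    RAID 10 mirrors every data drive, so usable capacity is half the raw
    capacity. Formatting overhead and spare drives are ignored.
    """
    data_drives = math.ceil(target_gb / drive_gb)
    return 2 * data_drives  # one mirror partner per data drive


if __name__ == "__main__":
    target = 2_000  # hypothetical 2TB data requirement
    for size in (146, 300):
        print(f"{size}GB drives: {drives_for_raid10(target, size)} drives")
```

Smaller drives mean more drives (28 versus 14 here), which costs enclosure space but spreads I/O across more spindles.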
Server Storage Designs
After you've chosen the appropriate drive type, you still have to decide where to install the drives: locally attached storage, disk subsystem, Network Attached Storage (NAS), or Storage Area Network (SAN). Your application requirements should help you determine which storage option is the best fit for your company.
Locally attached storage. Locally attached storage is installed directly in the server or connected to an external storage device with a SCSI cable. A common server configuration uses a RAID 1 array for the OS partition and a RAID 5 array for data. The best-performing and most fault-tolerant disk array is RAID 1+0 (or RAID 10), which combines data striping and mirroring. For the best performance, use a hardware RAID controller with an onboard cache. The controller should have a battery backup to protect any data still in the cache if the server loses power or crashes. Microsoft SQL Server log files perform best on a RAID 1 array, but the data portion of the database (or any randomly accessed file) performs better on a RAID 5 or RAID 10 array. Locally attached storage is a good solution for servers that don't have high availability requirements. You can set up locally attached storage for as little as $1,000.
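The RAID levels mentioned above trade raw capacity for redundancy in different ways. The sketch below computes usable capacity for arrays of identical drives, using the standard definitions of each level; the drive sizes in the example are illustrative.

```python
def usable_capacity_gb(level: str, drives: int, drive_gb: float) -> float:
    """Usable capacity of common RAID levels built from identical drives."""
    if level == "RAID 0":  # striping only, no redundancy
        return drives * drive_gb
    if level == "RAID 1":  # two-drive mirror
        if drives != 2:
            raise ValueError("RAID 1 here means a two-drive mirror")
        return drive_gb
    if level == "RAID 5":  # one drive's worth of capacity holds parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb
    if level == "RAID 10":  # striped mirrors: half the raw capacity
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2 * drive_gb
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    # Common layout: RAID 1 for the OS, RAID 5 or RAID 10 for data
    print("OS   RAID 1 :", usable_capacity_gb("RAID 1", 2, 146), "GB")
    print("Data RAID 5 :", usable_capacity_gb("RAID 5", 4, 300), "GB")
    print("Data RAID 10:", usable_capacity_gb("RAID 10", 4, 300), "GB")
```

With four 300GB drives, RAID 5 yields 900GB and RAID 10 yields 600GB; RAID 10 gives up capacity in exchange for better write performance and fault tolerance.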
NAS. NAS devices are appliances that can hold multiple hard disk drives (usually eight or more) and have one or more built-in Ethernet network cards. NAS devices serve files but don't provide other server functions, such as email, database, DNS, or DHCP. Although they can be placed on a dedicated network, NAS devices are usually placed on the general-purpose Ethernet network so that workstations and servers can access them. A drawback of NAS devices is their tendency to become obsolete. For example, early NAS devices typically didn't support Active Directory (AD), and they didn't have an upgrade path. If a change in your environment required your NAS device to support AD, you had to replace the entire unit with a new NAS device that supported AD.
NAS devices are a good fit for applications where the data must be online but isn't accessed frequently. For example, a NAS device filled with SATA drives might be an appropriate choice for an email archive. Prices for NAS solutions typically start at around $3,000.
Just a Bunch of Disks. JBOD, just as the name suggests, is a disk subsystem that holds many hard disk drives. The drives are often configured in a RAID 0 array, with multiple drives striped together to create one large logical disk partition. JBODs typically have SCSI, SATA, SAS, or Fibre Channel interfaces. JBODs are commonly used to back up data stored on a SAN: Data is copied from the SAN to the JBOD, and then from the JBOD to offline media such as tape. Staging the data on the JBOD makes the backup faster, and you don't have to worry as much about contention from files that are open on the SAN; open files often can't be reliably backed up. JBODs are also used to consolidate data from multiple sources before it's backed up to tape. In enterprises with a lot of data to back up (more than a few terabytes) and a small backup window, JBODs are usually part of the backup strategy. JBOD solutions start at around $5,000.
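A quick calculation shows why staging to a JBOD helps with a small backup window: only the disk-to-disk copy has to fit inside the window, and the slower copy to tape can run afterward. The sketch below assumes sustained transfer rates in decimal megabytes per second; the data size and both rates are hypothetical placeholders, not measurements of any product.

```python
def transfer_hours(data_tb: float, rate_mb_s: float) -> float:
    """Hours to copy `data_tb` terabytes at a sustained `rate_mb_s` (decimal units)."""
    megabytes = data_tb * 1_000_000
    return megabytes / rate_mb_s / 3600


if __name__ == "__main__":
    data_tb = 3      # hypothetical data set
    disk_rate = 200  # hypothetical SAN-to-JBOD copy rate, MBps
    tape_rate = 80   # hypothetical tape drive rate, MBps
    print(f"SAN -> JBOD:  {transfer_hours(data_tb, disk_rate):.1f} h (must fit the window)")
    print(f"JBOD -> tape: {transfer_hours(data_tb, tape_rate):.1f} h (can run afterward)")
```

Plug in your own measured rates to check whether the disk-to-disk stage fits your window.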
SAN. SANs are at the high end of server storage options. They come in two types, iSCSI and Fibre Channel, which I'll discuss in more detail below. A SAN's main advantage is shared storage: Unlike with locally attached storage, more than one server can access data on a SAN. In lower-end SAN configurations, the SAN chassis is a single point of failure. You can configure the SAN to eliminate this single point of failure, but the price goes up significantly as a result.
SANs are typically used for high-availability solutions, such as Microsoft Cluster Server or VMware's ESX Server with VMware High Availability. Because a SAN allows for shared storage between two or more server nodes, a passive node can take the place of an active node in the event of a hardware failure in the active node. You would typically start considering a SAN if you have more than 400 users; if the cost of downtime is extremely high, you should consider a SAN at lower user numbers. For example, I worked with an organization that estimated its downtime costs at $20,000 per minute; even though they had only 20 users in the office, they opted to use a SAN.
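Putting a number on downtime makes the SAN decision easier to justify. The sketch below converts an availability percentage into expected minutes of downtime per year and multiplies by a per-minute cost. The availability levels are illustrative; the $20,000 figure is the per-minute estimate from the organization described above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def annual_downtime_cost(availability_pct: float, cost_per_minute: float) -> float:
    """Expected yearly cost of unplanned downtime at a given availability."""
    downtime_minutes = MINUTES_PER_YEAR * (100 - availability_pct) / 100
    return downtime_minutes * cost_per_minute


if __name__ == "__main__":
    cost = 20_000  # per-minute downtime cost from the example in the text
    for availability in (99.0, 99.9, 99.99):
        yearly = annual_downtime_cost(availability, cost)
        print(f"{availability}% available: ${yearly:,.0f}/year")
```

At that cost, even moving from 99.9 to 99.99 percent availability is worth millions of dollars a year, which dwarfs the price of a SAN.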
The applications you run on the SAN significantly affect how you should create your LUNs. A LUN (logical unit number) identifies a logical disk that the SAN presents to a server node, which sees it as an ordinary disk partition. Each LUN is typically made up of a RAID array, commonly RAID 1, RAID 5, or RAID 10. For optimum performance, place sequentially written data such as log files on LUNs made up of RAID 1 arrays, and store randomly accessed data such as database files on LUNs made up of RAID 5 or RAID 10 arrays.
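One way to choose between RAID 5 and RAID 10 for a randomly accessed LUN is to estimate the back-end disk I/O each generates. The sketch below uses the common rule-of-thumb write penalties (2 for mirrored levels, 4 for RAID 5 small random writes) and counts each read as one disk I/O. Controller caching changes the real numbers, and the workload in the example is hypothetical.

```python
# Rule-of-thumb back-end disk I/Os per front-end random write, by RAID level.
WRITE_PENALTY = {
    "RAID 1": 2,   # write both mirror copies
    "RAID 10": 2,  # write both mirror copies
    "RAID 5": 4,   # read old data and parity, write new data and parity
}


def backend_iops(front_end_iops: int, write_pct: int, level: str) -> int:
    """Disk I/Os per second the array must sustain for a given workload."""
    writes = front_end_iops * write_pct // 100
    reads = front_end_iops - writes
    return reads + writes * WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical random-access database LUN: 1,000 IOPS, 30% writes
    for level in ("RAID 5", "RAID 10"):
        print(level, backend_iops(1_000, 30, level))
```

The more write-heavy the workload, the more RAID 10's lower write penalty pays off; RAID 5 remains attractive when reads dominate and capacity matters.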
iSCSI SANs. iSCSI SANs are the less expensive SAN option; they typically start at around $15,000. iSCSI SANs use Gigabit Ethernet to transfer data between the server nodes and the SAN, so the server nodes don't have to be in the same physical location. This makes iSCSI SANs a little more flexible to set up than Fibre Channel SANs. If you choose this solution, I strongly suggest using a TCP Offload Engine (TOE) card to process the iSCSI requests, because these requests can place a significant load on the server's processor. For the best performance, run the iSCSI SAN on a dedicated network that's separate from your LAN traffic.
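A single Gigabit Ethernet link tops out at 125MBps (1Gbps divided by 8 bits per byte) before TCP/IP and iSCSI overhead. The sketch below estimates how many links a server node needs for a target throughput. Two assumptions are labeled in the code: the 90 percent efficiency figure is a hypothetical placeholder, and it assumes multipathing spreads traffic evenly across links.

```python
import math

GBE_RAW_MB_S = 125  # 1Gbps / 8 bits per byte, before protocol overhead


def gbe_links_needed(target_mb_s: float, efficiency_pct: int = 90) -> int:
    """Gigabit Ethernet paths needed to carry `target_mb_s` of iSCSI traffic.

    ASSUMPTIONS: each link delivers `efficiency_pct` percent of its raw rate
    after TCP/IP and iSCSI overhead, and multipathing balances load evenly.
    """
    per_link = GBE_RAW_MB_S * efficiency_pct / 100
    return math.ceil(target_mb_s / per_link)


if __name__ == "__main__":
    for target in (100, 300, 400):
        print(f"{target}MBps needs {gbe_links_needed(target)} GbE link(s)")
```

This is one reason iSCSI suits workloads without extremely high disk throughput requirements: each additional 100MBps or so of throughput costs another network path.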
An iSCSI SAN is a good solution when you need high availability but don't have extremely high disk throughput requirements. The money you save by using an iSCSI SAN can go toward higher-end server nodes, which might give you the best performance per dollar. An ideal application for an iSCSI SAN is a SQL Server database that has high-availability requirements and a relatively light transaction load but runs a large number of stored procedures on powerful server nodes connected to the SAN. Because the transaction load is light, you probably don't need a very fast disk subsystem, but the large number of stored procedures places a significant load on the processors. If each server node has enough memory, much of the data can be cached to further reduce disk I/O, especially if your servers run on the x64 platform. Figure 1 shows a typical iSCSI SAN configuration.
Fibre Channel SANs. Fibre Channel is the higher-end SAN solution; typical solutions start at around $25,000. Early Fibre Channel SANs ran at 2Gbps, but newer solutions run at 4Gbps and 8Gbps. Fibre Channel SANs give you the best disk performance available today. Whereas iSCSI SANs use a Gigabit Ethernet switch, Fibre Channel SANs use a Fibre Channel switch to connect the server nodes to the SAN. Some vendors charge for each connection on the Fibre Channel switch, so you might have to pay a connection fee to add nodes. In a typical configuration, each server node has redundant connections to the SAN. Figure 2 shows a typical Fibre Channel SAN configuration.
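Because some vendors charge per switch connection, it helps to count ports before adding nodes. The sketch below assumes each server node uses one port per redundant path and that the storage array consumes a fixed number of ports; the defaults (two paths per node, four storage ports for a dual-controller array) are hypothetical and should be replaced with your actual design.

```python
def fc_switch_ports(server_nodes: int, paths_per_node: int = 2, storage_ports: int = 4) -> int:
    """Fibre Channel switch ports consumed by a SAN fabric.

    Each server node uses one switch port per HBA path, and the storage
    array uses `storage_ports` more. Defaults are hypothetical: two
    redundant paths per node and a dual-controller array with two ports each.
    """
    return server_nodes * paths_per_node + storage_ports


if __name__ == "__main__":
    for nodes in (2, 4, 8):
        print(f"{nodes} nodes -> {fc_switch_ports(nodes)} switch ports")
```

With redundant connections, every node you add consumes two switch ports, so per-connection fees grow twice as fast as the node count.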
When you purchase a Fibre Channel SAN, a dedicated engineer typically comes out to assist with the implementation. These specialist engineers verify that everything is properly installed and configured. If the onsite installation isn't included in the cost of the SAN, I suggest purchasing this service, especially if this is your first SAN installation.
Fibre Channel SANs are good solutions for Microsoft Exchange Server 2003 installations with large databases (e.g., more than 500GB). All other things being equal, the speed of the disk subsystem on an Exchange 2003 server determines the ultimate performance of the mail system. Note that Exchange 2007 has significantly lower disk I/O requirements than Exchange 2003, because Exchange 2007 takes advantage of 64-bit processors and can cache a significant amount of data in memory.
It's Your Choice
Your requirements for applications, disk performance, fault tolerance, and high availability should help you narrow down your storage choices very quickly. For instance, if you need high availability on your server, you'll probably need to use a SAN. If you don’t have strict high-availability requirements, you can probably get by with locally attached storage. You still have many different storage options to consider for your servers, but you should no longer be afraid of that morass of acronyms. Use the information presented here to match a solution with your needs.
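The rules of thumb in this article can be condensed into a small decision helper. The sketch below is only a rough encoding of the guidance above (high availability, more than about 400 users, or extreme downtime cost points to a SAN; Fibre Channel when you need the highest throughput, iSCSI otherwise; NAS for online but rarely accessed data; otherwise locally attached storage). The `Requirements` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_high_availability: bool
    users: int
    downtime_cost_is_extreme: bool = False
    needs_highest_disk_throughput: bool = False
    data_online_but_rarely_accessed: bool = False


def suggest_storage(req: Requirements) -> str:
    """Map requirements to a storage design using this article's rules of thumb."""
    wants_san = (
        req.needs_high_availability
        or req.users > 400
        or req.downtime_cost_is_extreme
    )
    if wants_san:
        # Fibre Channel for the highest throughput; otherwise the cheaper iSCSI
        if req.needs_highest_disk_throughput:
            return "Fibre Channel SAN"
        return "iSCSI SAN"
    if req.data_online_but_rarely_accessed:
        return "NAS"
    return "Locally attached storage"


if __name__ == "__main__":
    # The 20-user office with $20,000-per-minute downtime from the text
    office = Requirements(needs_high_availability=True, users=20,
                          downtime_cost_is_extreme=True)
    print(suggest_storage(office))
```

Real decisions also weigh budget, growth, and backup strategy, so treat the result as a starting point.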