Any large enterprise that plans to implement a business intelligence or data warehouse solution with 1TB or more of data might find itself looking at a significant investment for a server system that can provide the capacity and throughput that such a database environment requires. Large SAN and server solutions can certainly support large data warehouses, video streaming, or data-mining implementations. But such systems come at a high price. A system that can scan through the database at roughly 700MB per second of sequential I/O might cost anywhere from $500,000 to more than $1 million—and that's for just the hardware, not including the cost of software and maintenance.This enormous price tag led me and my colleague, Son Voba, to look for an alternative server solution.
Voba and I were impressed by the tremendous advances in software and hardware technology over the last 12 to 18 months, and we were inspired by the 2004 SATA disk throughput research of Microsoft Research Scientist Jim Gray and others in Microsoft Research. We wanted to build a prototype server that would consistently deliver more than 2GB per second sequential database reads and writes. Gray's research and testing used the Newisys 4300 server, SATA drives, and SATA disk controllers connected to 48 SATA disks with 48 SATA cables directly attached to the disks. Gray's configuration achieved a remarkable 2.5GB per second for sequential writes and slightly less than that for sequential reads.
We decided to build a white-box server (a non-branded system built from generic components) consisting of a beefy microcomputer with a 64-bit OS with massive main memory, database software, and direct-attached storage. This article describes that system—a prototype server running Windows Server 2003 x64, SQL Server 2005 x64, 32 GB of RAM, and high-capacity, direct-attached 7200 RPM SATA disks. This system, which our team has tested over the past 6 months, can compete with mega-servers often costing ten to twenty times more. Because of its low cost, such a system could become a contender for a big share of the database server solutions market.
In July of 2005, Son Voba, a handful of hardware vendors, and I put together our prototype server in a lab on the Microsoft Redmond Campus. The goal was to test SQL Server 2005 x64 with high-speed, high-capacity direct-attached disk storage by using SATA disk drives and serially attached SAS SCSI disk controllers, SAS expander boards, and cabling.We chose SATA drives because they have a high capacity—up to 500GB per drive today, with larger drives on the horizon. SATA drives are also several times less expensive than SCSI drives based on capacity and are comparable to SCSI drives in sequential I/O bandwidth.This bandwidth is essential for data-warehousing workloads that scan gigabytes of database storage when processing queries or moving large data sets around. I won't attempt to compare the differences in flexibility, high availability, and reliability of SAN storage solutions and direct-attached disk solutions. For excellent explanations of SATA disk drives and SAS protocols, see the materials at the SCSI Trade Association (http://www.scsita.org) and the Serial ATA International Organization (http://www.sata-io.org).
To test our prototype,we used real customer databases, more than 2TB total, from a data-warehouse solution running on SQL Server 2000. In addition, we used a disk-stress utility called SQLIO (SQLIO.exe—available for download at http://www.microsoft.com/downloads) to measure disk throughput.
We built the prototype server during July and August of 2005 using a four-processor dual-core AMD server. Four hardware companies generously loaned us the hardware. Table 1 shows the cost of the components based on market price in December 2005. Newisys (a Sanmina-MCI Company) loaned us the 4300 Server, which came with 32GB of memory and four dual-core AMD Opteron 2.2GHz CPUs. Newsys also provided 64 400GB Seagate SATA disks. LSI Logic Corporation loaned us six SAS3442X Host Bus Adapters (HBAs—disk controllers). Vitesse Semiconductor Company loaned us four SAS expander cards. And Chenbro Micom Company loaned us four 16-bay drive chassis. Figure 1 shows the configuration we created with these components.
We decided to use LSI SAS ( serial-attached SCSI) HBAs and expanders because they are emerging technology that's evolving from the traditional SCSI standard and because SAS lets you mix and match SATA and SAS disk drives connected to the same controller. We plugged the six SAS HBAs into the Newisys server by using four of the PCI-X 133Mhz slots, one of the PCI-X 100Mhz slots, and one 66MHz PCI-X slot.This configuration distributed the six HBAs across the PCI-X buses and the three PCI-X bridges on the motherboard, which Figure 1 shows. This distribution optimized data bandwidth and kept the processors quickly fed with data. We found that SQL Server 2005 could keep up with more data, so the faster the HBAs and disks, the faster DRAM could process queries or data writes.
By using the SAS disk controllers from LSI and SAS expander boards from Vitesse, we were able to use one Molex cable from each LSI HBA in each of the six PCI-X slots to connect to one of the four SAS expander boards,reducing cable clutter to six cables from the 48 that Gray used.The SAS expander let us connect 16 disk drives to one expander. Because we had six HBAs from LSI and only four Vitesse expander boards, we were able to plug the additional Molex cables from disk controllers 5 and 6 into the same Vitesse SAS expanders as disk controllers 3 and 4.
The SAS expander boards let you daisy-chain more expanders, increasing the number of disk drives you can directly attach to a disk controller. However, our initial testing showed that one disk controller can keep up with at most 12 drives when doing sequential scans. Six cards can fully exploit the bandwidth of 72 disk drives for performance but also let you double or triple the number of disks by daisy-chaining the expander boards connected to additional drives.
By zoning the Vitesse expanders, we were able to segregate the disk traffic on disk controllers 3,4,5,and 6 to eight physical disk drives each. The Vitesse SAS expander board basically lets you plug 16 physical SATA drives into the expander board by using Infiniband cables and plug one cable from the expander board into each LSI HBA in the six PCIX slots. In our configuration, each LSI HBA was capable of delivering 400MB per second or more of data throughput—double the current throughput of a typical fiber-channel disk controller used to connect to a SAN.
Once all our hardware and cables were connected, we were able to use SQLIO.exe to demonstrate roughly 2.2GB per second for sequential reads, just over 2GB per second for sequential writes, and more than 24TB of useable disk space. It's important to point out that a direct-attached disk subsystem delivering more than 2GB per second of data throughput is incredibly fast—in many cases, faster than SAN storage systems costing over forty times more than our solution. For some SAN systems, the cost per useable terabyte is close to $20,000. And as you'll recall, the total price for our prototype, including the disks and entire disk subsystem hardware, was roughly $46,100.
Even though our prototype exceeded our expectations, our throughput result was slightly less than the 2.5GB per second sequential I/O throughput that Jim Gray achieved, perhaps because we used SAS protocols. We plan future testing to push for higher throughput. But by using SAS, we were able to daisy-chain more SAS expander boards to the existing four boards we used in the prototype and reduce cable clutter from 48 cables to 6 from the HBAs to the expander boards and drive chassis.The net result is that we could have added a lot more disk drives than the 64 we used—perhaps double, up to 128 total—giving us more than 48TB of useable disk space.
Working with Real Data
Happy with our raw-disk throughput and capacity, we were ready to restore our 2TB customer database.We used 64 drives for the two central databases that comprised the 2TB customer data warehouse, which came from one SQL Server 2000 instance. We used Windows 2003 to mirror pairs of drives.This made our useable database disk space just over 12TB. Given the cost and the throughput of this server, and having the fault tolerance of mirrored drives, we were confident we had a high-performing, reliable server with a lot of disk capacity. We were still very content that we had 12TB of useable storage for our prototype server.
We restored the two customer databases. The source and target databases each contained 16 SQL Server data files, and each data file was on a separate mirrored volume. We overlapped eight files from each database on the same set of eight mirrored disks so that we had eight free mirrored disks for TempDB and the SQL Server log file.
Once we restored our databases, we set out to run several stored procedures, queries, and index builds on our prototype server. By doing so, we were able to compare exact times for our prototype to the SAN-based customer server, a 16-CPU server with more than 64GB of main memory connected to a high-end SAN that has six fiber-channel controllers that deliver about 170MB per second of sequential data throughput. Table 2 shows a comparison of some notable results from both servers.
Although we don't have room in this article to include more results, you can see from some of the test results in Table 2 that our prototype server holds its own against a much larger and more expensive server. In addition, for operations such as SELECT INTO and index builds, we saw significantly better performance compared to the customer server. Only the range query was slightly faster on the customer server. We plan many more tests on concurrent query workloads and OLTP-type workloads, instead of the sequential I/O-based workloads we've tested so far. It's important to realize that up to this point in our testing, we've used only SATA drives, which offer data throughput comparable to SCSI or SAS drives for workloads that are sequential in nature. However, SATA drives are significantly slower than SCSI or SAS drives for random I/O operations. A typical SATA drive at 7200 RPM delivers about 100 I/Os per second, whereas SCSI or SAS drives can deliver more than 150 I/Os per second. Because our server used SAS HBAs and expanders, our next round of tests will use SAS disk drives that we simply plug into our Chenbro drive chassis after removing the SATA drives. We also anticipate some exciting results for OLTP-type workloads.
The combination of hardware,Windows Server 2003 x64, and SQL Server 2005 x64 in our prototype delivers a formidable combination of database-processing power, disk capacity, and pure throughput. I hope the results I've presented here will spur customers to evaluate the opportunities for using SQL Server 2005 x64 for development and QA servers in addition to production server applications.
Author's Note: Special thanks to Jim Gray of Microsoft Research for his vision and research,which inspired our prototype and this article.