I work as a contract DBA for a large research and manufacturing company. My manager recently asked me to find out which database configuration was most effective with our new large RAID 5 Storage Area Network (SAN). The vendor for this SAN didn't have any documented SQL Server 2000 database file-placement recommendations or best practices. I needed to test whether placing the data and log files on separate RAID 5 SAN partitions produced any performance benefits over placing both files on a single RAID 5 SAN partition. Usually, placing database and log files on separate arrays (RAID 5 or RAID 10 for data and RAID 1 for the log) boosts performance. Because SQL Server writes to the transaction log file sequentially and the data files randomly, separating the files lessens the chance of simultaneous read or write contention.
The SAN I was testing took advantage of built-in caching—RAM that resides on the disk controller itself—so I didn't want to assume that the standard rules applied. Some SANs use caching algorithms in the controller to facilitate read-aheads of data blocks and to stage writes in the controller cache (memory). For example, the SAN I tested has an algorithm that reads data into the cache based on past I/O request patterns, in an attempt to predict future read requests. Another algorithm gathers sequential writes in cache and attempts to write them to disk in large blocks or full stripes. The desired result of these caching algorithms is to reduce I/O overhead.
To test the database server under various load conditions, I used the Database Hammer tool that's included in the SQL Server 2000 Resource Kit. Database Hammer is a Visual Basic (VB) 6.0 code sample that creates a table, populates the table with 10 million rows of data, then lets you simulate hundreds of concurrent SELECT and UPDATE operations. Database Hammer is great for stress-testing a database; it lets you spawn multiple connections that each issue random read/write operations (the ratio of which you can configure). The tool also records client/server performance statistics. These statistics show current, minimum, maximum, and average response time (in milliseconds) of the stored procedure call (either a Read or a Write) for each slave process. Although this information is good for measuring client/server response time, I was most interested in seeing the stress test's impact on I/O. I decided to set up my own Performance Monitor counter logs to capture this I/O information as well as memory, CPU, and buffer cache statistics.
Let's look briefly at the counters I used in my tests, review how to set up the counter logs, and see how to run the Database Hammer tests. Then, I'll show you the surprising results I got.
Setting Up the Performance Monitor Counters
During each test load, I wanted to measure the performance counters that Table 1 shows. My primary goal was to see which database performed better under excessive I/O activity; for this, I used the Physical Disk counter Average Disk Queue Length and the SQL Server Latches counter Average Latch Wait Time. I also wanted to see what impact the SAN data file partitioning would have on the CPU, RAM, and the SQL Server buffer cache hit ratio. For CPU, I measured the Processor counter % Processor Time. For RAM, I used the Memory counter Pages/Sec. For SQL Server buffer cache hit ratio, I used the SQL Server Buffer Manager counter Buffer Cache Hit Ratio.
To set up a Performance Monitor counter log to use during your Database Hammer tests, first make sure your Performance Logs and Alerts service is running under a domain account that has administrator permissions to access the server you're testing. You can verify this by clicking Start, Settings, Control Panel, Administrative Tools, then double-clicking Services. Double-click Performance Logs and Alerts in the right pane, then go to the Log On tab, which Figure 1 shows, and see which account the service is using. If you're running Performance Monitor from the client and not the server, verify that the Performance Logs and Alerts service account is configured with a domain account that has administrator access to the server. I recommend running your performance-counter logs and Database Hammer tests on a separate client from the test server so that you can test under true client/server conditions. Close the Performance Logs and Alerts dialog box when you're finished, then right-click Performance Logs and Alerts and select Start to start this service.
Next, start Performance Monitor: From your desktop, click Start, Run, type "perfmon," then click OK. You can also start Performance Monitor by selecting Start, Administrative Tools, Performance. Expand Performance Logs and Alerts in the left pane and select Counter Logs. Right-click Counter Logs and select New Log Settings; in the New Log Settings dialog box, type a name for your counter log and click OK.
To add a new performance counter to your counter log, in the General tab of the Counter Log dialog box, click Add. Type the test server name (e.g., \\servername) in the Select counters from computer: field and select the objects and associated counters that you want to monitor, as Figure 2 shows. Note that some counters have multiple instances. For example, for the Processor object's % Processor Time counter, if your server has four CPUs, you can choose one, some, or all of the CPUs. In my tests, I monitored all processors. For the Physical Disk object's Avg. Disk Queue Length counter, I selected only one SAN partition for the first database test, placing both files on one drive, and selected both SAN partitions for the two-partition database test.
After selecting all the counters you want to monitor, click Close. On the same General tab, change the interval from the default 15 seconds to 1 second to make your counter logging more granular. After you click OK, the counter log's icon turns from red to green, meaning the log has started automatically. Because you don't want the log running yet, right-click the counter log and select Stop. You'll start this counter log at the beginning of each test.
Performance Monitor automatically creates a new file in your C:\PerfLogs folder each time you start a counter log. The log file's name matches your counter log's name, with a six-digit number appended (e.g., MyLog_000002.blg). The system increments this number each time you start and stop your counter log. Make sure to track the file name for each test you run so that you don't compare the wrong results.
To view your counter log results, start Performance Monitor. In the left pane, click System Monitor. In the right pane, click the Properties icon (a hand pointing at a piece of paper). In the System Monitor Properties dialog box, select the Source tab. In the Data source section, select Log file instead of Current activity. Click Browse, select the counter log from the C:\PerfLogs folder, then click Open. Next, select the Data tab and click Add. In the Add Counters screen, select each counter you want to view results for (and its associated instances) and click Add. When you're finished, click Close.
Next, go to the General tab and make sure either Default or Average is selected in the Report and histogram data section. Both Default and Average show the average results for each counter. Click OK. In the right pane, select the View Report icon (piece of paper). The resulting window shows your average data results for the test, as Figure 3 shows. I exported my results into a Microsoft Excel spreadsheet for easier comparison. (Right-click the results window and select Save As. In the Save As dialog box, select the file destination path and file name. Select Save As Type "Report (*.tsv)" to make the file readable by Excel. Double-click the .tsv file and select Microsoft Excel for Windows in the Open With dialog box.)
Setting Up Database Hammer
All the necessary Database Hammer files are in the \ToolsAndSamples\DatabaseHammer folder on the SQL Server 2000 Resource Kit CD-ROM. You need the VB runtime msvbvm60.dll on your client before you can run the executables; the SQL Server Client Utilities install the necessary VB runtime files. You also need Windows 2000 Server and SQL Server 2000 to run Database Hammer. According to the documentation, VB 6.0 is required only if you want to modify the Database Hammer code.
To set up the test, first copy all Database Hammer files from their folder on the CD to a folder on your local server's hard drive. Next, create the test database. In my case, I created two databases to test: one with the data and log files on the same SAN partition, and one with the data and log files on separate SAN partitions. To prevent excessive file auto-growths that can slow down the initial database load, set the size of both the data file and the transaction log file to 500MB.
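For the two-partition scenario, the database creation might look something like the following T-SQL sketch. The database name, logical file names, and drive letters (E: for data, F: for the log) are all hypothetical; substitute the paths of your own SAN partitions.

```sql
-- Hypothetical example: data file and log file on separate SAN partitions
-- (E: and F: are assumed to be the two partitions). For the single-partition
-- database, point both FILENAME paths at the same drive.
CREATE DATABASE HammerTest
ON PRIMARY
  ( NAME = HammerTest_Data,
    FILENAME = 'E:\SQLData\HammerTest_Data.mdf',
    SIZE = 500MB )
LOG ON
  ( NAME = HammerTest_Log,
    FILENAME = 'F:\SQLLogs\HammerTest_Log.ldf',
    SIZE = 500MB )
```

Presizing both files to 500MB up front is what prevents the auto-growth pauses during the initial load.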
After creating the database, create a SQL Server user login (not a Windows login) and give this user dbo permissions for your test database by making the user the dbo or adding the user to the db_owner database role.
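In SQL Server 2000, you can script that setup with the system stored procedures; the login name, password, and database name below are hypothetical examples.

```sql
-- Create a SQL Server (not Windows) login; name and password are placeholders.
EXEC sp_addlogin 'hammeruser', 'StrongP@ssw0rd'

-- Grant the login access to the test database (HammerTest is an assumed name)
-- and add it to the db_owner role.
USE HammerTest
EXEC sp_grantdbaccess 'hammeruser'
EXEC sp_addrolemember 'db_owner', 'hammeruser'
```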
Open Query Analyzer and, from the toolbar, select the database you just created. Next, run the createdb.txt script, which you can find in the DatabaseHammer folder. Despite what the file name suggests, createdb.txt creates the TestTransaction table, not a database. Next, run the storedProcs.txt file in your selected database to create two stored procedures, which the VB sample code will use to perform SELECT and UPDATE queries against the TestTransaction table.
Double-click LoadSlave.exe to start the VB application in the background. (No UI is associated with this executable.) The loading of data isn't part of the official Database Hammer test; however, you might want to track and compare the total load time between tests. Keep in mind that running the data load from a client will take more time than running it from the server.
To complete the loading process, double-click LoadMaster.exe in the DatabaseHammer folder. Figure 4 shows the UI that appears. In this box, you enter the names of the server and the database that you'll be testing. You must also enter the SQL Server user you created previously. LoadMaster.exe uses LoadSlave.exe as an out-of-process COM object and is therefore dependent on LoadSlave.exe being started first. Before you continue, be advised that by clicking Start, you begin inserting 10 million rows of test data into the TestTransaction table. This process takes a while (for my test, it took almost 3 hours) and produces close to 400MB of data, so make sure you have sufficient disk space and database space allocated or auto-growth set so that your data load doesn't stop prematurely. (You can set auto-growth in the Database Properties dialog box through Enterprise Manager or by using the ALTER DATABASE command.)
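If you take the ALTER DATABASE route, a sketch like the following sets a generous growth increment on the data file (the database and logical file names are hypothetical; adjust them to match your own):

```sql
-- Grow the data file in 100MB increments instead of the default percentage,
-- so the 10-million-row load doesn't stall on frequent small auto-growths.
ALTER DATABASE HammerTest
MODIFY FILE
  ( NAME = HammerTest_Data,
    FILEGROWTH = 100MB )
```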
One "gotcha" here is that LoadMaster.exe doesn't shut down or notify you after loading the 10 million rows. The application stops at 10 million rows, but you'll get no visual notification of when to shut down the UI. To get around this shortcoming, periodically run
SELECT COUNT(*) FROM dbo.TestTransaction WITH (NOLOCK)
in Query Analyzer to see whether it's time to end the application. Another quick way you can check is to query the sysindexes table for the number of rows in the TestTransaction table.
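The sysindexes check might look like the following; the row for indid 0 (heap) or 1 (clustered index) holds the table's row count, and the query returns instantly because it doesn't scan the table (though the count can lag slightly behind reality):

```sql
-- Approximate row count without scanning the 10-million-row table.
SELECT rows
FROM sysindexes
WHERE id = OBJECT_ID('TestTransaction')
  AND indid < 2
```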
I found it enlightening to have my performance-counter log running during the data load. I was curious to see which database would load faster. Figure 5 shows the results of my record load in both test scenarios (data and log on separate partitions versus data and log on the same partition). As you can see, the data load ran 28 minutes faster on the database that had the log and data file on the same partition. The only other significant difference was the higher SQL Server Average Latch Wait time (975.657ms on the database with the data and log on separate partitions versus 461.306ms on the database with data and log on the same partition).
Starting the Load Test
To begin the load test, first double-click the ProcSlave.exe application in the Database Hammer folder you created. You can run both the ProcSlave and ProcMaster applications on the server or on the client machine to simulate one client with multiple connections. You can also run several ProcSlave and ProcMaster applications on different client computers simultaneously if you want to test a multiple-client, multiple-connection client/server setup.
Next, double-click the ProcMaster.exe application. The UI is the same as the LoadMaster interface, but in the lower left corner you'll see a Stats button, which is grayed out until your test begins. Also, the Instances field is no longer grayed out. The application will spawn as many instances of the ProcSlave application as you specify. By default, half the statements call the UPDATE stored procedure and the rest call the SELECT stored procedure. During the test, you can view the statistics for each application instance by clicking the Stats button. Spawning all the connections takes a few minutes (my test took 6 minutes to spawn 500 connections), so watch the number of connections and don't start your counter log until all of them are active. Once the number of connections reached the number of instances I'd selected in the ProcMaster interface, I started the counter log. To start your counter log, right-click the counter log you created and select Start; to stop it, right-click the counter log and select Stop.
In my test, I wanted to measure 5 minutes of activity for each scenario. Before stopping the ProcMaster executable, I stopped the counter log at the 5-minute mark so that the disconnection of the slave processes wouldn't affect the averages. To stop the test, click the Pause button in the Database Hammer window (ProcMaster.exe) and close the window.
Because Microsoft provides the VB files for the Database Hammer application, you can customize the code as you need to. For example, you might want to change the SELECT-to-UPDATE ratio or perhaps institute a more complicated command against the table to reflect likely transactions in the production environment. You can change the ratio within the GetCommand procedure in the ProcSlaveClass.cls class module. However, I found that the sample test load fit my needs well.
My Test Scenarios
Before looking at the results of my Database Hammer test, let's review the test scenarios I used:

1. I created the test database with the log and data files on the same SAN partition.
2. I loaded the 10 million records, being careful to record the start and finish times, and kept the performance-counter log running during the load.
3. I restarted the SQL Server service to ensure that SQL Server caching wouldn't affect the test. I also restarted SQL Server before the start of each subsequent test.
4. I ran the Database Hammer test from my client for three scenarios: 100, 500, and 800 connections. (I'd originally decided to test 1000 connections, but my client machine started giving me "thread creation" errors at around 850 connections; the application couldn't spawn more than 850 connections from my client.) I made sure that the performance-counter log measured activity only when all connections were spawned, shutting it off before stopping the test and deallocating the connections.

I then created a new test database with the log and data files on separate partitions and, using this database, repeated steps 2 through 4.
Coupled with information from Performance Monitor counter logs, Database Hammer was an excellent tool for performing stress tests, and it revealed performance data that I didn't expect. Figure 6 shows the test results; the figures highlighted in red are the counter-log results that displayed the biggest differences. The single-partition SAN database displayed a lower Average Disk Queue Length for all three tests. Average Disk Queue Length measures the number of read and write requests that are queued for an array. For this RAID 5 array, the number includes average queued read and write activity across all disks in the array. The rule of thumb is to divide the Average Disk Queue Length by the number of physical disks making up the array; if the result exceeds 2, you might have an I/O bottleneck. (For example, an Average Disk Queue Length of 15 on a six-disk array gives 15/6 = 2.5, which suggests a bottleneck.) This counter's results were surprising because you'd presumably expect one array to be more of an I/O bottleneck than two arrays.
In the 500- and 800-connection tests, SQL Server Average Latch Wait time was also significantly lower on the single-partition SAN database than on the multi-partition SAN database. Higher Average Latch Wait time values can suggest either memory or I/O problems. The normal value for Average Latch Wait time depends on your hardware, database design, and activity. Determining whether either configuration (single-partition or multi-partition) suggests an I/O bottleneck would require further investigation, but the results do show that the single-partition configuration had significantly less latching and fewer queued I/O operations for the same Database Hammer tests.
To make sure these test results weren't a fluke, I ran each test three times, at different times of the day. All tests displayed similar results. These tests showed that for my setup and system, placing the data and log files on separate SAN partitions resulted in more queued I/O than placing both files on the same partition. I believe these results could change as we add more applications and databases to the SAN. Unlike with local disk arrays, when considering SAN performance, you must consider the impact of the SAN cache controller, SAN bandwidth limitations, traffic on the SAN caused by other applications, and the algorithms used to optimize cache and RAID array performance. As our environment changes and more databases are added to the SAN, I'll use Database Hammer and Performance Monitor again to verify whether conditions and I/O behaviors have changed.