MICROSOFT SQL SERVER lets you quickly build powerful and reliable database applications, but making those apps perform their best is tricky. Luckily, database administrators and developers can use several simple techniques to coax more speed out of a SQL Server database. Here are 10 such performance-boosting tactics you can quickly and easily apply to your SQL Server database applications.
Tip 1: Assign Data Type Numeric Columns
Database administrators and developers often assign the CHAR (character) data type to columns containing only numeric information. Character data columns can hold anything and are a good catch-all. However, specifying CHAR for numeric columns can decrease performance with filters and join conditions and can increase storage requirements. For example, suppose you're developing an application that tracks equipment stored at several sites. Tables 1 and 2 show the application's inventory_header and inventory_detail. As you design your application's tables, you notice that the part_number column always contains a number between 1 and 12,000. However, when you create the tables, you define all these columns with a char(20) type and place an index on these columns in each table.
A commonly used report in your application prints a listing of the inventory-item counts at all locations. The following statements make this report printout possible:
select b.location, a.part_number, a.part_desc, b.part_count from inventory_header a, inventory_detail b where a.part_number = b.part_number order by b.location, a.part_number
If you define the part_number columns in Tables 1 and 2 as char(20), the join operation requires the engine to compare the values in each table byte by byte, up to 20 times per row. Because you know the part_number column is always numeric and between 1 and 12,000, you can redefine the part_number column's data type in both tables as smallint. This data type can hold values between -32767 and +32767.
This data type helps your application perform better in several ways. First, because SQL Server stores smallint values in only 2 bytes of data, you save a significant amount of CPU work during join operations or filter conditions, especially if you are processing several rows. Instead of comparing up to 20 bytes to see whether two rows are logical join candidates, the engine can finish its work after comparing only two bytes. Thus, using smallint instead of char(20) saves you 18 bytes per row per table for all indexes that use this column.
Tip 2: Prevent Substring Searches
When search criteria in a SQL statement start at the leftmost position of a column and move right, searching through the indexed character data is inherently easier than when your search criteria begin in the middle of the data--that is, in a substring. For example, suppose you have to locate two types of information in a phone book: all last names starting with a capital L and all entries with a lower-case l as the third letter in the last name.
The first request is easy because the phone book is already indexed (last name, first name). You simply turn to the L section of the book and read through the names until you find the first entry with a last name starting with M.
This type of processing is analogous to running the SQL statement
select * from names where last_name >= 'L%'' and last_name < 'M%''
If an index is on the last_name column in the names table, the optimizer uses the index to help process this query.
For both machines and humans, the second request is harder to process than the first. With the phone book, you have to read the first three letters of every entry from cover to cover and mark all entries that have l as the third letter.
This type of processing is analogous to running a SQL statement that searches for a substring within the last_name column.
select * from names where substring(last_name, 3,1) = 'L''
Although an index is on last_name, the optimizer can't use it because you're not beginning your search from the leftmost byte of the last_name column. To satisfy your request, the optimizer has to operate sequentially on what could be an enormous table, which definitely hurts performance.
|TABLE 1: Inventory_header Table|
|78||Monochrome ASCII terminal|
|143||Monochrome 3270 terminal|
|TABLE 2: Inventory_detail Table|
|TABLE 3: Bonus Program Participation|
If you must sometimes sort on a noninitial substring, consider revising your database design. Try to divide the column in question into several smaller columns that you can index. This revision will help accelerate searching, joining, and sorting.
Tip 3: Don't Assign Too Much Memory
As part of installation and administration, the SQL Enterprise Manager lets you set the amount of memory dedicated to SQL Server. If you allocate too much memory to SQL Server, you increase the burden on NT by denying it valuable memory resources necessary to service other tasks, such as other applications and the operating system. This burden can result in swapping, which occurs when the OS runs out of memory and starts resorting to the disk to make up the shortfall. In fact, too high a memory value can make booting SQL Server difficult, if not impossible. Swapping can quickly negate any performance gains you obtain by increasing the cache hit ratio. (The cache hit ratio is a measure of the percentage of time that SQL Server finds information in memory, rather than on disk. Disk I/O is expensive, so reducing it is a good idea whenever possible. A high cache-hit ratio means that the engine can avoid this costly effort.)
The most logical approach to allotting memory to SQL Server is to raise the value gradually and monitor performance carefully between changes. If caching rates don't improve after an increase, you've reached the optimal memory value. Also keep track of the swap rate, because raising the memory value can inadvertently increase swapping.
Tip 4: Avoid Highly Duplicate Indexes
A highly duplicate index contains a limited number of unique values. For example, if you place an index on a column that contains only two or three values, that index is highly duplicate.
Having a highly duplicate index on a column is worse than having no index at all. The database engine must work a lot to process a highly duplicate index during important operations such as inserting, updating, or deleting data. For example, suppose you have a table that tracks customers' participation in a special bonus program, as you see in Table 3. If you place an index on the participated column, you have only two possible values for that index: Y or N. When you add a new row into this table, the engine must add the new value to the existing index.
This addition can cause the undesired side effect of internal index restructuring to handle the new value. Deleting, querying, or sorting based on the index also makes the engine work harder, thus adding many unnecessary I/O operations to your system and likely degrading performance. Instead of using the highly duplicate index, having the engine go directly to the data and read it sequentially is probably faster and simpler.
You can determine whether an index is highly duplicate by running the Database Consistency Checker (DBCC) show_statistics command or selecting the Index Distribution Statistics option on the Index Information screen in SQL Enterprise Manager. With other information, the statistics utility reports on the index's selectivity (as you see in Screen 1), the relative number of distinct values in the index. In this case, only two values are possible, which yields a poor degree of selectivity.
When confronted with such a poor degree of selectivity, SQL Server may not even use the highly duplicate index at all to retrieve information. However, the highly duplicate index still causes extra work for SQL Server during inserts, updates, and deletes.
To avoid performance problems when you must sort or search on a highly duplicate value, combine the column in question with a column containing more selective values. This step creates a composite index. In the previous example, the correct composite index is
create index bonus_acct_ participated_ix on bonus(account_number, participated)
Because the account_number column is more selective than the participated column, combining these two columns yields a more selective and efficient index. You can now use this index to search for and sort by account_number for all customers who have participated in the bonus plan. (Don't forget that to use a composite index in filtering or sorting data, your SQL statement must always specify columns in a table from left to right.)
Tip 5: Use the update statistics Command
To process indexed database operations as efficiently as possible, SQL Server keeps track of index key values, including the location of index key values and the values themselves. To do so, SQL Server runs the update statistics command when you create or re-create an index.
If you expect your key values to change over time, periodically rerun this important command. If your key values change and you don't refresh the key distribution statistics by running update statistics, the optimizer can make a wrong query-processing decision based on this old information. Although old key distribution statistics won't affect the data the engine returns, they can substantially slow system performance.
Tip 6: Prevent Long Transactions
SQL Server uses the transaction log to track alterations to data. In addition to protecting your data, the transaction log has an important effect on database speed. One crucial transaction log performance issue is the length of your transactions. Keep your transactions as brief as possible because long, open transactions can wreak havoc on database application performance.
Suppose you've developed a client support application for a large insurance company. This application lets customer service reps give customers information about their policies over the phone. One day, a rep is working with a customer who has reported a billing error. The rep pulls up the application's payment record screen to modify the customer's record. After the phone call, the rep goes to lunch, leaving the customer's record on screen to complete it later. If the client-support application resembles the following pseudocode example, you have a potential performance problem.
get the customer's name and account number begin transaction build up a query retrieve the record display the record prompt the user for changes update the record check for errors if no errors commit transaction else rollback transaction
If the rep takes a break after pulling the record but before changing it, a long transaction can result. Because SQL Server can reclaim transaction log space for only finished transactions, the transaction log can't be freed until this transaction is complete. And all data modification operations will probably come to a grinding halt until the log can restore its empty space.
Modifying the above example to avoid the long transaction problem produces the following pseudocode:
get the customer's name and account number build up a query retrieve the record display the record prompt the user for changes begin transaction update the record check for errors if no errors commit transaction else rollback transaction
By beginning the transaction only after a user enters changes to the record, you eliminate the potential for a long transaction. You can add some logic to the code to ensure that another user doesn't change the record's contents between the time you select the record and when you update it.
Tip 7: Place the Transaction Log on a Separate Device
SQL Server lets you choose where to store the transaction log. For most read and write applications, the transaction log is a heavily accessed disk structure because it is at the core of SQL Server's data integrity architecture. Because I/O is often the bottleneck on today's fast systems, spreading the I/O load among as many devices as possible is a good idea. To accelerate I/O performance, consider placing the transaction log on a different physical device from the database. Another reason to put the transaction log on a different device is so that you can periodically perform transaction log dumps without having to back up the data.
Tip 8: Use One Large Database
SQL Server developers and database administrators frequently make the mistake of dividing information into several small databases, rather than storing the data in one large database. Why is this a problem? One way SQL Server speeds processing is to use its memory buffers (i.e., cache) to reduce disk I/O. These memory buffers hold data, indexes, and stored procedures. When a SQL Server application must access one of these resources, the engine first looks in the appropriate memory cache area. On a busy system, many such requests are satisfied from memory rather than from disk. Reducing the number of times the engine must make a costly disk inquiry accelerates access.
If you divide your data into many small databases, chances are users can't share much of the information stored in memory, which leads to degraded performance. For example, suppose you maintain a manufacturing system that supports your firm's worldwide operations and resides on one large server. You divide your data into separate databases, according to geography--for example, WestEurope, EastEurope, WestNorthAmerica--to let users access data for only their region. As part of this strategy, you duplicate many lookup tables in the small databases.
The combination of separate, small databases and duplicated lookup tables greatly reduces the potential for memory caching. Although users can be examining identical copies of lookup tables in each database, the database engine can't take advantage of memory caching across databases.
For example, what happens when Ted looks for part number 20993 in the WestNorthAmerica parts table and Alex must find the same part in the EastEurope database? Ted pulls the appropriate page into memory, yet Alex won't see that page: Although Alex is reading a row on the same page, he retrieves it from a different database. If one large database holds this data, Alex's process doesn't have to issue a disk read request because the page is already in memory. If you multiply the number of unnecessary read requests by the number of users, you can see how performance can suffer.
Lost caching opportunities also affect stored procedures. Besides data, users can share stored procedures (although the engine must generate query plans if the procedure is already in use).
Although you sometimes need multiple small databases, try to use one large database. To limit data access to appropriate users, you can choose among several methods, such as key design and views.
Tip 9: Index Temporary Tables
Temporary tables (work tables) are transitory storage locations that SQL Server uses to satisfy processing demands. SQL Server and you can create these tables.
Often the engine can build an implicit temporary table. For example, SQL Server can create temporary tables when it can't use available indexes to process information. When SQL Server automatically creates work tables, aside from reworking your queries or altering database index structure, you don't have much control over these work tables. For example, you can't create indexes on them, but SQL Server often builds indexes on the key columns of implicitly created work tables.
You can specify the create table command to build explicit temporary tables to hold data until the application finishes. When you explicitly create a temporary table, you largely control how that table is indexed, if at all. If you expect the work table to contain large amounts of data, you can create indexes where appropriate. Doing so helps reduce the time needed to locate information in your temporary table, which should lead to better application response.
Tip 10: Control the Amount of Published Data
SQL Server provides powerful replication features as part of its core capabilities. When you use these features, following good techniques to replicate information is important. For example, just because you can replicate a specific table doesn't mean you should. Sometimes, administrators replicate unnecessary tables. This replication increases the work for all machines in your environment, adds to your network's burdens, and enlarges the amount of disk space necessary on subscription machines.
Before you add a table to a publication, decide whether to replicate the table. If you do, you then have to decide whether to replicate horizontally or vertically. Either operation will reduce the amount of information SQL Server passes among systems.
For example, to replicate an account table that has several million rows, you must decide whether this table is a good candidate for replication: Do multiple systems need to see the table at the same time? If the answer is yes, you have to decide whether all subscription servers will want all the information in the table. Chances are you can subdivide this data horizontally (across rows) or vertically (across columns). Taking these extra analysis steps can significantly improve performance, because you can sharply reduce the amount of information flowing among your systems.
You can immediately start improving SQL Server performance by applying any of these techniques. For in-depth information about the tips in this article and other performance-enhancing ideas, see the forthcoming Microsoft SQL Server: Designing and Building a High Performance Database (Prentice Hall, Upper Saddle River, NJ, November 1996, ISBN 0-132-66222-1).