Q: What does the UPDATE STATISTICS command do?
In SQL Server 4.21 and 6.x, UPDATE STATISTICS ensures that SQL Server has the necessary information to evaluate which indexes and query-processing strategies to use to process a query. To determine the optimal query-processing plan, SQL Server needs to know how many rows from every table the query contains. Let's look at the following example query:
SELECT * FROM BigTable WHERE IdCol > 9,999
BigTable contains 10,000 rows, and you have a unique nonclustered index on the IdCol column. The IdCol column's values range from 1 to 10,000. UPDATE STATISTICS produces a row-distribution sampling that tells SQL Server that only one row meets the query criterion. UPDATE STATISTICS also tells SQL Server that using the index is an efficient access path. What happens if the IdCol values range from 10,001 to 20,000? In this case, the query returns 10,000 rows so SQL Server uses the index to perform a table scan, which isn't the optimal query-processing plan. This trivial example shows you why the optimizer needs accurate information about how many rows a query will return to determine the best query-processing plan.
In versions before SQL Server 7.0, SQL Server builds index statistics when you first create the index, but SQL Server doesn't dynamically maintain these statistics. Database administrators (DBAs) must regularly run UPDATE STATISTICS to force SQL Server to update the system catalog (e.g., the sysindexes table). Otherwise, SQL Server's query optimizer uses inaccurate information to determine the best query-processing plan.
To solve this problem, SQL Server 7.0 automatically detects when it needs to rebuild statistics and resamples the database to maintain up-to-date statistics. How? SQL Server 7.0 uses a complex algorithm it bases on table size and activity and another algorithm that controls how SQL Server samples a database. To examine how SQL Server 7.0's query processor runs a query, go to the SQL Server Query Analyzer's Plan tab or run a Transact SQL (T-SQL) SHOWPLAN statement. To override SQL Server 7.0's automatic statistics-update feature, disable the auto create statistics option via sp_dboption. For more information about how SQL Server uses statistics, see Books Online (BOL) "Statistical Information," "Distribution Statistics," and "Execution Plan Caching and Reuse." In addition, you can experiment with the DBCC SHOW_STATISTICS command.
Q: Does SQL Server 7.0 Profiler adversely affect performance?
SQL Server 7.0's Profiler is the grown-up version of SQL Trace, which Microsoft introduced in SQL Server 6.0. You can use Profiler to profile dozens of specific and valuable events, but be careful not to trace so many events that you degrade performance. Fortunately, profiling an event doesn't cost anything (i.e., you don't use resources and degrade performance) if the event never happens. For example, if you set up a profile for deadlock events and a deadlock never happens, the profile costs your system nothing. Thus, you can create a specific profile that is always run and is almost cost-free. We suggest that you set up Profiler to look for abnormal events that might seriously impact query or server performance, such as deadlocks, missing statistics in a query, queries that cost more than a specific threshold, and large sort and hash operations.
Q: We're trying to use SQL Server 7.0 Data Transformation Services (DTS) to populate a data warehouse, but the Microsoft OLE DB provider for Oracle doesn't work with any table that contains a column with the Oracle data type number(22). For example, in our SQL Server 7.0 database with a linked Oracle database, the following query fails:
SELECT cust FROM csti..cstimgr.sobsar_customer
The system outputs the following error message:
Server: Msg 7354, Level 16, State 1, Line 1 OLE DB provider 'MSDAORA' supplied invalid metadata for column AGE_BUCKET__1'. The data type is not supported.
Queries of tables that don't contain columns of data type number(22) succeed. Do you have any suggestions?
Many users have run into this problem when they build SQL Server 7.0-based data marts and data warehouses over existing Oracle installations. This data type incompatibility problem occurs when the Oracle column with a numeric data type doesn't specify a length (i.e., the data type has no precision and default, and it allows NULL). Oracle uses a variable-length numeric with precision up to 255 to represent a numeric data type that doesn't have a precision and scale. No SQL Server data type exists that SQL Server can map Oracle's data type to without a loss of precision. For more information about this problem, see the Microsoft article "PRB: SQL Distributed Query with Oracle NUMERIC Column Gives Error" (http://support.microsoft.com/support/kb/ articles/q221/5/52.asp). This article describes the following four workarounds:
- In the Oracle table, set the length property for the NUMERIC column.
- If you aren't retrieving the NUMERIC column, use the OPENQUERY command in your distributed query rather than a four-part name.
- Use the TO_CHAR function to convert the data in the numeric field to a character value; for example,
SELECT * FROM OPENQUERY(<mylinkedserver>, 'SELECT TO_CHAR(F1) from table1')
You might need to convert it back for applications that require math operations.
- Change the field type in the Oracle table to a defined value, such as numeric(15).
The last workaround is your best option; however, many servers have corporate restrictions that limit who can make changes and what you can change. If you can't change the data type on the server, use one of the other workarounds.
None of these workarounds impress us. In our experience, you can't use direct four-part names to access tables that use Oracle's numeric data type with no length specified. In addition, using OPENQUERY or OPENROWSET to access Oracle data requires you to convert the data to text, which means you can no longer update that data. The current workarounds aren't fully functional and present a major headache for many Oracle shops.
Q: OPENQUERY and OPENROWSET don't connect SQL Server to the Oracle server on my network. Do you have a solution?
SQL Server 7.0 ships with OLE DB drivers that let you connect to Oracle, but many people don't realize that you still need to install the standard Oracle client and networking software on the SQL Server machine.
Q: What SQL Server Profiler options can I use to figure out why my application is slow?
SQL Server Profiler is a powerful application-tuning tool, but it offers so many events that they can easily overwhelm you. To help you get started, use the Profiler wizard from the Tools menu or select Create Trace Wizard in the Enterprise Manager's wizards menu. As Screen 1 shows, this wizard provides a list of common problems the Profiler can track, such as large table scans, the cause of a deadlock, and the performance of a stored procedure.
This wizard helps you monitor and solve common tuning problems. Look closely at the individual events that these sample profiles let you monitor. Understanding why each of the events is an important component of a problem can help refine your database-tuning skills.
Q: When joining tables in a query, SQL Server 6.x can't use more than one index per table. Does SQL Sever 7.0 have this limitation?
SQL Server 6.x can't use multiple indexes for the same table in one query. For example, in the following query, assume that the database administrator (DBA) created three single-column indexes on the SupplierName, PartId, and OrderDate columns:
SELECT * FROM OrderLineItem WHERE SupplierName = "Joe's Widgets" and PartId in between 1000 and 2000 and OrderDate between "1/1/98" and "1/5/98"
SQL Server 6.x and earlier versions must pick only one index. However, SQL Server 7.0 can perform an index intersection, which lets the optimizer use all the indexes at the same time. Index intersection lets SQL Server 7.0's optimizer use more of the available query-processing strategies. You can easily optimize one query for one use, but optimizing an entire application with mixed-query needs is much more complicated. An optimal query-processing strategy for one query might not be optimal for another query. Index intersection isn't always the fastest strategy for one query, but it simplifies a DBA's decision about which set of indexes to choose to make an application perform well.
In addition, although the earliest versions of SQL Server support covered indexes, SQL Server 7.0 extends their usefulness. For example, in the following query, assume you've already created a CoveredIdx on the PartId, OrderDate, and SupplierName columns:
SELECT SupplierName FROM OrderLineItem WHERE PartId in between 1000 and 2000 and OrderDate between "1/1/98" and "1/5/98"
The index you created covered the query because one index contains all the columns the query references. What is the benefit of covered indexes? I/O is slow, so any tuning that reduces I/O improves performance. Suppose the OrderLineItem table is 300 bytes wide, but the CoveredIdx index key is only 25 bytes wide. Processing the query from the narrower covered index reduces I/O compared with grabbing columns from the 300-byte-wide table.
SQL Server 7.0 improves covered indexes by letting the optimizer choose from multiple indexes to cover a query. For example, suppose you created two indexes instead of CoveredIdx: You created Idx1 on SupplierName and Idx2 on PartId and OrderDate. SQL Server 6.5 can't simultaneously use both indexes to cover the previous query, but SQL Server 7.0 can. Thus, SQL Server 7.0's smart optimizer means you don't need to work as hard to reap the benefits of a covered query.
Q: How does SQL Server 7.0 handle data warehouse-type queries?
Large data-warehouse environments sometimes horizontally partition data based on how frequently you need the data. Suppose you're building a data warehouse to help manage sales information. You partition your data by year and store historical data for 5 years. Your table names might look like the following example:
Sales99, Sales98, Sales97, Sales96, Sales95
To physically enforce the date range the table names imply, assume that you've created column-level CHECK constraints on the SalesDate column in each table. You create a view called Sales on each of the five tables, which Listing 1 shows. What happens when an end user runs the following query?
SELECT * FROM Sales WHERE SalesDate between "6/1/97" AND "6/1/98"
SQL Server 7.0's query optimizer reduces the view's computation in the query plan to only the tables that contribute to the query's results (i.e., SQL Server 7.0's optimizer knows that the Sales95, Sales96, and Sales99 tables don't have rows that qualify, so it ignores those tables). The optimizer uses the tables' CHECK constraints to figure out that the SalesDate between "6/1/97" WHERE predicate and "6/1/98" contradict the CHECK constraint on three of the five tables in your view. Thus, no rows from those tables can qualify.
Q: Where can I find tools and information about upsizing and migrating to SQL Server from other databases (e.g., Oracle)?
Microsoft's Web site (http://www.microsoft.com) offers several white papers about migrating from databases such as Sybase and Btrieve, and you can download an Oracle to SQL Server migration white paper at http://www.microsoft.com/ sql/70/whpprs/orcltosql.htm. Oracle offers a white paper about migrating from SQL Server to Oracle at http://technet.oracle.com. Register online to access Oracle's developer network, and you can find migration white papers and other helpful tools, including the complete Oracle8i documentation set. To migrate from Access, you can use the Office 2000 Upsizing Wizard or download Access 2000, Enterprise Edition at http://www.microsoft.com/ accessdev/prodinfo/exe/wzcs97.exe.