Answers from Microsoft - 20 Dec 2000

Editor's Note: Send your SQL Server questions to Richard Waymire, program manager for SQL Server development at Microsoft, at [email protected]

When the SQL Server 7.0 query optimizer examines the distribution statistics of a composite index, does the query optimizer consider the statistics for each column on the index or just the statistics for the first column (assuming that this table has only index statistics, not column statistics)? If the optimizer considers statistics for each column in the index, could the optimizer choose an index because the last column on that index has favorable statistics? Consider the following example (assuming the table contains data):

CREATE TABLE test_table
(entry_date   datetime,
row_id        int,
first_name    char(30),
last_name     char(30))
CREATE INDEX test_table_idx on test_table
   (last_name, first_name, row_id)
SELECT first_name, last_name
FROM test_table
WHERE row_id=10

If the query optimizer checks the distribution statistics for each column in the composite index, will SQL Server possibly choose to use the table's index to execute this query?

SQL Server keeps a histogram only on the first column of a multicolumn index, plus density information for every leading prefix of the index key. The index in your example therefore has three densities: one for last_name; one for the combination of last_name and first_name; and one for the combination of last_name, first_name, and row_id. The server might still choose to scan the index rather than the table, not because of the statistics on the row_id column but because scanning the index costs fewer I/Os than scanning the table. Besides scanning the table or the index shown, the optimizer also considers any other indexes that exist on the table. The most favorable index for this query would have row_id as its first column and would also contain the last_name and first_name columns. If row_id is unique (or almost unique), another good plan uses a single-column index on row_id alone. In that case, the index seek would be followed by a fetch of the corresponding first_name and last_name values from the base table, whether or not the table has a clustered index.
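
The two index alternatives described in this answer might be created as in the following sketch. The index names test_table_rowid_cover_idx and test_table_rowid_idx are hypothetical, chosen only for illustration. The DBCC SHOW_STATISTICS call displays the histogram and prefix densities kept for the original composite index, so you can see what the optimizer has to work with.

```sql
-- Option 1: row_id leads, and the index also carries the name columns,
-- so a query that needs only these three columns can be answered
-- from the index alone. (Index name is hypothetical.)
CREATE INDEX test_table_rowid_cover_idx ON test_table
   (row_id, last_name, first_name)

-- Option 2: single-column index, useful when row_id is unique or nearly so.
-- The name columns are then fetched from the base table after the seek.
-- (Index name is hypothetical.)
CREATE INDEX test_table_rowid_idx ON test_table
   (row_id)

-- Inspect the statistics kept for the original composite index.
DBCC SHOW_STATISTICS (test_table, test_table_idx)
```

You would normally create only one of the two indexes, then compare the query plans before and after to confirm that the optimizer actually uses it.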

What's the difference between the all density and density values that SQL Server 6.5's Database Consistency Checker (DBCC) reports?

The density value reflects only the first column of the index, and the all density value reflects all columns of the index. In SQL Server 7.0 and 6.5, the density value reflects the density of the index's leading prefix. SQL Server uses this particular density to cap the error that estimation induces; the error is about the same as the average estimation error. If only one value exists for the attribute in all the tuples, the attribute density is 1. If one value occurs in half of the tuples and two other values each occur in one quarter, the density will be between one quarter and one half. For uniform distributions, the density corresponds to the selectivity of a predicate on the column. Note that in cases of high skew, the error induced tends toward the maximum boundary, which is fairly high.

The all density value is the number of distinct values for a given combination of columns, expressed as a fraction. SQL Server uses this information to estimate the number of distinct values that DISTINCT or GROUP BY operations produce. SQL Server also uses all density values in correlation analysis.
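
As a rough illustration of the all density figures, the following queries compute a comparable number by hand for each leading prefix of the composite index from the first question (test_table and its columns). This is only a sketch. It assumes that all density can be read as 1 divided by the number of distinct value combinations. It isn't how the server gathers statistics, and the results may differ from what DBCC reports if the statistics are out of date.

```sql
-- Assumption: all density = 1 / (number of distinct combinations).
-- The table must contain at least one row, or the division fails.

-- Prefix (last_name)
SELECT 1.0 / COUNT(*) AS all_density_1
FROM (SELECT DISTINCT last_name FROM test_table) AS d

-- Prefix (last_name, first_name)
SELECT 1.0 / COUNT(*) AS all_density_2
FROM (SELECT DISTINCT last_name, first_name FROM test_table) AS d

-- Prefix (last_name, first_name, row_id)
SELECT 1.0 / COUNT(*) AS all_density_3
FROM (SELECT DISTINCT last_name, first_name, row_id FROM test_table) AS d
```

Under that assumption, a result close to 0 means the prefix has many distinct combinations, and a result of 1 means every row holds the same combination.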

When I try to create and save a repository package from my machine, I get an error reporting that the COM server hasn't been registered on my system. How can I fix this problem?

To fix this COM server registry problem, you need to reregister the file repodbc.dll. Run

regsvr32 RepODBC.DLL

on your server from a command prompt. You'll find this file in your \mssql7\binn directory.
