Optimize Analysis Services

Manipulate connection properties in PivotTable Service to improve query performance

SQL Server 2000 Analysis Services performance affects every analysis application. In "OLAP Performance," January 2000, InstantDoc ID 7638, I covered query performance and performance considerations for building cubes. This month, let's take a closer look at how to optimize the performance of OLAP queries.

To understand Analysis Services performance, you need to understand Analysis Services' client/server architecture and how Analysis Services executes queries. The client component of Analysis Services is PivotTable Service, which is a database driver—a nonvisual software component that supports the programmer's interface OLE DB for OLAP. OLAP applications such as Microsoft Excel use PivotTable Service to communicate with the server component of Analysis Services.

PivotTable Service performs most of the work in executing a query. Logically, because PivotTable Service can perform the same queries against an OLAP cube that's stored locally on your hard disk as it can perform against a server-based cube, PivotTable Service must have the functionality to execute OLAP queries.

To submit OLAP queries to PivotTable Service through the OLE DB for OLAP interface, you use MDX, the SQL-like expression language that Analysis Services uses to retrieve multidimensional data. PivotTable Service then goes through five steps to execute the query: parsing, prebinding, binding, execution, and cell retrieval. You can categorize these steps into two phases: axis resolution (the first four steps) and cell retrieval (the last step). Axis resolution is the phase in which PivotTable Service determines the dimension members that will be on the axes (e.g., rows, columns, pages) of the query. Table 1 shows definitions of each of the four steps in this phase. Cell retrieval is the phase that determines the values (usually numeric) that go in the cells. If you think of a query result as a spreadsheet, axis resolution determines the row and column headers, and cell retrieval fills in the numbers in the cells.

The distinction between these two phases is important. When you, the developer, issue a request to PivotTable Service to execute an MDX query, PivotTable Service performs only the first phase while you wait for the execute function to return with a status value (unless Default Isolation Mode is set to true). The second phase occurs as you request the individual cell values. This two-phase execution lets you start working with the "spreadsheet" on the screen before all the results are available. This kind of optimization can significantly improve the way users perceive your application's performance. To illustrate this point, try running the following query in Analysis Services' MDX Sample Application:

SELECT \[Time\].\[Month\].members ON Columns,
  \[Customers\].\[Name\].members ON rows
FROM Sales

The MDX Sample Application takes approximately 45 seconds to complete this query on my laptop. But the application draws the grid and fills in the row and column headers in just 5 seconds. At that point, the axis resolution is complete and the sample application has begun requesting individual cell values. Unfortunately, the sample application displays the hourglass cursor for the duration of the cell retrieval. If you're developing your own application, you can program it to return to a normal mouse cursor and let the user scroll around the grid or perform other operations while the grid is filling in.

The OLE DB specification also supports asynchronous query execution, which lets you initiate the axis-resolution portion of the query without waiting for its completion. However, asynchronous query execution is difficult to implement from a client application and can be detrimental to performance. One of the main reasons you might use asynchronous query execution is so that you can cancel queries that take longer than you expect. However, PivotTable Service doesn't support true canceling of a query—after a cancel request, the asynchronous query will continue and block other PivotTable Service function requests while the query finishes.

Manipulating Connection Properties

You can manipulate some connection properties in PivotTable Service to improve query performance. Let's examine some important considerations for four of these properties: Execution Location, Large-Level Threshold, Default Isolation Mode, and Client Cache Size.

Execution Location. You can use the Execution Location property to force the execution of a query on the server or on the client. The default setting, automatic, lets PivotTable Service inspect the query to determine whether it should execute remotely on the server. Two conditions cause a query to execute remotely: the existence of a reference to a large dimension level and the existence of a filter operation within the query's MDX code. The following MDX expression on the Sales cube would meet these criteria:

TopCount(Customers.Name.Members, 10, \[Unit Sales\])

Customers.Name.Members is a reference to a large level because the number of members in the Name level (10,240) of the Customers dimension exceeds the large-level default threshold (1000). The inclusion of the filter TopCount suggests that executing the query remotely on the server would be worthwhile because the filter operation eliminates most of the large level.

In some situations, a query won't execute on the server even if the query meets the level-size and filter requirements. For example, a query might contain a function that doesn't execute correctly on the server. A client-registered user-defined function (UDF) is an example of a client-only MDX function. If the query is executed against a local cube file, the query can't execute on the server either.

Large-Level Threshold. This property sets the threshold for the number of members a dimension level must have to be considered a large level. Large dimension levels are important for a couple of reasons. First, a reference to a large level can cause a query to execute remotely on the OLAP server, as I described above. Second, schema rowsets that contain dimension metadata contain only an estimate of the number of dimension members in a large level. That estimate matches the default setting of the Large-Level Threshold.

This estimation of the number of members in a large level can cause implementation difficulties. Client applications also have thresholds for what's considered a large number of members, and these thresholds are frequently larger than Analysis Services' threshold. Because Analysis Services estimates the number of members, the application can't proactively decide whether to fetch all of a dimension member's children or all of a level's members. Thus, you can't estimate how long a query will take or how much memory it will use. This information is important because you might need to warn the user or avoid the operation. Such a situation might occur when a client application needs to fill a drop-down list with all the members from a dimension level. Or a user might expand a node in a dimension's tree representation that causes the retrieval of more than 1000 members. Or a dimension level could contain more than 1 million members.

One way to circumvent such a problem is to use MDX queries instead of schema rowset calls to fetch dimension members and to surround the MDX query with a subset function to limit the number of members the query returns, as the following example shows:

SELECT subset( Customers.Name.Members, 0, 5000 ) ON columns
FROM Sales

You might also use a schema rowset function to retrieve the members. But if your application might have to deal with many members, using an MDX query with a subset function is safer.

Default Isolation Mode. You use this property in write-back cubes, but using this property has performance implications in nonwrite-back implementations. When the Default Isolation Mode is set to true, PivotTable Service isolates the cache contents in a single query. PivotTable Service flushes the cache before a query starts, then fills the cache with the results before retrieving them so that every read of a cell returns the same value. You want to make sure that subsequent reads of the same query results don't change because of other users' writes, even if the underlying cube changes because other users write new values. However, you do want to receive other users' updates, so PivotTable Service flushes the cache between queries.

If Default Isolation Mode is set to false, PivotTable Service preserves the cache between queries. But when the cache exceeds its available memory, PivotTable Service flushes a portion of the cache and might not preserve the read integrity of the current query.

A side effect of this property is that it can improve performance when a query executes remotely on the server. PivotTable Service retrieves and locks in memory the cell values in the query result, which reduces the number of round-trips to the server.

Client Cache Size. This property controls how much client memory is available as a PivotTable Service cache. You can set the value as a percentage of available memory or as a fixed number of kilobytes. For details about the setting, see SQL Server Books Online (BOL).

You should know a couple of things about the PivotTable Service cache. First, the cache is database-specific. As long as you keep the OLE DB property for the current database constant, the cache can grow. However, if you change the current database, PivotTable Service will flush the cache. Also, when PivotTable Service runs out of cache memory, it will try to flush from the cache all the information associated with cubes other than the one you're accessing. If the cube your query is accessing is a virtual cube, then PivotTable Service might flush the cache memory of one of the underlying base cubes before another base cube, thus disrupting the data.

Second, if you have the Client Cache Size property set even a little smaller than the total amount of memory required for the current query, you can destroy the performance of the cache. In such a situation, if the Default Isolation Mode is set to false, then PivotTable Service reads in the cell values during the query execution and the cell retrieval steps, which causes PivotTable Service to constantly flush the cache and make extra round-trips to the server.

Finally, you need to understand how PivotTable Service makes requests to the server and stores them in the cache. PivotTable Service has a query-optimization component that predicts the cell values that a query needs to retrieve from the server. This query-optimization component creates cell value requests to the server that contain two pieces of information: One is a tuple (a particular cell in the cube), and the other is the distance (in dimension levels) to retrieve descendants from that tuple for each dimension in the cube. The server returns the non-NULL cell values in that slice of the cube.

PivotTable Service caches both the server request and the result. Later, when PivotTable Service checks a new cell reference against the cache, it first determines whether the new cell reference is within the bounds of any of the cached server requests. If the reference is within these bounds, PivotTable Service checks the cell value cache. If no cell value is in the cache, then PivotTable Service knows the value is NULL because it stores only non-NULL values.

You have an advantage when you understand the inner workings of each software component you use. Knowing how to set the execution location, large-level threshold, isolation mode, and cache size properties is useful when you're trying to optimize the performance of queries running against cubes larger than 500MB or when you're trying to minimize query time in a cube to improve usability or scalability. I hope some of the details about PivotTable Service that I provide can help you in your next Analysis Services implementation. Thanks to Cristian Petculescu from Microsoft's Analysis Services development team for providing much of the information in this column.

Comments

Plain text