Member Properties Revisited

Downloads
15797.zip

Use member properties to explore and analyze data

Member properties are attributes of dimension members. You define the names and types of member properties for each dimension level, and the member property's value varies for each dimension member. For example, if you have a Customer dimension organized by country, state, city, and name, the customer name level might have the member properties gender, age, education level, and income level. The city level might have member properties such as mayor, population, and average temperature. (See "Member Properties," August 1999, for details about OLAP member properties.)

A cube might have several properties describing dimension members; these member properties can be useful during analysis. For example, you might want to use member properties when you run an OLAP query such as "Which male customers with at least a bachelor's degree spent the most money?" To answer this question through the Customer dimension I described, you need to be able to filter the customers based on their member property values. You can use MDX to write a query that filters the customers. Listing 1 shows an example of such an MDX query, which you can execute against the FoodMart Sales cube. Figure 1 shows the query results. (Note that the FoodMart database doesn't always use traditional names for male customers.)

You can also use member properties to answer this query by creating virtual dimensions for each property you want to use for analysis. Virtual dimensions are an OLAP Services (and Analysis Services) feature that creates a two-level dimension that makes all individual values of a member property available as dimension members. This feature is particularly helpful when an OLAP client application doesn't let you filter selections based on member property values. With virtual dimensions, you can select the property values as dimension members, then eliminate all the empty results. By eliminating the empty results, you effectively filter the data on the selected member properties values. For example, the FoodMart database has made the Education and Gender properties available with virtual dimensions. If you want to know which customers have a partial high school education, you can select the partial high school education member from the education level virtual dimension, then configure your client application to eliminate (filter) empty customers. Only the customers with this education level will be non-empty. You can use this technique to achieve the same result as with the query in Listing 1. For example, the query in Listing 2, page 80, is functionally equivalent to the query in Listing 1 except that Listing 2 uses virtual dimensions. Also, note that virtual dimensions are optimized in SQL Server 2000 Analysis Services, so the query in Listing 2 is much more efficient than the query in Listing 1 when you run them both in Analysis Services.

Member Properties and Large Cubes

Virtual dimensions work well when you want to analyze a handful of member properties, but what if you have 100 or more member properties? Most OLAP client applications aren't designed to deal with 100 dimensions, and for good reason. Infrequent analysis users have a hard enough time understanding a four-dimensional cube, let alone one with 100 dimensions. Also, a technical problem limits most cubes to a handful of dimensions. Larger cubes have too many cells to store measure values on any reasonably sized hard disk. A cube cell is defined by a unique combination of dimension members—one from each dimension—so a cube with 100 dimensions, each with only 10 members, would have 10¹⁰⁰ possible cube cells. Run that on your pocket calculator and see what you get!

I recently encountered a cube that had 140 member properties. The properties—such as brand, size, color, and weight—described the product members of a product dimension. This cube had hundreds of thousands of products, but the product properties were sparsely populated because only a handful of the properties applied to any given product. For example, a candy bar has a brand, size, and weight but usually not a color.

Another aspect of this cube is common in member properties: One member property might have different sets of values for different classes of products. For example, sizes of car tires are different from sizes of children's shoes. If you're using the cube to find the best-selling sizes of children's shoes, you don't want to view all the tire sizes, too. As you can imagine, if you created a virtual dimension on the size member property, you would have many dissimilar dimension members. So you would want to filter out the dimension members that don't apply to the analysis you're doing.

If you were the DBA or consultant asked to design an analysis solution based on this cube, what would you do? The users would probably want to run queries that return the different values of member properties on the columns and rows of the results. For example, they might want product colors on the rows and product sizes on the columns. To return results for each member property value, you must make sure the values are dimension members. So you need to create a virtual dimension or a real dimension as part of the cube, or you need to create a calculated member of a dimension for each member property value.

Because of the problems with large cubes, you wouldn't want to create a virtual dimension for every member property. What you need is a virtual dimension for just the member properties that apply to the class of products a user is analyzing at any given time. You want to let the user create and drop virtual dimensions on the fly. You're waiting for me to tell you about the super-secret MDX command that lets you do this, right? If one exists, I don't know about it! But here are a couple of techniques that the developers in this situation used.

To solve the problem of needing to create dimensions on the fly, the developers created two extra (empty) dimensions in the cube. Each of these dimensions has only one dimension member, the ALL member. When a user wants to analyze by product size, for example, the application fills one of these empty dimensions with calculated members for each relevant product size. When the user is finished analyzing by product size, the calculated members are dropped and the dimension becomes available for analyzing another member property. This solution works well except for a couple of performance drains. One is the effort of finding the possible member property values for use in the calculated members; the other is the penalty you incur when running queries using the newly created calculated members.

Finding Member Property Values

One way to discover the member property values is to run an MDX query that returns all the products and their member property values. Then you can scan this list to determine the unique member property values. You can use a couple of techniques to formulate an MDX query to return member property values. The first technique, which I used in Listings 1 and 2 with the Ed Level member, involves creating a new calculated member that returns the member property for each dimension member. Another technique is to use the DIMENSION PROPERTIES keyword. The following query demonstrates how to use this keyword to return the education levels for the customers in California:

SELECT \{ \[Unit Sales\] \} ON columns,
  Descendants(\[Customers\].\[All 
  Customers\].\[USA\].\[CA\], \[Customers\].\[Name\])
  DIMENSION PROPERTIES Customers.Education ON rows
FROM Sales

A query that uses this keyword returns the dimension properties as part of the row or column headers instead of inside the query result cells. (If you use the MDX sample program to run the query, you won't see the property values because the MDX sample program doesn't support displaying this type of result.) For scanning purposes, sorting the list by the member property values would be convenient because such a sort would make eliminating duplicates in the retrieved member property values much easier. Unfortunately, with Analysis Services, I was unable to get Pivot Table Services to order the results by the member property. Under Analysis Services, the query in Listing 3 fails because it can't find the Education property; this appears to be a bug in Pivot Table Services because the same query works fine without the Order function.

If you were using SQL instead of MDX, the need to find unique member property values would be an obvious occasion to use a GROUP BY clause, but MDX doesn't have an equivalent function. However, this idea brings up a valid non-MDX approach: You could establish a normal (non-multidimensional) ADO connection to the underlying star schema tables and run an SQL query containing a GROUP BY clause to determine the unique member property values.

Fetching all leaf-level items in the dimension and scanning their property values can be slow because the query accesses a lot of data. Also, if you run an MDX query such as the one in Listing 1, you pay a performance price because internally, Pivot Table Services must scan the dimension's leaf level to find the relevant dimension members.

If you're serious about analyzing by member properties, here's a solution that will cure this performance problem: Construct in the cube a new dimension that has all the member properties and their values. You need to create a table in your star schema that has one column for the member property name and one column for the member property value, as Figure 2 shows. This table also needs a key to tie the records back to the appropriate dimension table records. In the FoodMart example, you'd need to tie the member property table back to the Customers table. With this new table, you can construct a new two-level property dimension. The top level has dimension members with the property names, and the leaf level has the member property values.

This new property dimension makes running queries based on member properties extremely fast because the dimension is a regular OLAP dimension. You don't incur any penalties for the dimension being virtual or for any on-the-fly filtering. The downside to this solution is that it is only one dimension; you can't use it to create MDX queries that have one member property on columns and another on rows because a single dimension can't be on two axes.

Despite this limitation, the property dimension is useful and lets you quickly determine the unique member properties for a subset of another dimension. For example, if you want to determine the unique values of education level in the FoodMart cube, you can simply select the non-empty children of the Education dimension member in the Property dimension.

Another approach to the problem of using many member properties for analysis is to use a combination of the solutions I described. You could create one empty dimension to be populated with calculated members for each value of a member property, then create a property dimension with all the member property values. This approach would let you look at any combination of two member properties in the same query on different axes (rows or columns). You could also use the property dimension to quickly determine the unique member property values needed to create calculated members for the empty dimension.

Most OLAP cubes involve dimensions that should (or do) have member properties because most business entities have several attributes that describe them. For example, products and customers have many characteristics besides their names and the attribute you use to organize the members in their dimension (e.g., FoodMart organizes customers by geographic region). Because all member properties are important, a good analysis solution must offer ways of using these properties to explore the data. A good understanding of member properties can help you provide excellent solutions for property analysis.

Comments

Plain text