The Zen of MDX

Progress through the levels of MDX mastery

Download the Code iconIf you're new to SQL Server's OLAP Services, you might not be familiar with Multidimensional Expression (MDX) language. MDX is to multidimensional cubes as SQL is to relational tables, so OLAP Services uses MDX as its query language. MDX is a nontrivial language, but creating simple queries in MDX is no more difficult than creating basic queries with SQL. However, when you want a query to return values derived from the values in your cube, MDX becomes significantly more complex. Writing and understanding MDX formulas is more complicated than performing similar tasks in SQL because a cube is n-dimensional. Add the fact that each dimension is hierarchical, not linear, and wrapping your mind around the problem can take a while.

I've discovered multiple levels to understanding MDX. The first level is basic data retrieval. Here, you want to query your OLAP cube and return simple 1- or 2-dimensional results of information in rows and columns. The values you want to display are stored in the cube. To retrieve the data, you must know the keywords, punctuation, and basic syntax rules of MDX. If you've worked with another query language, such as SQL, learning these rules isn't difficult because basic MDX queries avoid the complexities of the data's multidimensional structure. With this first level of understanding, I estimate that you can harness about 10 percent of MDX's power. Here's an example of a level-one query that retrieves sales information for two quarters in 1997 for the USA and Canada:

SELECT \{\[USA\], \[Canada\]\} ON COLUMNS,
  \{\[1997\].\[Q1\], \[1997\].\[Q2\]\} ON ROWS
FROM Sales

The second level of understanding MDX enables you to create MDX formulas. In MDX terminology, these formulas create calculated members. The result of an MDX query is n-dimensional, with columns and rows as the first two dimensions. Within the MDX query, each dimension is an axis. When you formulate an MDX query, each position (coordinate) on a resulting axis must contain a dimension member. If you want one of the columns or rows in your result to be a derived number, you need to create a new dimension member to use in your query. For example, if you want to retrieve the difference between the sales for first quarter 1998 and first quarter 1997 (year-to-year growth), create a new dimension member with a name and a formula, as in the following query:

WITH MEMBER \[Time\].\[1998\].\[Q1 Change\] as '\[1998\].\[Q1\] ­ \[1997\].\[Q1\]'
SELECT  \{\[USA\], \[Canada\]\} ON COLUMNS,
  \{\[Q1 Change\]\} ON ROWS
FROM Sales
WHERE (\[Unit Sales\])

This query works if you're designing a report to run against OLAP data. Of course, after adding 1999 data to the cube, you'll need to change the query if you want it to return the difference between 1999 and 1998 sales, but the change is as simple as substituting the updated years. With an understanding of the available MDX functions and how to create calculated members, you can tap into most of MDX's power (at least 60 percent). Level-two MDX lets you retrieve a specific value from a cube, which might generate online information to replace a hard-copy report that your company has been generating for the past 20 years.

I believe you progress to the third level of MDX understanding when you feel comfortable with the n-dimensional structure of a cube and you design your MDX formulas for use in an OLAP browsing environment. Truly useful MDX formulas let users view familiar business-analysis calculations without reinventing the formula for every query. In other words, level-three MDX formulas are useful for viewing a large percentage of the cube data, not just for generating one particular query. OLAP is about online, interactive browsing of information. Level-three MDX can help you and your users find relevant trends and exceptions in your business data.

An analogy of the difference between levels two and three is the difference between programming and software engineering. Pro-gramming without software engineering focuses on near-term objectives and doesn't allow for known or unknown future plans. Software engineering teaches that code maintenance, architecture, and reuse are important, as is the idea of abstraction. One objective in software engineering is to build and use a collection of software modules that let the programmer deal with the programming problem at the level of the problem. The basic building blocks must match the terminology of the problem. For example, if you're implementing an automobile traffic simulation, you want the basic building blocks to be vehicles, traffic lights, and intersections. You don't want to start with numbers and alphanumeric strings. Level-three MDX builds on level two by adding reuse and abstraction.

Listing 1 contains an example of a level-three query. This query has a reusable formula for the annual growth of unit sales. The formula is reusable because it returns annual growth for any member of the Time dimension. It is abstract because it returns annual sales growth, a value used frequently in financial analysis. If you were analyzing medical treatments in a hospital, you might want a formula for bed days of care. If you were analyzing products in a retail business, you might want to find sales per square foot or profit per square foot. Good MDX formulas raise the level of data abstraction to the level of the business problem your users are analyzing. Herein lies the Zen of MDX. Create a cube with useful calculated members such as these, and you'll be an MDX Zen master! I estimate that this third level of understanding gives you access to 90 percent of MDX's power. I'm leaving out 10 percent because a fourth level of understanding might exist. As soon as I discover it, I'll be sure to write about it.

Master the Techniques

A closer inspection of the Unit Sales Annual Growth formula (in Listing 1) reveals three MDX techniques that make the formula reusable. First, the growth formula in the Measures dimension gives you the most flexibility to combine the growth formula with members of other dimensions. Second, conditional logic changes the formula when you select Time dimension members at the year level. Third, the ParallelPeriod function generically handles all Time dimension members at levels below year.

The first technique is an important one. You can choose any dimension to put the calculated member in. For example, I could have created a new member in the Time dimension, as in the level-two query example, instead of in the Measures dimension. In deciding which dimension to use, consider the flexibility trade-offs. When you create the member in the Time dimension, you have the freedom to change which measure you select in the query. Without changing the member definition, you can retrieve annual growth for Unit Sales and for Store Cost or any other measure, just by changing the measure in the WHERE clause.

In Listing 1, I chose to create the new member in the Measures dimension. Doing so locked the formula into using the Unit Sales measure, but you can change the query to display any member in the Time dimension without changing the formula. In this example, I think the level-three formula is more flexible because the FoodMart Sales cube has relatively few measures and many Time dimension members. An even more flexible solution might be to create an empty dimension and add several calculated members to it for the growth measure variations you want to use (e.g., Year-To-Year Absolute Change, Year-To-Year Percent Change, Year-To-Year YTD Absolute Change). With the definitions in another dimension, you could match these definitions with any combination of a measure and a Time dimension member.

The second technique uses conditional logic to change the formula based on the dimension member's level in the dimension. This technique is common, particularly when you're dealing with time series functions. If you want a calculated member to return YTD values, you might include special handling for a formula with a year selected. In the Unit Sales Annual Growth measure, I used Time.Current-Member.Level.Ordinal to determine whether the currently selected member in the Time dimension was at the year level. The level's ordinal is a zero-based number that starts at the top of the hierarchy and increases as you descend. The year is the topmost level in the Sales cube's Time dimension, so the year's ordinal is zero. Another way to conditionally check levels is to compare against the level name. For example:

iif( Time.CurrentMember.Level.Name = "Year",
,  )

The third technique of reusability uses the ParallelPeriod function to generically handle Time members at different levels of the Time dimension. The ParallelPeriod function calculates the distance from the year level and finds the matching parallel period at the same relative depth in the hierarchy. The Cousin function has the same kind of functionality but isn't specific to the Time dimension.

Many MDX functions handle boundary conditions well and don't require special conditional logic. In this example:

Descendants(Time.CurrentMember, \[Month\])

the Descendants function works as you would expect for all three levels of the Time dimension. It returns the corresponding months for the current member even if the current Time member is a month. If Time.CurrentMember is at the Month level, the Descendants function returns Time.CurrentMember.

Remember that the most valuable calculated members are those that enable the business analyst to use the calculated members intuitively in various situations. Designing a cube and the calculated members to go with it is similar to design work in other disciplines; you want the result to be forward-thinking, flexible, and easy to maintain. And most of all, you want the names and business meaning of your formulas to be familiar to your users. When you have harnessed this level of MDX, you will have transcended to the next plane of existence and reached the Zen of MDX! (For answers to the March MDX puzzle, see "March MDX Puzzle Solution Revealed." For the April puzzle, see "MDX Puzzle.")

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.