Multidimensional Expression (MDX) is a query language for OLAP databases. MDX programmers strive to create calculated members that are simple yet powerful and easy to maintain, but sometimes those members have poor query performance. In this article, Paul Goldy describes why performance problems can occur and how to work around them by walking you through an example involving a consulting company that uses headcount as its metric.
MDX programmers like calculations that are elegant (i.e., a solution that is simple yet powerful and easy to maintain) and have good query performance. However, it’s not uncommon for MDX programmers to create a calculated member whose business definition and initial implementation are elegant, only to find out later that the member is so slow that it’s unusable in queries against a full data set.
To see why performance problems can occur and ways to work around those problems, let’s look at an example involving a consulting company that uses headcount as its metric because it has no other capital assets to track. This consulting company tracks its consultant headcount by tracking the number of hires, terminations, and transfers. The consultant “inventory” is derived into \[Begin Count\] and \[End Count\]. Nearly all other metrics, such as full-time equivalent (FTE) utilization and average cost per FTE, rely on the beginning and ending headcounts. After providing the business definition, I’ll show you one solution that is elegant but has poor performance and another solution that isn’t as elegant but has acceptable performance.
The business definition is
\[End Count\] = \[Begin Count\] +
\[Termination Count\] +
\[Hire Count\] + \[Transfer Count\]
\[Begin Count\] = \[End Count\]
of the prior period
In this definition, the \[End Count\] value of one period becomes the \[Begin Count\] value for the next period. For example, look at the sample data in Figure 1. Week 22’s \[End Count\] value is 120, which becomes Week 23’s \[Begin Count\] value. The \[Termination Count\], \[Hire Count\], and \[Transfer Count\] values are loaded measures.
One elegant solution is to take advantage of the Prev- Member function in MDX. Listing 1 shows the \[Begin Count\] definition for this solution. If you use this definition with the data in Figure 1, Week 24’s \[Begin Count\] value of 130 is Week 23’s \[End Count\] value. Similarly, Week 23’s \[Begin Count\] value of 120 is Week 22’s \[End Count\] value. The same pattern would be followed if there were more data.
Callout A in Listing 1 highlights the \[End Count\] definition for solution 1. Referencing \[Begin Count\] within the definition of \[End Count\] introduces a recursive definition in which every \[End Count\] calculation uses \[Begin Count\], which in turn references \[End Count\].
If you’re familiar with recursive solutions (e.g., Tower of Hanoi), you likely noticed that the \[Begin Count\] definition doesn’t include an endpoint to end the recursion. Recursive solutions typically have a check that tells the recursion to quit and return. Instead of an explicit exit criterion, the \[Begin Count\] and \[End Count\] definitions in Listing 1 rely on an implied recursion endpoint from Microsoft SQL Server 2005 Analysis Services. The implied recursion endpoint comes from the code
The PrevMember function eventually references a date outside the cube space. When the tuple
references a \[Calendars\].\[Calendars\] hierarchy member outside the cube space, the \[Calendars\].\[Calendars\].CurrentMember. PrevMember returns NULL, which ends the recursion.
For example, suppose we have only five consecutive weeks of data, starting with Week 20, for a division in the consulting company. In Week 20, 100 employees were transferred into the division and 5 new employees were hired. Week 20’s \[Begin Count\] value is NULL because it references PrevMember, which doesn’t exist. If we could view the \[Begin Count\] definition while Week 20 is the CurrentMember, the tuple would look like
There is no previous member for Week 20, which results in a NULL value for the tuple from Analysis Services. The performance of \[Begin Count\] can get very slow, even when there’s only a small number of members to iterate through. For instance, suppose we have 12 years of data and a calendar hierarchy of year-quarter-month-day, which means we have 4,380 (12 × 365) day-level members. On a two-processor 2GB development server, solution 1 can take up to 30 minutes to return \[Begin Count\], as defined in Listing 1, when Calendar.Current- Member is a day-level member.
A modified version of solution 1, solution 2 takes advantage of the non-additive nature of \[Begin Count\]. In solution 2, \[Begin Count\] is the same value for the first member in a period, regardless of which level the member belongs to in the calendar hierarchy. For example, as Figure 2 shows, if \[Begin Count\] is 150 at the year level (2006), \[Begin Count\] is the same value at the quarter level (Q1-2006), month level (Jan-06), and day level (1-Jan-06).
In Solution 2, a member's parent has the same \[Begin Count\] value as the member when the member is the first sibling of the parent. The \[Begin Count\] definition is shown in callout A in Listing 2. This definition includes an IIF statement, which works as follows: If Calendar.CurrentMember is the first sibling, then we use \[Begin Count\] with Calendar.CurrentMember .Parent. Otherwise, we use \[End Count\] with Calendar .CurrentMember.PrevMember.
Solution 2's \[End Count\] is also defined with an IIF function, as Listing 2 shows. Here's how that IIF function works: If Calendar.CurrentMember.Parent is the highest valid point in the calendar hierarchy, then we add up the hire, termination, and transfer counts related to the \[Fiscal\] member, and that result becomes the \[End Count\] value. This is the end point of the recursion. Otherwise, we add up the begin, termination, hire, and transfer counts, and that result becomes the \[End Count\] value. The reference to \[Begin Count\] forces us to iterate (via recursion) back to the \[Begin Count\] definition.
Using the technique of looking for the parents of first siblings lets us evaluate \[Begin Count\] with far fewer recursive calls than if we use the recursive technique in solution 1. The reduction in the number of recursive calls is what makes solution 2 perform significantly better than solution 1.
—Paul Goldy, Technical Analyst, Wirestone