Skip navigation

The Art of Cube Design

Use these techniques to build better forecasting cubes

Cube design is more an art than a science. Third-party applications provide many templates and patterns to help a cube designer create cubes that are appropriate for different kinds of analysis (e.g., sales or budgeting). But in the end, the cube design depends on business rules and constraints specific to your organization. What quirks of your data keep you up at night? In dimensions such as Customers or Organization, you might have a hierarchy of values that change as often as you update the cube. Or you might have members that you want to include in multiple places in a hierarchy, but you don't want to double-count the values when you aggregate those members. You can handle each of these situations in multiple ways, but which way is best? In our business intelligence (BI) development work, we see lots of problems in designing sales-forecasting cubes. Let's look at a few common cube-design problems and learn how to solve them by using some techniques that you can apply to many types of cubes.

Designing a Sales-Forecasting Cube

When creating a sales-forecasting cube, a cube designer at our company typically gets the cube dimensions from the customer relationship management (CRM) system that our sales team uses for ongoing tracking of sales deals. In our CRM system, the pipeline (the list of sales contracts that representatives are working on) puts sales deals into one of three categories: Forecast (the sales representative expects to close the deal in the current quarter), Upside (the sales representative thinks the deal will be difficult to close in the current quarter), and Other (the sales representative expects to close the deal in a future quarter). Additionally, the projection defines who has agreed that a given pipeline deal should be included in the current quarter's forecast. Each deal in the pipeline falls into one of four projection categories: Sales Rep Only (only the sales representative thinks the deal should be in the forecast and the manager has overridden the sales representative to remove the deal from the forecast), Manager Only (the manager has overridden the sales representative to include the deal in the forecast), Sales Rep & Mgr (manager and representative agree that the deal should be in the forecast), and Neither (nobody thinks the deal should be in the current quarter's forecast). A straightforward cube design might include a dimension called Projection that has a member for each deal's status and a dimension called Pipeline that has a member for each deal's category, as Figure 1 shows.

By choosing different combinations of Pipeline and Projection, you can quickly answer questions such as "Which deals in the representative's forecast did the sales manager and the sales representative both agree to?" or "Which deals in the representative's Upside category did the manager override for the current quarter?" This dimension structure also lets users view the deals if you've enabled drillthrough, so sales managers can quickly see which deals make up the forecast number they're committing to.

The problem with this dimension structure is that the Projection dimension is relevant only when the user has selected the Pipeline dimension's Forecast member. The sales representative is the only person who puts deals in the Upside and Other categories. The sales manager is responsible for agreeing or disagreeing with the sales representative's deal categorization, but the manager's input affects only the Forecast member. Table 1, page 42, shows the combination of Pipeline and Projection members that will yield data. If users choose one of the invalid combinations, they will see no data—or even wrong data. For example, if the manager selects the deals in the current quarter's forecast but doesn't select both the Sales Rep & Mgr and Manager Only projections, the projected sales number that the cube reports for the current quarter's forecast will be too low.

Providing Valid Data

One technique that would solve the wrong-data problem is the use of calculated members. You could create a calculated member on the Pipeline dimension for each valid combination of Pipeline and Projection, then hide the Projection dimension so that the manager needs to deal with only one dimension. This technique would let sales managers easily see the target that they'd committed to for the current quarter. The problem with this solution is that Analysis Services doesn't support drillthrough operations on calculated members. In a sales-forecasting application, drillthrough is a mandatory feature because you need to be able to view the individual deals in the pipeline. Without drillthrough, you lose the ability to see individual sales deals in the pipeline.

A better solution to this problem is one that you won't find documented in SQL Server Books Online (BOL). In this approach, you create one dimension that contains all the valid combinations.

Figure 2 shows the new dimension (labeled Entire Pipeline), which combines the original Pipeline and Projection dimensions into one dimension. You'll notice two things in the new dimension that weren't in the original dimensions. First, deals can appear in multiple locations in the hierarchy. For example, the deals comprising the Rep Commit member are also in the Mgmt Commit member if the manager has also committed to them. Second, the aggregation of the members to calculate their parents' values needs to use a custom rollup formula so that the aggregation doesn't count duplicated records more than once. We can solve both problems without duplicating rows in the fact table by taking advantage of the way Analysis Services joins the dimension tables together to compute cell values.

Let's look at the relationship between the fact-table entries and the Projection dimension table, which Figure 3 shows. The members of the Pipeline dimension (which we would have determined by using calculated members in the previous option) now have multiple rows in the dimension table. This structure might disprove two common assumptions about dimension tables: the assumption that the primary key in the dimension table must be unique and the assumption that a dimension member must correspond to only one row in a dimension table. Because of the way Analysis Services uses SQL to join the fact table to the dimension table when it builds the cube, neither of these assumptions is enforced. Using non-normalized tables lets us have one fact-table row that corresponds to multiple dimension members and one dimension member that corresponds to multiple categories of fact-table records. Multiple rows from the fact table have the same primary key, so those rows are included in the calculated value for that dimension member. We can calculate the correct values for every member in the dimension without increasing the size of the fact table. For very large fact tables, this technique can be a big time-saver, both when you're creating the fact table and when you're processing the cube.

Of course, when a fact-table record appears in more than one dimension member, the parents of those members won't necessarily contain the correct value. The default method of computing a member's value from its children is to sum the children's values. But summing won't work in this case because some fact-table records would be included more than once in the parent's total. The solution is to use unary operators that you associate with each member in a custom rollup calculation. The dimension table in Figure 2 shows the custom-rollup unary operators for each member in the dimension. The + unary operator means when the parent's value is calculated, the calculation should add the value to the parent member, and the ~ unary operator means the calculation should exclude the value from the parent's value. The Mgmt Commit member consists entirely of sales deals included in other dimension members, so Analysis Services ignores this member when calculating the value of its parent, Entire Pipeline. Analysis Services also needs to use a custom-rollup formula within the Mgmt Commit member because that member's value isn't the sum of its children. The Override-Excluded from Reps member is important for the manager to have available for analysis because it shows which deals the sales representative included in the forecast but the manager didn't commit to. However, these deals aren't part of the Mgmt Commit value, so Analysis Services needs to ignore them when aggregating the children of Mgmt Commit.

Now we have a cube structure that meets our needs. All the information associated with sales deals is in one dimension and the rollup formulas are computed, so the aggregate values in the dimension are correct. Because we used no calculated members, we can still enable drillthrough to see individual sales deals. When you deploy this cube to your sales team, you can be confident that the query results are accurate. You will be praised by your coworkers and be showered with gifts and money—or maybe you'll simply help your company's bottom line.

You can apply these techniques to many types of cubes. If you ever get into a situation in which you want to duplicate fact-table records in a dimension without duplicating them in the fact table, the combination of duplicating keys and using custom rollup formulas can be a great benefit.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.