Creating Dimensions in SSAS, Part 2

Executive Summary:
In the quest to improve analysis time for users needing data, we look at creating dimensions, attribute relationships, and how they help the cube engine respond faster to queries. We look at a fundamental problem with dimensions by creating a cube, then discuss how to create attribute relationships, which help to speed the data analysis process.

To better help your users with data analysis, you create proper dimensions, which we covered in Part 1 of this article. We also discussed how to process the dimensions, and we found that our two-dimensional project was showing a warning symbol next to the Time dimension name. This wasn’t caused by the overly long name, but it does indicate a fundamental problem with dimensions, which we’ll examine shortly by creating a cube. Then we’ll look at how to create attribute relationships, which tell the cube how dimension attributes relate to each other and help to speed the data analysis process.

Creating the Cube

To understand why the Time dimension is showing a warning on its one hierarchy, let’s first build a cube. Right-click the Cubes folder in the Solution Explorer and select New Cube to open the Cube Wizard. The first screen confirms that the cube will be built from a data source, while the next screen shows the list of data source views. Next, the wizard attempts to determine which tables are dimension tables and which are fact tables. Normally the wizard determines this correctly based on the joins in the underlying relational database, but sometimes you’ll have to perform minor tweaks to correctly specify the table types. In addition, the page asks for the name of the time dimension table, if one exists; I selected DimTime as the time dimension table.

The next screen shows up only if dimensions already exist — it asks developers to select any existing dimensions they want in the cube. If additional dimensions are required, the cube wizard will create them, but for this example I’ll select the two dimensions just created. The next screen shows a list of all the possible measures from the fact table. The wizard selects any field with a numeric data type, so you must remove the foreign key fields in most cases. For this example I will select only the Sales Amount field.

After completing the wizard, the cube actually shows four dimensions: Product, Ship Date, Order Date, and Due Date. This is because there are three joins from the fact table back to the time dimension table. This is called a role-playing dimension and is discussed in the article "Data Warehousing: Dimensional Role-Playing," December 2007.

Now the cube can be processed. All the data from the fact table is read and loaded into the cube, ready for analysis.

Examining the Problem

This simple cube contains only Product and Time dimensions along with a single measure from the FactInternetSales table: Sales Amount. Figure 1 shows a view of the Sales Amount measure and the Calendar Quarter attribute from the Order Date dimension. Note that the Calendar Quarter attribute isn’t being pulled from the Calendar hierarchy but is simply the standalone attribute. Normally, attributes added to a hierarchy are then hidden as standalone attributes so as not to confuse users.

The query returns proper results, but notice that there are only four quarters shown. Each quarter is the total for that quarter for all years; normally, users want to see quarters broken down by year. By placing the Calendar Year attribute on the grid before the Calendar Quarter, the dimension will work as expected. However, there are two problems: First, users must know which attributes to drag over and in which order. Second, aggregations and indexes aren’t being built to facilitate the normal analysis that will be done, which is drilling down the Year to Quarter to Month to Day path.

Recall that the hierarchy, now called Calendar, had a warning indicator next to it. This warning indicates that this is not a natural hierarchy, and the term “natural hierarchies” will be explained in a moment. Therefore, queries at any level of the hierarchy other than the lowest level are suboptimal.

In a relational sense, here is what’s happening: All attributes are related to records in the fact table through the key field of the dimension. This makes perfect sense, since that’s the way a relational join is typically made. If the user asks for yearly totals for the Sales Amount measure, the cube must read all of the daily records; it doesn’t know how the attributes relate to each other. Therefore, without attribute relationships, all queries go to the lowest level of granularity in the dimension to get resolved. In large cubes, this could mean summing up billions of values.

Attribute relationships, on the other hand, tell the cube how attributes are related. If the cube engine understands that the Year attribute can be derived from the Quarter attribute, it can potentially read a much smaller number of records to satisfy the query. By knowing the relationships between the attributes, the cube engine is able to create aggregations at various levels of the hierarchy to respond to queries much more quickly.

Creating Attribute Relationships

Creating attribute relationships means placing attributes in the proper order, but one challenge is to ensure that each attribute in the hierarchy is related only to the next level and not more than one level. For example, the Year and Quarter attributes should be related, while the Quarter and Month attributes should also be related. The Month attribute shouldn’t be related to the Year attribute; even though months do eventually roll up to years, they do so through the Quarter attribute. SSAS understands that months roll up to years but adding an extra relationship that ties months directly to years while months are also tied to quarters and quarters are tied to years can lead to problems.

Figures 2 and 3 show the dimensions after the attribute relationships are created in this simple demonstration cube. Figure 2 shows the attribute relationships in SSAS 2005 while Figure 3 shows the attribute relationships in SSAS 2008. SSAS 2008 adds a graphical designer for creating attribute relationships, which makes it easier to see the relationships and spot problems. Note that after the attribute relationships are created, the warning indicator disappears from the Calendar hierarchy in Figure 2; it is also gone from the hierarchy in SSAS 2008.

Figure 2: Attribute relationships in SSAS 2005

Attribute relationships can be of two types: flexible or rigid. If the relationship shouldn’t change, the attribute relationship should be set to rigid. For example, a City is located in a State/Province and isn’t expected to move; Chicagoans will likely not wake up one morning to find that their city has traveled up the road to Wisconsin. Therefore, the relationship between City and State/Province should be rigid. On the other hand, companies frequently realign sales territories, so the relationship between Sales Territory and Sales Area should be flexible.

Setting Key Values

In some cases, creating the attribute relationships is all that is required. In other cases, such as with the Time dimension, there is another issue. Before any attribute relationships were created, browsing with the Calendar hierarchy worked correctly. In the case of the Calendar hierarchy, there are four quarters in each year and three months in each quarter. This means, for example, that 2002 has a quarter 1 and 2003 also has a quarter 1. When creating attribute relationships, quarters roll up into years. However, simply creating the attribute relationships means that the query for the Quarter attribute, which uses the equivalent of a SELECT DISTINCT, will return only a single Quarter 1. The year to which this single Quarter 1 belongs is unpredictable, but it will be either 2002 or 2003, but not both.

Figure 4 shows this in action. The year 2001 has only a fourth quarter; previously, there were two quarters there, as the data only contained the third and fourth quarters of 2001. On the other hand, the year 2002 used to have all four quarters and now it contains only one quarter. Note that in this case, the cube will only process if the attribute relationships are set to flexible; if they are rigid, the cube processing will fail because one of the quarter values will be found under more than one parent.

To ensure that the quarters are properly tied to the years, the KeyColumns property of the Quarter attribute must be changed. By default, the key member is simply the Quarter column. Cube designers will have to add the Year column as well, and order is important: The Year column must be moved before the Quarter column because the Year column identifies that quarters are part of years and not vice versa. In other words, to uniquely identify a quarter means that you need Year and Quarter, not just the quarter by itself.

While it's important that the Month attribute be tied to the Quarter attribute and not the Year attribute, it is equally important that the Month KeyColumns property be changed so that we don’t have the same problem as with quarters; namely, we don’t want each month to appear in only one of the years. However, the proper setting of the KeyColumns for the Month attribute is to add the Year column, not the Quarter column. If this seems counterintuitive, it’s important to understand that tying January to Quarter 1 doesn't make it January unique; instead, tying January to a specific year is what is necessary. This is one of the primary areas of confusion for cube designers when they create attribute relationships.

Given what you’ve just done, it’s important to understand the overall process. First, create the dimension. Next, create the attribute relationships where they make sense. After creating the attribute relationships, examine the dimension to see if any key values for attributes need to be modified to handle situations where a child with the same name may exist below multiple parents.

Begin With Dimensions, Move to Attribute Relationships

Dimensions are the cornerstones of a good data warehouse. Building relational dimension tables isn't technically necessary but I highly recommend it because having relational dimension tables in a star or snowflake schema usually requires a level of consolidation and data cleansing that ensures data quality. In addition, having relational dimension tables makes it much easier to properly build dimensions in SSAS.

If you take the time to build proper dimensions, add useful hierarchies that support user’s navigational needs, then perform the important step of adding attribute relationships, you’ll realize several benefits. SSAS creates proper indexes and aggregations, speeding up the analysis process by allowing much faster queries. SSAS also processes faster by creating aggregations more quickly and physically storing the data more efficiently. Therefore, you win as a developer of the cube and your users win through faster, more intuitive analysis of the data.

Comments

Plain text