
Data Mining on a Shoestring

Analysis Services can make your data work for you

Data mining is the process of using automated methods to uncover meaning—in the form of trends, patterns, and relationships—from accumulated electronic data.

One of the most common data-mining tasks is targeting the most likely new customer prospects. Another common task is market segmentation. A market segment is a group of customers who share certain characteristics such as length of time as a customer, average expenditures, typical purchases, and so on. Discovering the similarities that characterize and distinguish segments is a computationally intensive operation that's ideally suited to data mining.

When you decide you want to use data mining, you need look no further than SQL Server. The decision-tree and clustering data-mining algorithms that Microsoft provides in SQL Server 2000 Analysis Services are well suited to tackling the tasks of acquiring new customers and segmenting your markets. (For more information about using Microsoft data-mining algorithms, see the series of articles by Jim Yang, ZhaoHui Tang, Sanjay Soni, et al., "Model Performance," December 2001, and "Put Data Mining to Work," November 2001.) Let's walk through an example of how you can use the analysis facilities in Analysis Services to perform these tasks, then examine the useful information you can glean from the analysis.

Problem Scenario

The example scenario involves the imaginary enterprise Conference Corp, which provides industry-leading exposure to new IT trends and technologies through conferences, workshops, and seminars. The exclusive, "invitation-only" nature of the events requires the development of high-quality, targeted promotional materials such as personalized offers and conference brochures. Targeting is especially important because the quality materials are expensive to produce and mail. The company strives for high response and attendance rates, continually analyzing the effectiveness of its promotional campaigns.

The company has a database that's organized around a customer entity, the Customers table. This table and two others—Promotions and Conferences—define the Customer object. Figure 1 shows the tables and their constituent fields.

Customers receive many promotions for many events, and after they receive a promotion, they either ignore it or register and attend the promoted event. Say your job is to look at promotional "hits and misses" and determine what characteristics predispose customers to attend the promoted event. Once you know these characteristics, you can better target subsequent event promotions. Focused targeting lowers promotional costs and lets you better serve your customers by giving them information that's appropriate to their interests. This personalization is central to building customer loyalty over time.

To determine the characteristics of customers and prospects who are most likely to respond to your promotional offer, you have to first assemble an analysis data set that contains responses and non-responses to one or more offers. Further, you have to assemble a data set that contains enough distinguishing information to help you evaluate the likelihood of a response based on certain characteristics.

Your observation of the success or failure of previous Conference Corp marketing campaigns suggests that the probability of response is a function of the customer's job type, size of firm, and the firm's annual sales. You also suspect that response rates vary according to customer-relationship characteristics such as how long the person has been a customer, attendance at previous events, and so on. Also, Conference Corp business managers have observed that response seems to be related to the offer type, discount, and coupon policies as well as the number of promotions previously sent to the targeted prospect.

Sorting through all these potential predictors to find the unique combination of attributes that best describes the customer who's most likely to respond is difficult without some sort of automated pattern-search algorithm. As I demonstrate later, data-mining decision trees are particularly well suited to carrying out this kind of automated pattern search.

Unit of Analysis

In any data-mining task, one of the first decisions you need to make is which unit of analysis to use. Just as you can measure and analyze distance in many ways—from inches to miles, for example—you can measure and analyze customer behavior in many ways. A review of the Customer object's structure shows at least four potential units of analysis:

  • The customer's company — 1955 cases (records) in the data set
  • The individual customer — 3984 cases in the data set
  • The response — 9934 responses in the data set: 8075 for the e-commerce conference (55 percent), 1467 for the Java conference (10 percent), and 392 for the Windows CE conference (3 percent), for a total response rate of about 68 percent
  • The promotion — 14,589 cases in the data set

Using the customer as the unit of analysis might be tempting because you're studying customer behavior. However, for this example, the unit of analysis has to be the promotion because you want to look at every promotion and examine whether it produced a response from any customer. This answer will give you a response rate from which you can derive a profile of high-responding customers. So, promotion is the unit of analysis, and response is the outcome that you want to examine and explain. In this case, to explain the outcome, you need to construct a predictive model that uses the characteristics reported on the analysis record to discriminate between responding and non-responding customers.

To develop a model of how effective various promotions are and the associated customer attributes that predict promotional response, you need to express the data in the appropriate format. A positive response—in this case, attendance at a conference—is determined by a combination of customer and promotional characteristics. You can express this relationship as an equation where the result (listed first) is a function of inputs:

Attendance (Outcome / Target) = 
Customer characteristics + Promotional activities

You'll have to do some data mining to find the values for customer characteristics and promotional activities.
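In programming terms, this relationship is a classification model: estimate the probability of attendance from the characteristics recorded on each promotion record. The sketch below illustrates the idea with simple grouping over toy data; the field names are hypothetical, and this is only an illustration of the target/input relationship, not the decision-tree algorithm that Analysis Services actually applies.

```python
from collections import defaultdict

def fit_response_rates(records, keys):
    """Estimate P(attend | characteristics) from historical promotion records.

    records: list of dicts, one per promotion (the unit of analysis),
             each with an 'attended' flag (1 = registered and attended).
    keys:    field names to group by (customer and promotion characteristics).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [attendances, promotions]
    for r in records:
        group = tuple(r[k] for k in keys)
        counts[group][0] += r["attended"]
        counts[group][1] += 1
    return {g: hits / total for g, (hits, total) in counts.items()}

# Toy history: two promotions to a short-tenure customer, one to a
# long-tenure customer; only one promotion produced an attendance.
history = [
    {"tenure_band": "short", "offer": "java", "attended": 1},
    {"tenure_band": "short", "offer": "java", "attended": 0},
    {"tenure_band": "long",  "offer": "java", "attended": 0},
]
rates = fit_response_rates(history, ["tenure_band"])
print(rates)  # {('short',): 0.5, ('long',): 0.0}
```

A decision tree generalizes this kind of grouping: instead of tabulating every combination, it searches for the few splits that best separate responders from non-responders.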

Remember that the unit of analysis is the promotion. The analysis file therefore contains a copy of each customer record for each promotion; when a customer has more than one promotion associated with it, that customer's characteristics are duplicated onto each promotion record.

The Mining Model Wizard (which I explore later) uses the following SQL expression that joins three tables—Customers, Promotions, and Conferences—to produce the analysis view that creates copies of customer records for each promotion:

SELECT Customers.*, Promotions.*,
  [Conferences].[CourseCode], [Conferences].[DATE]
FROM Customers RIGHT JOIN (Promotions LEFT JOIN Conferences ON
  ([Promotions].[CustNum]=[Conferences].[CustNum]) AND
  ([Promotions].[PromoDate]=[Conferences].[DATE])) ON
  [Customers].[CustNum]=[Promotions].[CustNum];

The right join matches customers with all promotions in the file. Because more than one attendance might result from a given promotion, the left join that's in parentheses next to the RIGHT JOIN expression gathers all attendances. The net result is a file of 14,695 analysis records representing promotional offers that resulted in 9934 attendances.
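You can reproduce this denormalization with any SQL engine. The sketch below runs an equivalent join in SQLite on invented toy data; because older SQLite versions lack RIGHT JOIN, the query expresses Customers RIGHT JOIN Promotions as the logically equivalent Promotions LEFT JOIN Customers. Table contents and column choices here are simplified stand-ins for the article's schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers  (CustNum INTEGER, Tenure REAL);
CREATE TABLE Promotions (CustNum INTEGER, PromoDate TEXT);
CREATE TABLE Conferences(CustNum INTEGER, CourseCode TEXT, DATE TEXT);

-- One customer, two promotions; only the first promotion led to attendance.
INSERT INTO Customers   VALUES (1, 0.10);
INSERT INTO Promotions  VALUES (1, '2001-05-01'), (1, '2001-09-01');
INSERT INTO Conferences VALUES (1, 'JAVA', '2001-05-01');
""")

# Promotions LEFT JOIN Customers is equivalent to the article's
# Customers RIGHT JOIN Promotions form.
rows = con.execute("""
SELECT Customers.*, Promotions.*, Conferences.CourseCode, Conferences.DATE
FROM Promotions
     LEFT JOIN Conferences
            ON Promotions.CustNum = Conferences.CustNum
           AND Promotions.PromoDate = Conferences.DATE
     LEFT JOIN Customers
            ON Customers.CustNum = Promotions.CustNum
""").fetchall()

for row in rows:
    print(row)
# One analysis record per promotion; the customer's fields repeat on each,
# and CourseCode/DATE are NULL where the promotion drew no attendance.
```

The row count equals the promotion count (or more, if a promotion produced multiple attendances), which is exactly the unit of analysis the model needs.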

Creating the Mining Model

The previous query creates a Microsoft Access database. To analyze a relational data source (including Access), you need to first publish it as a data source in the Windows 2000 or Windows NT environment by establishing a data source name (DSN). For details about how to set up Access as a data source, see the sidebar "Using Access as a Data Source."

Now you're ready to create a mining model to process and analyze the data. Let's first use the Microsoft Decision Trees algorithm to develop a predictive model that will find customers who are likely to respond to a promotional offer. You begin defining a mining model in the analytic data mart by starting Analysis Manager (under the Start menu, select Programs, Microsoft SQL Server, Analysis Services, Analysis Manager). In Analysis Manager, go to the tree view and expand Analysis Services. Double-click the name of your server to establish a connection with the Analysis Server. To set up a new database, right-click your server name, then click New Database. In the Database dialog box, enter ConfCorp (or whatever you named your data source) in the Database Name box and click OK.

By default, the new database (in this case, ConfCorp) that this operation creates contains five nodes: Data Sources, Cubes, Shared Dimensions, Mining Models, and Database Roles. These nodes appear in a Windows Explorer-like hierarchy. The hierarchy shows that—up to this point in the process—the operations involved in creating a mining model are the same as those involved in creating a dimensional view of the data by using standard SQL Server OLAP model operations. This symmetry between creating an OLAP-type dimensional model and a data-mining type model is maintained throughout Analysis Services and is one of the major contributions of SQL Server 2000 to the integration of these two approaches to business intelligence (BI).

To begin data mining, right-click on the Mining Models folder, then select New Mining Model from the shortcut menu to display the Mining Model Wizard. Now click Next in the Select source type window, select Relational Data, and click Next again. In the Select case tables window, select the ConfCorp database (or whatever DSN you established) and click Next.

After you select the database, the associated tables and views become available, as Figure 2 shows. Let's select the JavaResults view, which shows the customers who responded to the Java conference attendance offer. Next, the wizard asks you to select the data-mining technique to use. In the Select Data-Mining Technique window, select Microsoft Decision Trees. Click Next.

Now you need to identify the case base (or unit of analysis) for the modeling task. The JavaResults view has a field called ID that represents the customer ID and ensures that the case base doesn't contain redundant customer records. The Mining Model Wizard prompts you for this name in the Select key column step. Select ID for the case key column and click Next.

The final step in setting up the analysis task is to select the outcome, or target, of the analysis. In the JavaResults view, I created the Outcome field for this purpose. In the wizard, select Outcome under Predictable columns. The wizard also presents the inputs, or predictors, that you can select in the model. You can select such fields as Tenure, Gender, Size of Firm, and PromoDate as inputs. These fields give you indicators of all the dimensions of analysis for this model: personal and company characteristics, customer-behavior characteristics, and promotion characteristics.

Save the model as PromoResults. If you select the Save and Process Now option, the model begins to work with the data to find a good predictive model based on a decision tree. This option triggers the execution of the Analysis Manager data-mining algorithms.

The model execution view shows the progress of the data-mining algorithm and notifies you that the algorithm has found the best model by posting a Finish message to the window. Selecting Done displays the results in the screen that Figure 3 shows. The decision-tree display has three main areas: the main display, which gives a detailed view of the tree; a Content Navigator, which shows a high-level, summarized view; and an Attributes results browser, which provides a statistical summary of the current highlighted node.

The Tree Navigator

You can browse the Content Navigator to examine the results of the data-mining algorithm. The Content Navigator lets you browse the entire tree (which can be very large) to identify the area that you want to view in more detail. As Figure 3 shows, the tree's nodes are represented as boxes. The colors in the boxes indicate the density of the outcome—a denser color means a greater proportion of responses.

The Attributes results browser lets you highlight any tree node to produce a statistical summary of the results in the node. In this example, the response—attendance at a Java conference—accounts for 10 percent of the cases in the analysis. The Attributes window shows that, of the 14,589 cases selected for this analysis, a total of 1467 replied to the promotional offer. Therefore, the conference attendance rate was approximately 10 percent (1467 of 14,589).

The conference organizers want to use the decision-tree method to discover how this attendance varies according to certain customer characteristics. They're also interested in developing a decision rule that would help distinguish low-probability attendees from high-probability attendees. By targeting only the high-probability customers, the company can then use these decision rules to increase the probability of success for future conference offerings.

At the highest level of Figure 3's decision tree, you can see that the algorithm has identified tenure (i.e., length of time as a customer) as the most important distinguishing characteristic that indicates likely response to a conference offer. (Importance is a statistical measure of strength that the data-mining algorithms establish.) In the Attributes pane, you can see that, if tenure is less than or equal to 0.125 years (approximately 45 days), the probability of response climbs from the average of 10 percent to 23.51 percent. So the highest probability of a response to a promotion comes from customers who have been recruited very recently—probably for the specific offer of this campaign.

The Attributes results for the lower node (Tenure > 0.125), which Figure 4 shows, displays the results for the higher-tenure cases in the database (i.e., those who have been customers for more than 45 days). Among high-tenure customers, the probability of response drops to about 2 percent. You're also beginning to run out of data with this low-response group: only 201 cases are available—a very low number. As the number of cases in a group begins to drop, the statistical accuracy of results tends to decrease.
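Conceptually, a decision-tree algorithm chooses tenure as the top split because that split best separates responders from non-responders. One common scoring measure is information gain, sketched below. Microsoft's importance statistic is related in spirit but is not necessarily this exact formula, and the child-node sizes here are hypothetical; only the overall figure of 1467 responders among 14,589 promotions comes from the article.

```python
import math

def entropy(positives, total):
    """Shannon entropy of a binary outcome, in bits."""
    if total == 0 or positives in (0, total):
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(parent, children):
    """parent and children are (positives, total) tuples;
    the children partition the parent's cases."""
    p_pos, p_tot = parent
    weighted = sum(t / p_tot * entropy(pos, t) for pos, t in children)
    return entropy(p_pos, p_tot) - weighted

# Overall: 1,467 responders among 14,589 promotions (from the article).
# Hypothetical child sizes for a Tenure <= 0.125 split, chosen so the
# short-tenure node responds at roughly 23.5 percent:
parent = (1467, 14589)
short_tenure = (1320, 5617)   # ~23.5 percent respond
long_tenure  = (147, 8972)    # ~1.6 percent respond
print(information_gain(parent, [short_tenure, long_tenure]))
```

A split that concentrates responders in one child yields a large gain; the algorithm evaluates candidate splits on every input field and keeps the best one, then repeats the process within each child node.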

You can use the Microsoft decision-tree algorithm to look more deeply at the data. One aspect of the decision-tree data-mining algorithm is that you can analyze each node on the tree in turn. For example, Figure 5 shows a deeper display of the high-tenure (> 0.125) customers. The display shows that if tenure is high and the customer is female, the probability of response drops even further. Among the 20 high-tenure females in the database, the average attendance rate is 1.18 percent. Note also the low-density data node for cases that were missing the Gender field. In this node, the attendance rate is zero, suggesting that poor data quality reflects poor customer loyalty in this part of the customer data.

Among the 181 males at the tree level that Figure 5 shows, the attendance rate is 2.46 percent—so men are roughly twice as likely to attend a promoted event as women. Figure 6 shows that males who have an exceptionally long tenure (>= 2.125 years) and who come from relatively small firms (<= 0.25 on a scale of 0 to 10)—or on the other extreme, from relatively large firms (> 1.75)—are more likely to attend: 7.6 percent. This statistic brings the group close to the overall average attendance rate of 10 percent and indicates that these people can be targeted with the aim of increasing loyalty and lifetime value.

Women from midsize firms (> 1.75 and <= 3.25) with relatively low annual sales are also good targets. This group had an attendance rate of 14.85 percent. However, this node has only 14 "positive" occurrences of attendance, a relatively small number to base results on. Although the results are still statistically valid from a theoretical point of view, you'll probably want to verify them against a holdout sample or validation database to see whether they can realistically generalize to a new marketing target population.

Looking over the tree results, you can see that the best predictor of response—length of time as a customer—although seemingly useful, reveals a problem. Short-term customers are most likely to respond to campaigns, so it appears that customer loyalty in the existing customer base is low. However, data mining can be useful in exploring and cultivating the most loyal customers in order to increase their lifetime value. The most loyal customers turn out to be males who come from either very small firms or very large firms. While the overall long-term response rate is very low (1 percent), males in the small- or large-firm category respond at a rate of 8 percent. So Conference Corp can increase response rate and stimulate customer loyalty by targeting more people in this market segment.

The data-mining model also reveals the keys to increasing response rate among females. Females in midsize firms (neither small nor large) with moderate income are the best targets. Among these customers, the response rate increases to approximately 15 percent.

Using Clustering to Create Segments

Another data-mining algorithm, Cluster Analysis, lets you segment the target population reflected in the database based on shared similarities among several attributes. With clusters, unlike with decision trees, you don't need to specify a particular outcome for the algorithm to use in determining various classes, discriminators, and predictors. Instead, you specify which fields you want the clustering algorithm to use when it assesses the similarity or dissimilarity of the cases that it considers for assignment to the various clusters.

To begin the data-mining modeling task for cluster analysis, you need to specify the source data. As with the decision tree, you can point the Mining Model Wizard at the Conferences.mdb data source and use the Customers table as the analysis target. In this case, the algorithm will focus on customers, evaluating their shared similarities according to various attributes to determine which cluster they belong to.

After you select the target data table, the wizard asks you to specify the data-mining technique. The process is the same as with the decision tree, but in this case, select Clustering as the data-mining method.

As in all data-mining models, the wizard asks you to specify the unit of analysis by selecting the case key for the analysis. As with the decision tree, the case base, or unit of analysis, should be the customer level, so specify the customer ID as the key field. As Figure 7 shows, the analysis wizard then asks you to specify the fields to use to form the clusters. The algorithm will use these fields to collectively gauge the similarities and dissimilarities between the cases to form the customer clusters.

After you select the fields, you can run the cluster model. After processing, you get the results that Figure 8 shows. The Content Detail and Content Navigator areas use color to represent the density of the number of observations (customer records in each cluster). You can browse the Attributes results to look at the characteristics of the various clusters. By default, the clustering data-mining algorithm forces the clusters into 10 distinct homogenous groups. However, if you want to understand the characteristics of the groups, you might choose to tune the clustering engine to produce fewer clusters. For this example, let's choose three clusters. To change the number of clusters that the algorithm generates, go to the Properties dialog box in the Mining Model Editor and change the Cluster Count value from 10 to 3. This change instructs Analysis Manager to recalculate the cluster attributes and members by trying to identify 3 clusters rather than the default 10 clusters. To complete this recalculation, you need to reprocess the data-mining model. You can then browse the model to see the new results.

Customer Segments Revealed

Figure 9 shows the results after reprocessing produces a new clustering. The Attributes pane shows the decision rules that you can use to characterize the cluster membership. Each decision rule produces an algorithm that classifies a case into a unique cluster. The cluster that the algorithm finds depends on how the cluster decision rule's preconditions match the specific attributes of the case that the algorithm is classifying. Figure 10 shows decision rules that can be used as a scoring algorithm to classify cases (or records) into the three clusters. Note that the fields that are decision-rule preconditions are the same fields you specified in the Mining Model Wizard for use in calculating similarity.
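In code, such a decision rule is simply a predicate over the clustering fields. The sketch below shows the scoring idea with invented preconditions and field names; the real rules come from the processed model, as in Figure 10.

```python
# Hypothetical cluster rules in the spirit of Figure 10: each rule is a
# predicate over the same fields used to form the clusters.
RULES = {
    1: lambda c: c["firm_size"] <= 2 and c["annual_sales"] <= 2,
    2: lambda c: c["firm_size"] > 2 and c["annual_sales"] > 2,
    3: lambda c: c["gender"] == "F" and c["tenure"] <= 0.125,
}

def classify(case):
    """Assign a case to the first cluster whose preconditions it satisfies."""
    for cluster_id, rule in sorted(RULES.items()):
        if rule(case):
            return cluster_id
    return None  # no rule fires: the case stays unclassified

case = {"firm_size": 1, "annual_sales": 1, "gender": "M", "tenure": 3.0}
print(classify(case))  # -> 1
```

A scoring function like this is what lets you apply the mined segments to new records—for example, tagging each incoming prospect with a cluster number at load time.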

Once the cases have been classified into the various clusters, these decision rules also provide a statistical summary of the cases in the data set. Here, you can see that Cluster 1 characterizes customers from generally small, generally low sales-volume firms. Cluster 1 members also tend to have short tenure. Cluster 2 draws on customers from the larger, high sales-volume firms, whereas Cluster 3 is primarily a female cluster and contains all the very short-tenure members.

These divisions suggest that the example company has small, low sales-volume customers who tend to be male. Female customers are either longer-term customers from larger, higher-sales companies or very short-term customers from small- to medium-sales companies. Also, most short-term customers are female.

Digging for Gold

You can see that the decision-tree technique and the cluster technique produce different kinds of results: The decision tree is based purely on probability of response. The clusters, however, are based on tenure, gender, size of firm, and annual sales. In fact, for clustering, probability of response was specifically excluded.

The Microsoft Decision Trees and Cluster Analysis algorithms give you substantial data-mining functionality that can help you tackle a wide range of data-mining tasks. As I showed in this article, you can use decision trees to calculate, for example, the probability of response to a promotion based on customer characteristics. As a result, your marketers can get better results with their promotional budget by targeting specific groups of customers. Because the decision tree displays group characteristics, marketers can also craft their targeting messages to better reflect the characteristics of the targeted group.

Similarly, you can use clustering to group customers together irrespective of their probable response to a particular outcome measure. This method can generally characterize different groups of customers—or segments—based on shared similarities and tendencies to be different from other groups or segments of customers. This approach can help marketers develop a sense of how a segment of the customer base might behave in response to new product offerings and how to approach different groups or segments with new products or targeted promotions.
