Skip navigation
word cloud with BUSINESS INTELLIGENCE in center

Introducing Microsoft Power Query

How to gather, refine, and combine data

Microsoft Power Query is a new self-service business intelligence (BI) tool that helps end users find and prepare data for analysis. BI is a multi-step process (gather, store, model, analyze, and share data) that turns raw data into useful information. Power Query is aimed squarely at the "gather" step of this BI process and makes it possible for queries to be published and re-used. Even though Excel (and Power Pivot) can already connect to, and gather data from, a variety of sources, Power Query's data transformation capabilities go beyond Excel and Data Analysis Expressions (DAX) functions. The result is a script that's easier to maintain than complex Excel formulas and macros.

Available as an add-in for both Microsoft Excel 2013 and Excel 2010, Power Query makes it easy to discover, combine, and refine data from both public and private data sources. I'll walk you through an example of how to use Power Query to load data from the Internet. If you want to follow along and you don't have Power Query, you can download it from the Power BI for Office 365 website. Note that all of my examples are built with Excel 2013.

Getting the Data

Assuming that you have the Power Query add-in installed, start Excel and click the Power Query ribbon. You'll see several options available to get external data, including Online Search, which is located on the far left side of the ribbon. This option lets you perform a free-form search for data, just as you would use an Internet search engine. Note that this online search doesn't crawl the entire Internet; rather, it's limited to a specific set of sites, such as Wikipedia and some U.S. government sites. You can also publish and search for your own data sets when Power Query is used in conjunction with a new cloud-based offering called Microsoft Power BI for Office 365.

For this example, you'll be connecting to a public Excel file that contains data about monthly vehicle sales in the United States. Before you begin, select Options from the ribbon's Machine Settings group and select the Enable Advanced Query Editing option. Then, in the ribbon's Get External Data group, select the From Web option. In the From Web dialog box shown in Figure 1, enter http://www.bea.gov/national/xls/gap_hist.xls in the URL text box and click OK.

Figure 1: Entering the URL to Get the Excel File
Figure 1: Entering the URL to Get the Excel File

Generating a Query

At this point, Power Query has already figured out that the URL you entered points to an Excel workbook. So, when it opens the Query Editor, you'll see all the workbook's worksheets and custom ranges listed in the Navigator pane on the left.

In the Query Editor, expand the Steps pane on the right, then scroll down and select the Table 5 worksheet in the Navigator pane, as Figure 2 shows. Table 5 contains data about heavy truck sales. In the top left corner of the Query Editor, notice that the query is named Query1. Right-click the query name, select Rename, and enter the new name of Vehicle Sales.

Figure 2: Selecting the Table 5 Worksheet
Figure 2: Selecting the Table 5 Worksheet

Next, right-click the Table icon in the top left corner of the Preview pane (the center pane) and select the Remove Top Rows option. In the Remove Top Rows dialog box, which is shown in Figure 3, enter a value of 3 and click OK.

Figure 3: Removing the Top Three Rows
Figure 3: Removing the Top Three Rows

Right-click the Table icon in the Preview pane again and select the Use First Row As Headers option. Notice that the first two columns are still named Column1 and Column2. These headers didn't change because there were null values in the first row of those two columns. Double-click Column1 and rename it MonthName. Double-click Column2 and rename it Year.

Scroll to the far right of the Preview pane and highlight all of the extraneous columns (Column7, Column8, Column9, and Column10) by clicking the column names while holding down the Shift key. Right-click one of the highlighted columns and select the Remove Columns option.

Scroll back to the MonthName column. Click the down arrow next to the header and clear the (null) check box.

You still have a few additional steps to complete the Vehicle Sales query, but let's pause here for a moment. If you've followed along, your Query Editor should look like the one shown in Figure 4. (Note that I've hidden the Navigator pane.) As the Steps pane shows, six steps have been generated for this query so far. Click the first step (Source). Notice that the Preview pane reverts back to the original state of the Table 5 worksheet. If you select the next step (RemovedFirstRows), you'll see the Preview pane update accordingly. The Query Editor behaves like a flight recorder and even lets you "replay" each step against the data source.

Figure 4: Replaying the Steps Generated for the Query
Figure 4: Replaying the Steps Generated for the Query

Each step in the query is actually part of a script. If you click the Script icon located in the top right corner of the Preview pane, the Edit Query window will appear, displaying the raw script for the query. Listing 1 shows this script. (If you don't see the Script icon, make sure that you selected the Enable Advanced Query Editing option as described earlier.) You can edit the script directly in this window, but if you plan to make any significant changes, I recommend that you first save a copy the original script. Close the Edit Query window by clicking Cancel.

let
  Source = Excel.Workbook(Web.Contents
  ("http://www.bea.gov/national/xls/gap_hist.xls")),
  #"Table 5" = Source{[Name="Table 5"]}[Data],
  RemovedFirstRows = Table.Skip(#"Table 5",3),
  FirstRowAsHeader = Table.PromoteHeaders(RemovedFirstRows),
  RenamedColumns = Table.RenameColumns(FirstRowAsHeader,
  {{"Column1", "MonthName"}, {"Column2", "Year"}}),
  RemovedColumns = Table.RemoveColumns(RenamedColumns,
  {"Column7", "Column8", "Column9", "Column10"}),
  FilteredRows = Table.SelectRows(RemovedColumns, each
  ([MonthName] <> null))
in
  FilteredRows

As I mentioned previously, the query isn't quite done yet. You still need to add two new columns and make some other changes, so let's resume generating the query.

Right-click anywhere in the Preview pane and select the Insert Column-Custom option to bring up the Insert Custom Column dialog box shown in Figure 5. 

Figure 5: Inserting the Custom Column Formula
Figure 5: Inserting the Custom Column Formula

In the Custom Column Formula text box, enter the following formula:

=Text.Replace(Text.Replace(Text.Replace(Text.Replace
  (Text.Replace(Text.Replace(Text.Replace(Text.Replace
  (Text.Replace(Text.Replace(Text.Replace(Text.Replace
  (Text.Start([MonthName], 3), "Jan", "1"), "Feb", "2"),
  "Mar", "3"), "Apr", "4"), "May", "5"), "Jun", "6"),
  "Jul", "7"), "Aug", "8"), "Sep", "9"), "Oct", "10"),
  "Nov", "11"), "Dec", "12")

Power Query introduces a new formula language for data transformations. Currently, there isn't a built-in formula for converting the month name to a corresponding number, so this particular formula does the trick. (You can learn more about the available functions by visiting the Learn about Power Query formulas web page.)

After clicking OK to close the Insert Custom Column dialog box, you'll see a new column named Column1 in the Preview pane. (You might need to scroll to the right to see it.) You need to rename this column to MonthNumber. As you've already seen, one way to rename a column is to double-click the column header and enter the new name. There's also another method to rename a column that doesn't result in an additional step. As shown in Figure 6, you can highlight the last step in the Steps pane and change the column name in the function text box. Note that I've also renamed this step to InsertMonthNumber to make the query easier to understand.

Figure 6: Changing a Column Name Without Adding a Step
Figure 6: Changing a Column Name Without Adding a Step

Next, right-click anywhere in the Preview pane and select the Insert Column-Custom option again. In the Custom Column Formula text box, enter the formula:

=Date.EndOfMonth(Date.From([MonthNumber] & "/01/"
  & Text.From([Year]) ) )

This formula constructs a date value from the year and month number, then returns the last day of the month. Change the name of this column to Date. In addition, change its data type to Date by right-clicking the column header and selecting the Change Type option.

You also need to change the four sales columns' data type to Number. Those columns are named:

  • Not seasonally adjusted (Thousands)
  • Combined seasonal, trading-day factor
  • Seasonally adjusted (Thousands)
  • Seasonally adjusted at annual rates (Millions)

Next, you can remove the Year, Month, and MonthNumber columns, because they're no longer needed. As previously discussed, this can be done by right-clicking a column header and selecting the Remove Columns option.

After you've removed those three columns, you need to move the Date column to the far left of the Preview pane. This can be done by dragging and dropping the column header or by right-clicking the column header and selecting the Move option.

Close the Query Editor by clicking the Done button. Figure 7 shows an excerpt of the results returned by the Vehicle Sales query. (You can see the complete result set in the sample workbook PowerQuery_Chessman_SQLPro_Example1.xlsx, which you can download by clicking the Download the Code button near the top of the page.)

Figure 7: Examining the Results of the Vehicle Sales Query
Figure 7: Examining the Results of the Vehicle Sales Query

By default, query results are inserted into an Excel table with the same name as the query. (Note that underscores are used in place of any spaces in the query name.) In Excel 2013, there's also the Load to data model option, which loads the query results to Excel's data model. The data model in Excel 2013 is effectively the Power Pivot engine first introduced in Excel 2010. (For more information about the data model, see my article "Understanding PowerPivot and Power View in Microsoft Excel 2013.") Note that for query results exceeding 1,048,575 rows, loading to the data model (and turning off the Load to worksheet option) is required.

Refreshing a Query

At some point, you'll likely need to update a query's results. There are three ways to refresh a query. The first option is to click the Refresh link in the Query Settings pane, as shown in Figure 7. Another approach is to use the Refresh All option in the Connections section of the Data ribbon. Note that you can automate this approach with an Excel macro. (For an example, see "Understanding PowerPivot and Power View in Microsoft Excel 2013.") If you load the query to Excel's data model, you also have the option to use the Refresh button from within the Power Pivot window.

Appending Data

Another Power Query capability that you'll likely need to use is appending data. For example, in the public Excel file that contains data about monthly vehicle sales, the sales numbers are spread across different worksheets—the sales for domestic autos are in the Table 1 worksheet, the sales for foreign autos are in the Table 2 worksheet, and so on. To get a complete and detailed breakdown of sales by vehicle type, you need to append the data in the worksheets into a single table—or in Power Query lingo, into a single query. Power Query's Append option, which is in the Combine group of the Power Query ribbon, lets you achieve this task. It creates a new query that contains all rows from the active query (i.e., the query in which you select the Append option) followed by the rows from the queries that you want to append. You can append up to two queries at a time using the Append option. To append more than two queries, you need to manually edit the Power Query script.

For more information about appending data, see the Combine multiple queries web page. If you want to see an example of appended queries, you can download the PowerQuery_Chessman_SQLPro_Example2_VehicleSalesAll.xlsx workbook by clicking the Download the Code button. In this workbook, there's an example of two queries appended together using the Append option, as well as a query that leverages a script I edited to append five queries.

Unpivoting Data

Using pivot tables is a great way to look at data (hence their popularity), but tools such as Power Pivot are expecting a straight tabular format. Fortunately, Power Query lets you easily unpivot data. For example, Figure 8 shows housing permit data that has the years along the rows of the table and the months across the columns.

Figure 8: Unpivoting a Data Set
Figure 8: Unpivoting a Data Set

By highlighting all the Month columns, right-clicking one of the highlighted columns, and selecting the Unpivot Columns option, you can quickly get this data set into a tabular format, as shown in Figure 9. (The PowerQuery_Chessman_SQLPro_Example3_UnpivotingData.xlsx workbook contains the entire data set in case you want to see it.)

Figure 9: Examining the Data Set When It's in Tabular Format
Figure 9: Examining the Data Set When It's in Tabular Format

Publishing a Query

Microsoft recently announced Power BI for Office 365, which is a cloud-based offering that lets an organization publish and share self-service BI artifacts such as Excel workbooks. With Power Query, you can publish a query to Power BI so that other users can search for and re-use these queries in their own Excel workbooks. Power BI was still in preview at the time of this writing, but part of Power BI includes a gateway so that published queries can include not only external data queries but also internal data sources (e.g., SQL Server). For more information about Power BI, see the Power BI for Office 365 website.

Get Familiar with Power Query

Finding, transforming, and cleaning up data is typically a tedious task at best. Furthermore, the process of gathering data often consumes a significant amount of time. With Power Query, end users can easily gather, refine, and combine data from both public and private data sources. I encourage you to get familiar with Power Query and introduce it to the end users in your organization.

TAGS: SQL
Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish