Loading QWI Data from the U.S. Census Bureau into Hadoop


Quarterly Workforce Indicators (QWI) data can be downloaded from the U.S. Census Bureau, as shown in Figure 1.

My example uses files that contain a state-level summary of private workforce data by employee sex and age, firm size, and industry group. The direct links for Texas, California, and Nebraska are here:

Note that there are additional, smaller files that describe the various age, firm, and industry group categories. These files could also be downloaded and inserted into Hadoop to represent additional tables. In my example, I simply downloaded these additional files directly into an Excel PowerPivot workbook.

Figure 1: QWI Data Download
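
Before uploading anything, it can help to confirm that a downloaded file decompresses cleanly and to look at its column header. The following is a minimal Python sketch, assuming the .gz files contain comma-separated text with a header row; the function name peek_gz_csv and the file name in the usage comment are hypothetical and not part of the original walkthrough.

```python
import csv
import gzip


def peek_gz_csv(path, max_rows=3):
    """Return the header and the first few data rows of a gzip-compressed CSV."""
    # "rt" opens the archive in text mode so csv.reader receives strings.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = []
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
    return header, rows


# Usage (hypothetical file name; substitute a file you downloaded):
# header, rows = peek_gz_csv("qwi_state_file.csv.gz")
# print(header)
```

Reading only the first few rows keeps the check fast even for large state files, because gzip streams are decompressed incrementally.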

Once you have downloaded the three .gz files, you need to get them into your Hadoop cluster. For HDInsight, upload the files to an Azure Blob container within the storage account associated with the cluster. I used Azure Storage Explorer, a free tool from CodePlex, to upload the files (see Figure 2). In a production environment, you would likely use the Azure Storage APIs, PowerShell, or both.

Figure 2: Azure Storage Explorer
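
As a rough illustration of the Azure Storage API route, here is a minimal Python sketch. It assumes the azure-storage-blob package (version 12 or later) is installed and that you have a connection string for the cluster's storage account. The helper names, the container name you pass in, and the "qwi" blob prefix are hypothetical choices for illustration, not something the original walkthrough prescribes.

```python
from pathlib import Path


def blob_name_for(local_path, prefix="qwi"):
    """Map a local file to a blob name of the form '<prefix>/<file name>'."""
    return f"{prefix}/{Path(local_path).name}"


def upload_files(connection_string, container, local_paths, prefix="qwi"):
    """Upload each local file to the container, overwriting existing blobs."""
    # Imported here so blob_name_for works even without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    for local_path in local_paths:
        blob = service.get_blob_client(
            container=container,
            blob=blob_name_for(local_path, prefix),
        )
        # Stream the file from disk rather than reading it all into memory.
        with open(local_path, "rb") as data:
            blob.upload_blob(data, overwrite=True)
```

Grouping the blobs under a common prefix makes them appear as a single folder, which is convenient if you later point a table at that location.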

Using HDP Sandbox

If you are using the HDP Sandbox, you can use the Hadoop command line interface, or you can upload files by using Hue, an included Web interface for Hadoop. (Note: Hue is not available for an HDP installation on Windows.) Figure 3 shows Hue, accessed from my host machine's browser (the sandbox is running as a guest VM).

Figure 3: Hue

Using HDP Installed on Windows

If you've chosen to install HDP on a Windows operating system, you can use the Hadoop command line to load files into a folder. Figure 4 shows the steps needed to create a folder and then upload the three files.

Figure 4: Hadoop Command Line
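
The create-folder-then-upload sequence can also be scripted. The following is a minimal Python sketch that builds the hadoop fs -mkdir and hadoop fs -put commands and, optionally, runs them. The function names, the target folder, and the file names are hypothetical, and the exact commands in Figure 4 may differ. On Windows the launcher is typically a .cmd script, so you may need to pass its name (for example, hadoop.cmd) as hadoop_bin.

```python
import subprocess


def build_upload_commands(local_files, hdfs_dir, hadoop_bin="hadoop"):
    """Return argv lists: one mkdir command, then one put per local file."""
    commands = [[hadoop_bin, "fs", "-mkdir", hdfs_dir]]
    for local_file in local_files:
        commands.append([hadoop_bin, "fs", "-put", local_file, hdfs_dir])
    return commands


def run_commands(commands):
    """Run each command in order; check=True raises on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Usage (hypothetical paths; requires Hadoop on the PATH):
# run_commands(build_upload_commands(["tx.csv.gz", "ca.csv.gz"], "/demo/qwi"))
```

Keeping command construction separate from execution lets you print and review the commands before anything touches the cluster.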

Main article: Integrating Hadoop with SQL Server
