Quarterly Workforce Indicators (QWI) data can be downloaded from the U.S. Census Bureau, as shown in Figure 1.
My example uses files representing a state-level summary of private workforce data by employee sex and age, firm size, and industry group. The direct links for Texas, California, and Nebraska are here:
Note that there are additional, smaller files that describe the various age, firm, and industry group categories. These files could also be downloaded and inserted into Hadoop to represent additional tables. In my example, I simply downloaded these additional files directly into an Excel PowerPivot workbook.
Once you have downloaded the three .gz files, you need to get them into your Hadoop cluster. For HDInsight, upload the files to an Azure blob container within the storage account associated with the cluster. I used a free tool from CodePlex, the Azure Storage Explorer, to upload the files (see Figure 2). In a production environment, you would likely use the Azure Storage APIs and/or PowerShell.
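As a rough illustration of the Azure Storage API route, here is a minimal Python sketch. It assumes the azure-storage-blob package is installed, which is not something the walkthrough above uses. The connection string, container name, "qwi" prefix, and file names are all hypothetical placeholders, so substitute the values for your own storage account and downloads.

```python
from pathlib import Path


def blob_name_for(local_path, prefix="qwi"):
    """Return the blob name for a local file: the prefix plus the file name."""
    # The "qwi" prefix is a hypothetical virtual folder, not from the article.
    return f"{prefix}/{Path(local_path).name}"


def upload_files(connection_string, container_name, local_paths, prefix="qwi"):
    """Upload each local file to the given blob container."""
    # Imported here so blob_name_for works even without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    for path in local_paths:
        with open(path, "rb") as data:
            # overwrite=True replaces a blob of the same name if one exists.
            container.upload_blob(
                name=blob_name_for(path, prefix), data=data, overwrite=True
            )
```

A call might look like upload_files(conn_str, "mycontainer", ["tx_example.csv.gz", "ca_example.csv.gz", "ne_example.csv.gz"]), where every argument is a stand-in for your own values.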
Using HDP Sandbox
If you are using the HDP Sandbox, you can use the Hadoop command line interface, or you can upload files by using Hue, an included Web interface for Hadoop. (Note: Hue is not available for an HDP installation on Windows.) Figure 3 shows Hue, accessed from my host machine's browser (the sandbox is running as a guest VM).
Using HDP Installed on Windows
If you've chosen to install HDP on a Windows operating system, you can use the Hadoop command line to load files into a folder. Figure 4 shows the steps needed to create a folder and then upload the three files.
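The create-folder-then-upload sequence can also be scripted. The Python sketch below builds the corresponding hadoop fs -mkdir and hadoop fs -put commands and, optionally, runs them. The HDFS folder and file names are hypothetical placeholders, and the sketch is not a transcription of the exact commands shown in Figure 4.

```python
import subprocess


def build_hdfs_commands(hdfs_dir, local_files, hadoop_bin="hadoop"):
    """Return the commands that create hdfs_dir and copy each file into it."""
    # Newer Hadoop releases may need "-mkdir -p" if parent folders are missing.
    commands = [[hadoop_bin, "fs", "-mkdir", hdfs_dir]]
    for local_file in local_files:
        commands.append([hadoop_bin, "fs", "-put", local_file, hdfs_dir])
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical folder and file names; replace with your own.
    cmds = build_hdfs_commands(
        "/user/hadoop/qwi",
        ["tx_example.csv.gz", "ca_example.csv.gz", "ne_example.csv.gz"],
    )
    for cmd in cmds:
        print(" ".join(cmd))  # Preview only; call run_commands(cmds) to execute.
```

On Windows the launcher is a .cmd script, so you may need to pass hadoop_bin="hadoop.cmd" or run the script from the Hadoop command prompt. Printing the commands first makes it easy to check them before anything touches the cluster.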
Main article: Integrating Hadoop with SQL Server