Business users and analysts make up the majority of our target personas in the business intelligence world. Excel is a preferred tool for analysis, pivoting, and charting, and that's not unique to SQL Server BI professionals. We talk a lot about Power Pivot (yes, I remembered to put a space in the name that time!) and Power View as ways to model source data (Power Pivot) and present it as compelling data visualizations (Power View). Presenting your organization's complex data as a simple set of compelling reports and visualizations is central to your value as a BI professional.
While we're all comfortable working with ODBC, JDBC, SQL, MDX, and the like as sources, things begin to break down when NoSQL data stores like MongoDB are thrown into the picture. MongoDB is a JSON document-oriented data store that enables flexible schemas and fast data loading: data is stored as documents without requiring a stringent upfront schema. While that may be a difficult concept to get your head around if you are a long-time OLTP or OLAP data modeler or DBA, just think of how many business users we may have lost already in this conversation!
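To make the "flexible schema" idea concrete for the relational folks, here is a small illustrative sketch in Python. The documents and field names are made up for this example; the point is simply that two documents in the same collection need not share the same columns:

```python
import json

# Two documents as they might appear in the same hypothetical
# "weblogs" collection. Unlike rows in a relational table, they
# don't share a fixed schema: the second record carries a nested
# "geo" field that the first one simply lacks.
doc_a = json.loads('{"_id": 1, "url": "/home", "status": 200}')
doc_b = json.loads(
    '{"_id": 2, "url": "/cart", "status": 302,'
    ' "geo": {"country": "US", "city": "Redmond"}}'
)

for doc in (doc_a, doc_b):
    # A field that doesn't exist is simply absent -- not a NULL column
    print(doc["url"], doc.get("geo", {}).get("country", "n/a"))
```

There's no ALTER TABLE step when a new field shows up in the feed, which is exactly why fast-arriving Web log data lands so easily in a document store.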
The bottom line of this discussion is this: as Big Data and NoSQL become more and more ubiquitous, you will need to start adding those sources to your BI solutions. In this post, I'm going to show you how I recently solved this problem for a customer by transforming the data from JSON to SQL using Pentaho Data Integration, getting it into an Excel spreadsheet, modeling it in Power Pivot, and visualizing it in Power View. It's truly a hybrid solution: common Microsoft Office and database tools on the front end, together with an open source data store (MongoDB) and Pentaho, an OSS data integration partner of MongoDB Inc., on the back end.
Here's how I solved this for the customer: MongoDB serves as the source for Power Pivot, with data extracted from and loaded into the Mongo data store using the OSS Pentaho Data Integration tool (sometimes called Kettle or just PDI). This is a natural fit because the end users wanted to stay in Excel, while the IT team was loading large amounts of fast-arriving Web log data into open source tools like MongoDB and Pentaho. PDI ships with native MongoDB adapters, so it knows how to connect to and traverse collections in Mongo data stores, and it also includes a JSON document parser. This makes the entire process of getting data into and out of MongoDB straightforward.
Once you've built a transformation like my sample below, you can join your Mongo data with the rest of your organization's data assets just like any other data source. The ability of tools like Pentaho to normalize NoSQL and unstructured data is what makes this possible when, as in this case, the customer wants to do end-user analysis in Excel 2013.
Here is a good link describing the proper configuration for connecting to a MongoDB server with PDI. Make sure that you wire the MongoDB connection up to a JSON Input step to parse the fields that you want to make available to Excel from your data store collection:
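To give you a feel for what that JSON Input step is doing under the covers, here is a minimal Python sketch of the same idea. This is illustrative only, not PDI itself: the dotted-path helper stands in for the JSONPath expressions the step uses, and the document and field names are hypothetical:

```python
import json

def extract_field(doc, path):
    """Walk a dotted path (a simplified stand-in for a JSONPath
    expression) and return the value, or None when a document
    lacks that field."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

raw = '{"session": "abc123", "request": {"url": "/home", "ms": 42}}'
doc = json.loads(raw)

# The fields we want to expose to Excel, keyed by output column name
fields = {"session": "session", "url": "request.url", "latency_ms": "request.ms"}
row = {col: extract_field(doc, path) for col, path in fields.items()}
print(row)  # {'session': 'abc123', 'url': '/home', 'latency_ms': 42}
```

The output is a flat, named set of fields per document, which is the tabular shape that Power Pivot expects from a source.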
OK, now we have data in Power Pivot (almost forgot the space!), where we can mash it up and build models for Excel Power View and Power Pivot visualizations, including my sample below, where I've added a lookup field for images that I can use as slicers:
From here, we are back in our comfortable Power Pivot environment working with data, and we've enabled Excel-based analytics on MongoDB that can look like this:
The important takeaway from this brief story is to use a data integration tool that can take JSON documents from MongoDB, like in this example:
... and turn them into a feed that a tool like Excel can consume and display. Because you are working with JSON documents in a data store, your interaction is a bit different than with a traditional RDBMS. I didn't go into the details of Pentaho Data Integration here because I'm only using it as an example of how I solved this problem; there are other MongoDB data integration partners that do similar extraction and parsing as well.
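In the actual solution, PDI produced that feed, but the end-to-end shape of the transformation can be sketched in a few lines of Python. This is a stand-in, not the customer's pipeline; the documents are invented, and CSV is used as a simple stand-in for a format Excel can open:

```python
import csv
import io

# A couple of hypothetical weblog documents as they might come out
# of a MongoDB collection after parsing
docs = [
    {"url": "/home", "status": 200, "bytes": 5120},
    {"url": "/cart", "status": 302},  # note: no "bytes" field
]

# The chosen column list defines the feed's schema; documents that
# lack a field get an empty cell, which Excel handles gracefully.
columns = ["url", "status", "bytes"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(docs)
print(buf.getvalue())
```

The key design point is the same one the post is making: somewhere between the document store and the spreadsheet, something has to impose a column structure on schemaless documents, and that's the job of the data integration layer.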