Some years ago, I wrote an article titled "SQL by Design: Why You Need Database Normalization." At that time, NoSQL databases were a tiny percentage of the installations, and relational ruled the land.
Today, the landscape has changed, and we have relational database management systems like SQL Server sharing the spotlight with variations of the NoSQL technology. So, I thought we could revisit the original reasons for normalizing data and see if they're still relevant.
Let's start with some basic assumptions. When you normalize a database, you're targeting the following goals:
- arranging data into logical groupings such that each group describes a small part of the whole
- minimizing the amount of duplicate data stored in a database
- organizing the data such that, when you modify it, you make the change in only one place
- building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage
The original article contended that "Data normalization is primarily important in the transactional, or online transactional processing (OLTP), database world, in which data modifications (e.g., inserts, updates, deletes) occur rapidly and randomly throughout the stored data." The article then went on to point out that data warehouses, which often contain volumes of denormalized and precalculated/summarized data, often ignore the rules of data normalization.
Need to Understand the NoSQL Data
To the data warehouse model, we can add many of the current installations of NoSQL databases—databases that are engineered to capture semi-structured data, from data sources over which quality we have little, if any control. Furthermore, the data meaning can actually shift over time. It seems to me that if we want to be able to integrate NoSQL data into our own highly transactional, well-defined, conventionally-normalized financial data, for instance, then yes—we need to understand the NoSQL data, its meaning, where it came from, its quality, its integrity. Well, if we want accurate answers, that is, we need to know.
Are you thinking that the NoSQL installation at your shop is a stand-alone system which is being mined as the data is gathered, and that nothing else will ever be done with it? Good luck with that idea! If IT history has shown us nothing else, it's clear that data is being used and reused in ways that were never imagined when the collection systems were put in place.
Highly-Structured SQL Data is your Stake in the Ground
If you’re working in a hybrid (SQL plus NoSQL) environment, I believe that the highly-structured SQL data is your stake in the ground; it'll give you the basis on which to build additional knowledge and marketing expertise when integrated correctly with the NoSQL data that's being collected in concert with it. And that's why I think you still need to normalize your (SQL) data.
Got other ideas? Let’s hear them!