In April, I wrote a column called “Could Unstructured Data Management Technology Replace the Relational Database Someday?” I’ve included my opening and closing remarks as a quick reminder and level set for this month’s commentary.
I opened “Could Unstructured Data Management Technology Replace the Relational Database Someday?” with “I’m going to say something that will make many of you mad and wonder if I’m a complete idiot—maybe someday we won’t need relational databases. Let me be clear about a few things before going further. First, I don’t think we’re anywhere close to saying “Thanks Mr. Relational Database, but we don’t need you anymore.” Second, I’ve known the basic rules of normalization longer than some of you have been alive. Trust me, I get databases. But maybe, just maybe, technology is reaching the point in which some data management paradigms that used to be shoehorned into a relational database no longer need to be. Here’s what got me thinking these crazy and heretical thoughts.”
I closed the commentary with “I don’t think bit buckets will ever replace “real” databases. Locking, concurrency, high availability, one version of the truth, and a few semesters worth of other database topics make me believe we will always need databases. Will databases always be relational? Does it matter? I suspect many problems in the database world will always be best described using relational math and set theory. However, many database pros are at risk of having their heads in the sand when it comes to some of the newer trends involving unstructured data, including, but not limited to, SharePoint. It’s time to pay attention. Your users and customers are.”
I’m embarrassed to say that I was talking about the NoSQL movement without even knowing that the term “NoSQL” had been coined. I decided to revisit this topic because I’ve heard and learned more about the NoSQL movement. You’ll find a large amount of content that explores NoSQL. The following two articles do a nice job of summing up the main issues with respect to NoSQL: “No to SQL? Anti-database movement gains steam” and “NoSQL – the new wave against RDBMS.”
NoSQL proponents say that traditional relational database management systems (RDBMSs) can’t scale and are too expensive to meet the enormous performance needs of modern web-based architectures for a Web 2.0 world. Instead, “web scale” applications using NoSQL have custom, proprietary data sources or perhaps a NoSQL database engine (and I use the term “database engine” lightly) such as MongoDB. Effectively, all of these NoSQL approaches avoid joins and avoid writing to disk as much as possible. Some would argue that they aren’t even databases in the traditional sense of the word and are simply highly distributed key/value stores mostly doing their processing in memory.
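To make the "avoid joins" point concrete, here is a minimal sketch that answers the same question two ways: with a join across normalized tables, and with a single lookup in a key/value structure. The table names, the `user:alice` key format, and the sample data are all made up for illustration. SQLite and a plain Python dictionary stand in for a real RDBMS and a real distributed key/value store, so this shows the shape of the trade-off and nothing about performance.

```python
import sqlite3

# Relational version: two normalized tables, combined with a join at read time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        item TEXT
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'lamp');
""")
relational_items = [
    row[0]
    for row in conn.execute(
        "SELECT o.item FROM users u JOIN orders o ON o.user_id = u.id "
        "WHERE u.name = ? ORDER BY o.id",
        ("alice",),
    )
]

# Key/value version: everything needed for one read is stored under one key,
# so there is nothing to join. The dict is a stand-in for an in-memory store.
kv_store = {"user:alice": {"name": "alice", "orders": ["book", "lamp"]}}
kv_items = kv_store["user:alice"]["orders"]

print(relational_items)
print(kv_items)
```

The key/value read is a single lookup, but the price is denormalization: if order data also lives under other keys, the application has to keep those copies consistent itself.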
Unfortunately, many of the NoSQL-related articles I’ve read recently are consistently beating the drum that traditional RDBMS engines are fundamentally non-scalable and that NoSQL is ordained as the future of web processing. I disagree. The NoSQL and RDBMS camps each have compelling pros and cons and each can meet a wide variety of business needs. Neither side is 100 percent right or wrong, as is true in most techno-religious debates.
NoSQL got its start with the likes of Amazon and Google. Traditional RDBMS solutions at the time didn’t meet these companies’ performance needs, or would have been prohibitively expensive in licensing costs even if it had been possible to build them out. This led to the growth of several open-source NoSQL approaches that do indeed offer very impressive performance numbers. However, don’t expect the same breadth of features, or certain basics such as guaranteed atomic transactions. But you know what, sometimes that’s OK, as several case studies referenced in the articles I previously mentioned point out.
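For readers who haven’t thought about atomic transactions in a while, here is a small sketch of what that guarantee buys you. It uses Python’s built-in `sqlite3` module as a stand-in for any transactional RDBMS. The `accounts` table, the `transfer` helper, and the balances are hypothetical, invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back the whole transaction if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 500)  # alice only has 100
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to bob runs first and succeeds, yet it never becomes visible because the debit fails. In a store without multi-statement transactions, the application would have to detect and undo that half-finished transfer on its own.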
My primary beef with the NoSQL camp is the position that RDBMSs can’t scale. That simply isn’t true for 99.99 percent or more of the data processing world. It seems like folks who suggest NoSQL or nothing are in danger of throwing out the baby with the bath water by suggesting that NoSQL is always the answer and that it’s a foregone conclusion that NoSQL will eventually stamp out the relational data model.
Are companies such as Google, Amazon, and Adobe dumb? Their market caps suggest quite the opposite, and they all use high-profile NoSQL solutions for different needs. Clearly there are cases in which NoSQL solves business problems. Aside from the few cases in which petabytes of data simply won’t flow efficiently through a traditional RDBMS, I think the more compelling cases for NoSQL are cost and ease of use. Free is pretty compelling compared to potentially millions of dollars in licensing fees paid to Microsoft, Oracle, or other mainstream RDBMS providers, as long as the solution meets all of your business needs. On the ease of use side, NoSQL advocates will point to the long-held position that there tends to be an impedance mismatch between the set-based approaches of an RDBMS and the procedural style that computer programs are actually written in. I suspect that each of these dynamics will force traditional RDBMS providers to be more innovative in their offerings and price points, which is ultimately good for everyone.
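The impedance mismatch is easier to see in code than in prose. In the sketch below, one in-memory object has to be split across two tables and stitched back together on the relational side, while the document-style side stores it whole. The `Order` class, table names, and `order:1` key are hypothetical. A dictionary of JSON strings is only a stand-in for a document store, not how MongoDB or any particular product works.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class Order:
    customer: str
    items: list = field(default_factory=list)


order = Order(customer="alice", items=["book", "lamp"])

# Relational side: the nested list does not fit in one row, so the object is
# split across a parent table and a child table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, position INTEGER, item TEXT)")
cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (order.customer,))
order_id = cur.lastrowid
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order_id, pos, item) for pos, item in enumerate(order.items)],
)

# Reading it back means two queries (or a join) plus manual reassembly.
customer = conn.execute(
    "SELECT customer FROM orders WHERE id = ?", (order_id,)
).fetchone()[0]
items = [
    row[0]
    for row in conn.execute(
        "SELECT item FROM order_items WHERE order_id = ? ORDER BY position",
        (order_id,),
    )
]
from_tables = Order(customer=customer, items=items)

# Document side: the object is serialized and stored as a single value.
doc_store = {}
doc_store["order:1"] = json.dumps(asdict(order))
from_document = Order(**json.loads(doc_store["order:1"]))

print(from_tables == order, from_document == order)
```

Both paths recover the same object. The difference is how much mapping code sits between the program and the storage, which is the convenience argument NoSQL advocates make. The relational layout, in turn, makes it easy to ask set-based questions across all orders that the document layout does not.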
Long live relational databases? Long live NoSQL? This debate is far from over, and both camps have a long and healthy product life cycle ahead of them. Expand your horizons and don’t be afraid to test your assumptions regardless of which camp you’re in today.