NoSQL is a topic that seems to pop up in every conversation about current SQL Server trends. When I was at TechEd in June, people were still wondering what exactly NoSQL was, and they were concerned about what it would mean for their jobs as DBAs and developers. Six months later, the SQL Server community seems to have a better grasp on NoSQL and the scenarios that it’s best suited for. At PASS Summit 2010, I had the opportunity to sit down with Kevin Kline, strategy manager for SQL Server at Quest Software, and Jeremiah Peschka, Quest’s emerging technology expert, to discuss the strength of the NoSQL market and how companies are deciding where to implement NoSQL rather than SQL Server.
Megan Keller: Kevin, when you and I spoke with Brent Ozar at TechEd 2010, we discussed the current trends around NoSQL and Azure. A lot has changed in the NoSQL market in the past six months, though. What are some of the trends you’re seeing in this market now?
Kevin Kline: Well first of all, in support of the trends we discussed earlier, let me introduce to you Jeremiah Peschka, [Quest Software’s] evangelist and technology specialist in new and emerging technologies. If that doesn’t validate where we think some trends are going, then I don’t know what does. Definitely we’re seeing quite a bit of exciting things happening with new and emerging technologies. Jeremiah does have a great deal of depth in development, as well as in SQL Server DBA work, but he also is experienced with all of these really strange sounding things like Hadoop.
Jeremiah Peschka: Sawzall.
Kevin: Sawzall. Voldemort. Cassandra. There’s really a NoSQL database called Voldemort. Lots of interesting things happening, and I’m thankful that’s Jeremiah’s area.
Jeremiah: I do think there’s a lot of strength in the market behind that. You’re seeing a lot of players like Google have started releasing a lot of their tools to the community. Things they’ve built up internally for eight, nine years they’re letting the community actually use now. Yahoo! has been contributing back a lot of the technology they developed to process 22 petabytes of data a day. I think as the amount of data we collect grows, it’s a matter of when you’re going to be switching to using one of these systems. They have a lot of strengths that complement where SQL Server doesn’t do too well.
Megan: Do you see companies implementing both traditional SQL Server systems and NoSQL, all within the same environment?
Jeremiah: That’s exactly what my research is showing; what I’m seeing when talking with people. You can’t get away from all the benefits that a relational database gives you. It’s a known quantity, we know how it performs. But at the same time, there’s a lot of benefit from using batch processing systems like Hadoop, NoSQL. There are areas where SQL Server doesn’t perform quite as well; you have to do a lot of tricks to get it to do things. Whereas with Hadoop it’s built for this out of the box; that’s exactly what it does.
Megan: Are you seeing specific types of workloads being used with NoSQL?
Jeremiah: Definitely. One of the workloads that I see a lot of is batch processing, like image processing. eBay uses NoSQL for a lot of bulk image processing. Yahoo! does a lot of raw analysis of data, and then they push it back into Analysis Services. Or, if you have data that’s very poorly defined, it has to be structured, that’s another good place to use NoSQL, where with a relational database it gets very convoluted. The New York Times uses NoSQL to do a lot of their form building for very loosely defined forms. So it really works well there.
Kevin: I think a really interesting question to look at, too, is how are the mainstream relational database vendors going to address this? There are a lot of different strategies you could take. You could build an extension to your existing product, you could build a brand-new product and try to launch it, you could build a toolkit to utilize an existing open-source kind of code—something like Cloudera has done where they’re building out a lot of offerings around Hadoop.
I’m really keen to see what Dave DeWitt is going to say on Thursday [during his PASS Summit 2010 keynote]. This time last year when he was giving his keynote, he said “I’m going to teach you a little bit about key-value stores and column stores, but do not for one second assume that this means there will be anything related to it in any of our products, anywhere.” So what did we see this morning, Jeremiah?
Jeremiah: That would be columnar indexing.
Kevin: A columnar indexing system, isn’t that interesting? So the major vendors recognize that there are simply situations where a relational database, by its very nature, has certain kinds of overhead. And that overhead means that we’re going to guarantee certain levels of service. For example, a transaction is either rolled forward and applied to the system or it is completely rolled back and doesn’t exist in the system. That has overhead; a great deal of overhead. It’s called the ACID property of transactions. We get to skirt all of those rules and all of that overhead with these other high-end systems that are NoSQL systems. So what do you do? Do you build in a NoSQL, no ACID capability, or do you offer a separate product, or do you try to leverage something that already exists out there? Not only are we watching Microsoft and SQL Server, but we’re looking at what is Oracle going to do; what is IBM DB2 going to do. Sybase is doing really interesting stuff.
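[Editor’s note: The all-or-nothing guarantee Kevin describes is the atomicity part of ACID. The following is only a minimal sketch of that behavior, using Python’s built-in sqlite3 module rather than SQL Server. The accounts table, the account names, and the make_db and transfer helpers are hypothetical and exist purely for illustration.]

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction.

    If the debit violates the CHECK constraint, the credit that was
    already applied is rolled back too, so no partial transfer survives.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

[In the second transfer the credit to bob succeeds before the debit from alice fails, and the rollback removes that credit. Tracking and undoing partial work in this way is part of the overhead Kevin refers to.]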
Megan: Do you see third-party vendors eventually tying this into their products as well?
Jeremiah: That is a good question. Obviously, we can’t talk about future product direction, but I know that other vendors make extensions to MySQL and they’ve started building a lot of different products to go on MySQL’s backend. And I think the market really is too young to speculate what people are actually going to be doing.
Kevin: That’s one of the really interesting things about this broader scene is that it’s still the Wild West. It’s kind of like the turn of the century and the gold rush. We know people are going there, they’re trying to get something out of it, but who’s going to come out on top, we don’t know.
Jeremiah: At the beginning of the year, there was something like 27 different NoSQL database vendors on the market. Several more have come up, several more have folded.
Megan: Is there a NoSQL database vendor that stands out above the rest?
Jeremiah: Cloudera is making a lot of waves, whether or not they have marketing or they’re very, very successful, either way, they’re making a lot of waves; a lot of people are talking about them.
Kevin: I think the Apache implementation of Cassandra is definitely worth keeping your eyes on. Again, it’s still a little too broad to pick your winners, but there are certainly a handful of leaders. I think that one of the other questions that comes to mind is “What is Quest going to do?” Just to speak to that a little bit more, we are definitely observing it very closely, and we are doing some work in that space. We do have a free beta product, Toad for Cloud Databases.
Jeremiah: We also started up NoSQLPedia, in addition to SQLServerPedia and OraDBPedia, where we’re building up a community knowledge base. We have a couple of syndicated bloggers on board helping out with that. And we talk about not just traditional NoSQL databases like Hadoop, but we’re also talking about Azure table services in SQL Azure because a lot of people lump cloud in with NoSQL as well. We’re trying to get that information out there because it’s new and it’s different. A lot of DBAs are like “Is this going to take my job away from me?” Well, no, it’s still a database; you still need to be able to work with it. Someone needs to manage it and understand what’s going on under the hood.
In an upcoming blog post, Jeremiah and Kevin share their thoughts on the growing cloud market.