Microsoft's Doug Leland Discusses Hekaton, HDInsight, and SQL Server 2012 Service Pack 1

At this year's Professional Association for SQL Server (PASS) Summit in Seattle, Washington, I had the opportunity to sit down with Doug Leland, General Manager of Product Management in Microsoft's Business Platform Marketing Group, to discuss some of the new SQL Server technologies that Microsoft recently announced at the Strata and PASS conferences. (See also "Notes From PASS 2012" and "PASS Summit 2012 Slide Decks.")

Michael Otey: Let's talk about some of the new announcements -- Hekaton, what's going on with that; maybe a little bit about HDInsight and what's going on with that. Hopefully, we can inform our readers about what this stuff is all about.

Doug Leland: Well, let me start off with HDInsight and the set of announcements we made at Strata a few weeks ago. But first let me rewind the clock to last year's PASS, when we made our first announcements about our strategy for Big Data and Hadoop. We said then that we were bringing nonrelational and unstructured data into SQL Server as a first-class citizen, as part of our data platform. What that meant for us strategically was, one, delivering an enterprise-grade implementation of Hadoop -- offering it on premises, on top of Windows, integrated with SQL Server, and also as a subscription-based service on the Windows Azure platform. And when we talk about enterprise grade, I can frame up a little more detail of what we really mean by that. It means bringing the simplicity, flexibility, and security of Windows -- everything that customers who run the Windows and Windows Server platform enjoy today -- to what is essentially an open-source set of projects (i.e., Hadoop) that has been built around a predominantly Linux platform.

Otey: Hadoop is very weird for a relational guy, you know -- MapReduce, and creating things like that. It's very foreign to SQL Server guys.

Leland: Very foreign; right. So it's fabulous to bring all that in and make it not foreign to core database people, people who are dealing with big data. There's the piece around security -- integrating the platform with AD [Active Directory] -- and there's a lot around management. We're going to integrate it with our management stack, which means a combination of Windows Server plus System Center, so Windows Server customers can manage this entire environment through a single pane of glass. They can take the same management tools they use today to manage their virtualization environment -- and we also provide management packs for SQL Server -- and use those as a single pane of glass for managing the entire platform. And then we bring the value of all that data to information workers by plugging in our BI tooling, which we also showed today, and which is now fully integrated into Office -- Excel specifically.

Otey: Right; the new version of Office 2013?

Leland: Office 2013, which now has PowerPivot and Power View in it natively. So Excel 2013 really is a complete BI tool for information workers; they don't need to go anywhere else for any other tool to do BI. Then take that power and plug it into Big Data, so that you can get real-time insight across all your data -- whether it's relational or nonrelational. That was the strategy, and we've been executing on it. With that in your head, put into context the announcements we made two weeks ago and then today. At Strata, we announced essentially the next major product milestones for our Hadoop offerings. The Windows Hadoop implementation, which we now call HDInsight, will be available both on premises and as a service.

Otey: Right; this is the new Windows version of Hadoop that you can implement either on premises or in Azure as a service?

Leland: Yes; HDInsight for Windows is the on-premises implementation that was announced at Strata. So now our customers broadly have downloadable access to preview bits for Hadoop on Windows -- and a preview of the service.

The benefits of these different approaches are as follows. First, we are the only vendor out there providing both on-premises and cloud offerings, so the customer has incredible flexibility to choose whatever solution they need. If I want to run it in my own data center, I run it in my data center. If I want to run it in Microsoft's data center, I can now run it in a Microsoft data center. I can move it back and forth with the same implementation, 100 percent code compatible. And I can build hybrid scenarios where a piece of the implementation runs on data that lives in my data center, or where data has been born in the cloud and I don't really want to bring it back on premises, so I run that piece in the service. So the flexibility in terms of what we're offering -- my data center, the cloud, or some combination of both -- is incredibly valuable to customers. And then you bring that back and say, "Now I can use the power of Excel 2013 to get to that data." Those are incredibly valuable propositions.

The value of the HDInsight service is the simplicity it offers beyond the value proposition of the on-premises product. As a company, I'm not investing capital in infrastructure or software build-out, and I don't have to buy a bunch of machines to stand up a cluster. I don't need to hire a Hadoop expert who knows how to deploy and build out a Hadoop cluster, because I can go to the service and, with basically three clicks and ten minutes, deploy a cluster of any size. I pay for it by capacity, but again I don't need that implementation expertise. What I do need is the individual who knows how to build my jobs.

Otey: Right; because that is kind of the hard part in all this . . .

Leland: You know, ultimately there are two hard parts. It turns out that the implementations a lot of customers are consuming from the open-source community are actually pretty hard to set up, so there's that hurdle. And then the valuable part is, once you've got your cluster set up and your data loaded into it, you can set your data scientists, statisticians, or business analysts loose on that data to actually derive some insights from it.

So with the service we take all that complexity out of the IT world, and we take it out to a significant degree for the server product as well, because we're focusing on the manageability and setup of the system on a Windows platform -- bringing the simplicity of Windows to Hadoop. We extract that complexity out of the stack so that customers can get to what they really want to do, which is deriving insights from their data. And they can do that either using the traditional tools data scientists use on top of the Hadoop platform, like the R language, which is available and which we're supporting, or by democratizing the power of this data for the broader information worker by plugging Excel into the picture. You can basically do queries across big data in HDInsight using Excel.

Otey: That's good to know.

Leland: So those are the announcements there. Let me take it forward to what we announced today, specifically PolyBase. This in many ways completes a very significant arc we've been on in terms of bringing together relational and nonrelational data. With PolyBase, as we showed in the demo today, you can create a single query that joins information stored in SQL Server -- that'd be relational -- with information stored in HDInsight.

That was in the demo we showed today, and it shows you a couple of interesting things. One, it's delivering on the strategy we talked about, which is bringing this into the platform that Microsoft offers. Two, it's being able to unify structured and unstructured data in a single query. And three, it's empowering the SQL development team; they can now use the skills, and the people, they already have to access data over Hadoop. So now, if you know SQL, you know Hadoop, and you don't have to get into the guts of understanding how to build MapReduce jobs.

Otey: Ah . . . that's very nice! That really opens it up for SQL Server developers. And you're saying that's your standard SQL. Are you referring to T-SQL, or a variation of that?

Leland: T-SQL; yeah. You use T-SQL, and all the extensions you can use with T-SQL, to access both relational and nonrelational data, and it joins the information up, which is incredibly valuable.
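To make this concrete for readers, here's a minimal sketch of the kind of query PolyBase enables. The external-table syntax shown follows the general shape PolyBase later exposed, and every name in it -- the SalesOrders and WebClicks tables, the columns, the hdfs:// path -- is hypothetical, not taken from the PASS demo.

-- Hypothetical illustration of a PolyBase-style query (all names and paths are made up).
-- An external table maps a file set in Hadoop/HDInsight so T-SQL can read it like a table.
CREATE EXTERNAL TABLE dbo.WebClicks
(
    CustomerId INT,
    ClickedUrl NVARCHAR(400),
    ClickTime  DATETIME2
)
WITH
(
    LOCATION = 'hdfs://hdinsight-head-node:8020/data/webclicks/',  -- nonrelational data in HDFS
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')                        -- delimited-text layout
);

-- A single T-SQL query then joins relational rows with the Hadoop-resident data.
SELECT o.CustomerId,
       COUNT(*)          AS ClicksBeforeOrder,
       SUM(o.OrderTotal) AS Revenue
FROM   dbo.SalesOrders AS o        -- ordinary relational table
JOIN   dbo.WebClicks   AS c        -- external (HDInsight) table
       ON c.CustomerId = o.CustomerId
GROUP  BY o.CustomerId;

The point of the sketch is the last statement: once the external table is defined, the Hadoop data participates in a join exactly like any other table, with no MapReduce code in sight.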

Otey: Right. That really opens it up because it's nice to be able to have access to the data, or to have the platform out there to do it. But if you don't have the tools that can help you get into it, or if the tools are unfamiliar, then that kind of makes it hard. So adding Excel 2013 and then T-SQL to be able to do that really bridges that gap.

Leland: So all these things come together here, right? It's interesting to see the pieces all start coming together: bringing in the Hadoop piece; at the same time building and delivering the query engine, which allows you to query over a Hadoop data set; and now, with Office 2013 and Excel, being able to do queries over all that data -- which benefits both end users and developers, who can now use T-SQL. So you've really created that virtuous cycle.

So then the other conversation is really in what I would describe as an adjacent complementary area, which is all about business acceleration through in-memory information access.

Otey: Oh, right; yeah. The new Hekaton.

Leland: Yeah. That is, I would say, in many ways equally exciting, and also the completion of -- or the next major step in -- a strategy we've been on for quite some time. When you lay out the in-memory strategy completely, and Ted [Ted Kummert, Microsoft Corporate Vice President of the Business Platform Division] did a bit of this today and I'll frame it up in more detail, we believe there are a couple of important principles to understand. The first is that, in order to have a truly strong in-memory solution for customers, it needs to have a couple of things. One, it needs to support all your database workloads. Whether it's traditional data warehousing workloads, you need to have the in-memory capability there. Whether it's your analytics workloads running Analysis Services, you need to have that in-memory capability there. And now your transaction-processing scenarios. Then there's streaming, so there's a set of scenarios around -- I don't know how familiar you are with CEP, or complex event-processing, technologies . . .

Otey: Well, I know what StreamInsight is . . .

Leland: There you go. StreamInsight is a CEP [complex event processing] technology; take super high-velocity applications like stock trading systems, where the information is coming in so fast that it can't even touch disk. That stream has to be processed completely in memory as it's coming in, and that's essentially what CEP -- StreamInsight specifically -- does. So covering all those workloads, from data warehousing to analytics to transaction processing to high-velocity stream processing, is the way to think about it: you need to have a complete range of offerings. That's principle number one. Principle number two is that we believe it needs to be built into the core data platform versus a separate product you bolt on top, because whenever you bring in something new and different, you bring in cost and complexity.

Otey: Right; that's kind of the same strategy as the original OLAP Services in SQL Server 7.0, which was incorporated into the core product. It really allowed it to be used by a lot of people who otherwise wouldn't have bought into it.

Leland: Exactly. Both managing it at that scale and developing against it -- both principles apply here as well. So we bring it into the core platform and don't introduce a new API or a new language in order to access it. Literally, if you know SQL, you know Hekaton, which is a big plus. If you know how to program in SQL and T-SQL, you can program your business logic with Hekaton; you don't have to learn any big new language. That's not the case for some of the competitive offerings out there, which are specific point solutions for a particular problem: I've got an analytics problem, so I've got an analytics appliance. Oh, and by the way, it has a completely new interface; it has a completely new programming language in order to get to it. So you bring in the complexity of having to manage a new thing -- the IT complexity -- and you bring in the developer complexity of having to build up an understanding and a skillset in the new language. Our approach is the exact opposite: build it in, so it's deeply integrated with what you already know and what you already have.

Otey: Right. Hekaton also takes advantage of the big advances in today's hardware -- the processing power and the memory capacities that servers have now.

Leland: Absolutely. It will exploit what you have; you're not pushed into acquiring anything more. You're not buying another piece of hardware to bring into your IT department. Take the SMP box you're running your transaction-processing back end on -- classic high-end transaction processing, say on an HP DL980. You've put that on an 8-way box with a bunch of RAM in there, you're cranking away, and you've hit the top end; there's nothing more you can do to tweak the performance. You can take the new SQL Server with Hekaton enabled and land it on that same box -- you don't have to buy different hardware. Pick your hot tables and move them up into memory, and all of a sudden you're looking at a 10x, 20x, 30x performance increase, without changing your apps and without changing your hardware. Now, if you want to start optimizing your hardware, there are a lot of great things you can do. You can put more memory in, which is kind of an easy win because you can move more stuff up into memory.
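Hekaton's exact syntax wasn't public at the time of this conversation, but as a rough illustration of "pick your hot tables and move them up into memory," here is a sketch along the lines of what later shipped as In-Memory OLTP in SQL Server 2014. The database name, file path, and table are hypothetical.

-- Hypothetical sketch of declaring a "hot" table as memory-optimized
-- (modeled on the syntax that later shipped in SQL Server 2014; names are made up).

-- A memory-optimized filegroup holds the durable copy of in-memory data.
ALTER DATABASE SalesDb
    ADD FILEGROUP SalesDb_InMem CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE SalesDb
    ADD FILE (NAME = 'SalesDb_InMem', FILENAME = 'C:\Data\SalesDb_InMem')
    TO FILEGROUP SalesDb_InMem;

-- The hot table itself: the same T-SQL surface, but rows live in memory.
CREATE TABLE dbo.ShoppingCart
(
    CartId     INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CustomerId INT NOT NULL,
    CreatedAt  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

The notable design choice Leland describes is visible here: the table is still created and queried with ordinary T-SQL, so existing skills and applications carry over while the storage engine underneath changes.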

Otey: You can move more data up there.

Leland: But the big difference with the other approaches is that they -- Oracle and SAP, to take specific examples -- require the acquisition of a pure appliance offering, with the hardware-software coupling that implies.

Otey: Oh, really? That's how they're doing it, through an appliance?

Leland: Exalytics and HANA are both delivered only as appliances, which brings in that cost and complexity. And then there's the app migration. What we're basically saying is that we've designed Hekaton in such a way that you can migrate your existing applications completely; there's no rewrite. That's not the case when you're bringing HANA into your environment -- you're talking about a rewrite, and we feel that's a big blocker to adoption.

Otey: Oh. That certainly would be. Obviously, it might be worth it to get the performance that you need; but if you can get it without having to do anything different in your application, you're already there. That's awesome.

Leland: So let me take you back to what's shipping and then what's coming; there are some nuances in all this. With SQL Server 2012, which reached general availability back on April 1st, we delivered in-memory solutions for data warehousing and for analytics. Those are the xVelocity in-memory columnstore for data warehousing -- the column-store technology we demonstrated today -- and xVelocity analytics, which provides the in-memory capability for SQL Server Analysis Services. So both the relational engine and the analysis engine have in-memory capabilities built in. If you get 2012, you've got in-memory BI and in-memory for data warehousing. Now, with Hekaton, we complete that picture.
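As a concrete example of the column-store piece that shipped in SQL Server 2012, a nonclustered columnstore index can be added to an existing warehouse fact table with ordinary T-SQL; the table and column names below are hypothetical.

-- Hypothetical example: xVelocity in-memory columnstore on a warehouse fact table.
-- In SQL Server 2012 the columnstore index is nonclustered, and the table becomes
-- read-only until the index is dropped or disabled (names are made up for illustration).
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_FactSales_ColumnStore
ON dbo.FactSales (DateKey, ProductKey, StoreKey, SalesQuantity, SalesAmount);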

Otey: Is any of what Hekaton is built on using the Vertipaq data compression technologies?

Leland: It's interesting in that when you're designing a transaction-processing system versus designing for analytics, there are very different design points. So while there are shared algorithms for memory optimization, Hekaton isn't really based on the same Vertipaq technologies. Vertipaq was built around three fundamental things. One is column store -- the ability to store data in columns. Two is the massive compression you get out of that, the 10x compression we have today. And three is the in-memory piece. Those are the three pieces when you think about the Vertipaq technologies, which we now call xVelocity. We've borrowed the in-memory capabilities and adapted them for Hekaton, but you wouldn't typically store your transaction-processing data in a columnar structure -- you'd stay with a highly optimized store for inserts, updates, and deletes -- and you normally don't compress that data. So we reuse where we can, but the design points really aren't the same.

Otey: Sure; got you.

Leland: So that's kind of the in-memory story and Hekaton. The first two pieces are shipping today; Hekaton will ship in the next major release of SQL Server and has gone into what we call private previews. Between now and the end of the year we're going to be recruiting about 100 customers to participate. These will be field nominated, as we do at this stage of the cycle: customers identified through our field contacts who have applications well suited to helping us continue to build out, test, and validate the capabilities of the architecture along the way.

And then the other piece of news we announced today was the general availability of SQL Server 2012 Service Pack 1. That really is all about making Excel 2013 the complete BI offering, because it provides the native integration of Power View into Office. So you open up Excel, it's right there in the ribbon, and you start doing BI and writing into a Power View workbook.

Otey: So the PowerPivot and Power View capabilities are built right into Excel 2013? They're not an add-on anymore?

Leland: No. Our engineering team has been doing a lot of work with the Office engineering team on the strategy and on globalization, and our joint strategy is to have Excel be the complete BI offering, full stop. There is no reason for a BI user to have to leave Excel or go to a different tool from a different company. They can do everything they want to do, across any kind of data -- Big Data or small data -- right within Excel. That's our strategy, and this is a pretty big step in achieving it. We've moved from an add-on to natively integrated, rich BI capability.

Otey: Yeah; that makes perfect sense. You kind of expected that would be the way it would go because, you know, they were developed separately, and then the add-ins were added into the initial package; but then as the product evolves you would want it to be natively incorporated. So that makes perfect sense.

Leland: Ideally, yes. And that's an important element of our strategy, as we talked about: bringing BI to the masses. It's a capability that has traditionally been narrowly available within most organizations because it has been predicated on having to acquire and learn some exotic tool.

Otey: Sure; yeah. I had one question about PolyBase. What are the requirements for using it when your queries span a SQL Server system and a nonrelational system like HDInsight? How is this implemented?

Leland: Well, there are a couple of considerations you'd want to take into account. What we showed today, and what we're delivering to market first, is PolyBase on PDW; PDW is the first to receive it.

Otey: PDW 2012, right?

Leland: PDW 2012 is the first to receive PolyBase capabilities. So the first architectural decision a customer makes is, "Hey, I'm building up my data warehouse and I want to have that capability," and so one of my architectural decisions is to acquire and implement our enterprise data warehouse appliance. That gives you your rack of MPP SQL Server nodes, and away you go. Then the next design consideration is, "How do I want to implement Hadoop?"

Ideally that's HDInsight, either on Windows or in Azure. Let's take the HDInsight-on-Windows case. I would acquire the software and then spec out my Hadoop cluster: decide how many nodes I want. The next consideration is your interconnects, because you want fast interconnects between your two clusters. So your design consideration becomes, "OK, what's the fastest interconnect I can choose for connecting my two clusters?" -- and that will probably depend on your performance needs, your data volumes, and that kind of thing. The characteristics of the cluster you build -- the number of nodes, the disks, the horsepower (i.e., CPU) -- combined with what you have in your PDW environment and the network between them, will ultimately determine what kind of data loads you can handle and what kind of performance you're going to get out of the system once you've got it put together. Those become the high-level design considerations.

Otey: All right. Well, that sounds pretty good. I think that's really all I have time for; I really appreciate you taking the time to talk with me. Great information; a lot of good stuff. Thanks!

To learn more, see "SQL PASS Summit 2012: Day 1."
