Greetings from Berlin, Germany, where I am presenting at the European SharePoint Conference (#EuropeanSP on Twitter).
Last week I promised to share with you information about the Keynote demonstration at Microsoft SharePoint Conference 2011: an incredible demonstration in which Richard Riley walked the wire without a net and failed over a SharePoint farm with 14+ TB of data and 7,500 concurrent users in just 40 seconds.
This demo exceeded expectations and broke the rules, causing both excitement and heartburn in the community. I’d like to address what they did, why they did it, and what’s so cool!
But FIRST, let me tell you about Berlin. It’s been phenomenal to return to this city. I first came to Berlin in October 1989, just two weeks before the wall fell. I walked much of the length of the wall on the West Berlin side, and I crossed at Checkpoint Charlie into East Berlin, to discover that all of the *really* cool buildings, including the glorious Pergamon Museum, were there, and I spent a fantastic day wandering bleak and fascinating East Berlin.
I was in the Soviet Union watching what turned out to be the very last November 7th Day Parade, and elsewhere in the USSR, when the Berlin Wall fell. Needless to say the event did NOT make the front page in the USSR, so I didn’t hear about it. I knew the political situation in Germany was volatile, and my parents had asked only one thing of me—don’t get involved with any revolutions—so I booked a train out of the USSR to Prague, specifically to avoid Germany. [Side note: Traveling into the USSR with a one-way plane ticket and thinking I could easily get out was a STUPID move at the time.]
So I arrived in Prague and discovered the wall had fallen peacefully and, as luck would have it, the next day was the Velvet Revolution in Prague, during which I literally marched on the castle with the protestors—one of the most spectacular experiences of my life.
I came back to Berlin 12 years later (2001) to find a skyline filled with cranes as one of the most vibrant cities in the world was reinvented to become what I believe will be one of the great cities of the 21st century. Now, another ten years later, Berlin is indeed such a city. I’ve really enjoyed tracing my steps from my past trips, seeing the extraordinary changes, and enjoying the many diverse experiences offered by this one-of-a-kind capital. On Sunday I treated myself to a rare afternoon off of work and went for a two-hour jog around part of the former West. You can see the photos here.
This coming Monday, the 24th, I’m conducting a workshop at the Microsoft offices in Berlin. I announced the date incorrectly last week—it is the 24th. The London event last week went very well, so I’d love to see some of you here in Berlin! It’s free and conducted in English, by the way, and you can register here.
So, about that SPC Keynote Demo, which you should not try at home…
Demo Illustrated Massive Scalability of SharePoint 2010
Richard Riley, director of SharePoint, got the opportunity to do what I would love to do—pull the plug on 7,500 users. His demonstration was designed to illustrate the massive scalability of SharePoint 2010 and the high-availability story offered by the new Always On capability of SQL Server “Denali” (whose name was recently announced as SQL Server 2012), currently in its latest CTP release.
The demonstration was incredible, both for what it showed and because you gotta know that there was no “Plan B” for a demo this big. This was live theatre at its best, geek-style.
The demo used the same hardware, configuration, and dataset that Microsoft used to test SharePoint capacity and performance in order to determine the new scalability limits that were announced in July.
On stage was a set of racks with big iron and titan horsepower, loaned to Microsoft by EMC and NEC. An EMC VNX5700 SAN with 400TB of storage capacity was fronted by two NEC 5800 servers, each with 8 cores and 256GB of RAM running SQL Server Denali CTP3.
The SharePoint farm was entirely virtualized, on a "sick" (in the "amazing" sense of the word) server with 1TB of RAM and 80 cores—in other words, just like the server under my desk at home.
The virtualized SharePoint farm was also fairly massive, with 6 Web front ends (WFEs), 5 FAST index and search servers, 2 FAST admin servers, and 2 SharePoint app and admin servers.
The farm hosted just under 108 million documents in a single gigantic, 14.4TB content database. The database—and all SharePoint farm databases—were fully replicated to the second SQL server in the Availability Group—the new logical cluster unit of SQL Server "Denali" Always On.
The 108 million documents represented a variety of document types with content pumped in from Wikipedia, to ensure randomness. Documents were split across two document centers.
Sixteen test agents were running a load of 7,500 concurrent users which—at a conservative 5 percent concurrency rate—represents a total user base of 150,000 users in an enterprise. Richard pointed out that both SQL Server and SharePoint were "barely breaking a sweat" under this load. (Yes, it’s amazing what you can do with 256GB of RAM, isn’t it?)
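The concurrency arithmetic above can be sketched in a few lines of Python. This is only an illustration of the estimate: the function name is made up for this example, and the 5 percent rate is the assumption quoted in the demo, not a measured value.

```python
def total_user_base(concurrent_users: int, concurrency_rate: float) -> int:
    """Estimate the total user base implied by a concurrent load.

    concurrency_rate is the assumed fraction of all users who are
    active at the same moment (0.05 means 5 percent).
    """
    if not 0 < concurrency_rate <= 1:
        raise ValueError("concurrency_rate must be in the range (0, 1]")
    return round(concurrent_users / concurrency_rate)


# The demo load: 7,500 concurrent users at an assumed 5 percent concurrency.
print(total_user_base(7_500, 0.05))  # 150000
```

A higher assumed concurrency rate shrinks the implied user base quickly: at 10 percent, the same load would represent 75,000 users.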
A few people in the audience noted that, in the Microsoft Visio diagram of the farm, the content database used a SharePoint-default-style name, complete with a GUID in the database name. But luckily, when SQL Server Management Studio was opened, the name was more "human." Funny that, with all the crazy things we saw, that one little point raised a few eyebrows.
So with 7,500 concurrent users accessing 108 million documents in a single, 14.4TB content database, Richard’s assistant pulled the plug on the network to one of the two SQL servers. Yes, all of that farm was running on one cable.
But come on, it’s a demo… that’s kind of the point, to be able to "pull the plug" and have things fail! And within seconds, SQL Server had failed over the 14.4TB content database.
A few seconds later, SharePoint failed over. PHENOMENAL! Now THAT is scalability and availability, folks! The crowd went wild… or at least as wild as 7,000 geeks can get.
Most Can't Afford to Scale at This Level
The demo did cause some heartburn in the community, however, particularly for storage vendors and ISVs who have storage products, like we do at AvePoint. The heartburn came as participants flocked to us asking how they could achieve similar results, and as they questioned us about the supportability of the environment that Microsoft displayed.
So let me address some of these points.
• This was massive iron. Yes, if you have a similar configuration you can do similar things. Please call me when you have these kinds of servers and these kinds of business requirements. But for the vast majority of us, we won’t see the opportunity to scale to this level any time soon.
Remember that Microsoft expects between 0.25 and 2 input/output operations per second (IOPS) per gigabyte stored, depending on the scenario and workload. So most of us can’t afford to scale to this level. Even Microsoft didn’t buy this hardware; it was loaned to them.
• The workload is key. Microsoft’s new scalability guidance allows for content databases up to 4TB in collaborative scenarios, which means basically all scenarios except document archives. Document archives are defined as sites using the Records Center or Document Center templates with <5 percent of content accessed monthly and <1 percent created or modified monthly. For document archives, there is no supportability limit to content database size.
Richard did mention, almost as an aside, that the document center template was used and that the workload used in the demo was a combination of open, browse, and search, implying that there was little or no write activity. He did not mention whether the amount of content being accessed fell within the scalability guidelines. But let’s give Microsoft the benefit of the doubt on this one.
• 108 million items were involved. What some people considered a "mistake" was that the number of documents (108 million) clearly smashes through the supported limit of 60 million items and documents in a content database.
This 60 million item limit is a new one, introduced with the changed guidance issued in July of 2011. It comes from the fact that Microsoft has identified that “too many” items in a content database can impact the performance and even the success of patches, updates, and upgrades.
To ensure that updates can complete within a reasonable SLA, and that they can complete successfully at all, Microsoft has drawn the conservative line of supportability at 60 million items in a content database. The demo had 108 million items in two document centers.
As I recall, the supported limit for items in a document library is 30 million, so I am also giving Microsoft the benefit of the doubt that these documents were distributed across two libraries per document center.
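To put the IOPS guidance from the first point above into numbers, here is a minimal Python sketch that applies the per-gigabyte range to the 14.4TB content database from the demo. The function name is hypothetical, and the sketch assumes 1TB = 1,024GB; with decimal terabytes the figures come out slightly lower.

```python
GB_PER_TB = 1024  # assumption: binary terabytes


def iops_range(stored_tb: float,
               low_per_gb: float = 0.25,
               high_per_gb: float = 2.0) -> tuple:
    """Return the (low, high) IOPS implied by per-gigabyte guidance."""
    stored_gb = stored_tb * GB_PER_TB
    return stored_gb * low_per_gb, stored_gb * high_per_gb


# The demo's single 14.4TB content database.
low, high = iops_range(14.4)
print(f"{low:,.0f} to {high:,.0f} IOPS")
```

Under these assumptions the storage would need to sustain somewhere between roughly 3,700 and 29,500 IOPS for that one database, which is why this kind of scale calls for this kind of iron.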
I interviewed several of the product group later that day. All were thrilled with the success of the demo, and each shared with me that there literally was no “Plan B.” This was done without a net. Each had slightly different explanations for why Microsoft stood on stage and demonstrated an environment that Microsoft would not support.
But really, folks, I felt a bit guilty asking the question because it just doesn’t matter. As one product team member told me, with a grin, "We did it because we could."
And that’s what it’s really about. SharePoint can do amazing things, and now Microsoft can stand on stage and shout about it to the world because they’ve had time to invest in testing the limits of the platform.
Is it surprising that Microsoft won’t support the outer edges of those limits? No. They shouldn’t, because outer edges are unpredictable by nature and support should be based on predictability.
So they draw lines of support that are conservative and, by definition, supportable. A demo is meant to be cool. TechNet is meant to be a splash of cool water in the face, with realistic guidance.
4TB Content Databases for Collaboration
So yes, you CAN have 4TB content databases for collaboration, and databases of unlimited size for archives. But you must have the required underlying performance, and you must have strategies and tested procedures for disaster recovery, availability, scalability, and future capacity. Many of these will require third-party tools—software and/or hardware based—to achieve.
For the time being, however, you can’t have more than 60 million items in a content database. As Microsoft’s guidance states, testing has been conducted (at the time of publishing) up to 60 million items, and after that you need another content database.
As the demo shows, Microsoft is not resting on those tests—they are pushing the limits—and my guess is that even that limit will be raised over time. But not yet.
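The limits discussed in this post can be expressed as a simple check. The following Python sketch is only an illustration: the class and function names are hypothetical (this is not a SharePoint API), the limit values are the ones quoted in this article, and the demo's access and modification percentages and its even split across four libraries are assumptions, since those figures were not stated on stage.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted in this article; check the current capacity
# management guidance before relying on them.
MAX_COLLAB_DB_TB = 4.0
MAX_ITEMS_PER_DB = 60_000_000
MAX_ITEMS_PER_LIBRARY = 30_000_000  # quoted "as I recall" in the article


@dataclass
class ContentDatabase:
    """Hypothetical description of one content database."""
    size_tb: float
    items: int
    largest_library_items: int
    uses_archive_template: bool   # Records Center or Document Center
    pct_accessed_monthly: float
    pct_modified_monthly: float

    def is_document_archive(self) -> bool:
        # Archive scenario: archive template, <5% accessed, <1% modified.
        return (self.uses_archive_template
                and self.pct_accessed_monthly < 5
                and self.pct_modified_monthly < 1)


def support_issues(db: ContentDatabase) -> List[str]:
    """List the ways a database exceeds the limits quoted above."""
    issues = []
    if not db.is_document_archive() and db.size_tb > MAX_COLLAB_DB_TB:
        issues.append(f"{db.size_tb}TB exceeds the {MAX_COLLAB_DB_TB}TB "
                      "collaboration limit")
    if db.items > MAX_ITEMS_PER_DB:
        issues.append(f"{db.items:,} items exceeds the "
                      f"{MAX_ITEMS_PER_DB:,} per-database limit")
    if db.largest_library_items > MAX_ITEMS_PER_LIBRARY:
        issues.append(f"{db.largest_library_items:,} items exceeds the "
                      f"{MAX_ITEMS_PER_LIBRARY:,} per-library limit")
    return issues


# The keynote demo, with assumed values where the demo gave none:
# 108 million items split evenly over 2 document centers x 2 libraries.
demo = ContentDatabase(
    size_tb=14.4,
    items=108_000_000,
    largest_library_items=108_000_000 // 4,
    uses_archive_template=True,
    pct_accessed_monthly=4.0,   # assumption
    pct_modified_monthly=0.0,   # assumption: read-only workload
)
for issue in support_issues(demo):
    print(issue)
```

With those assumptions the only finding is the item count: the 14.4TB size passes because the database qualifies as a document archive, and 27 million items per library stays under the 30 million line.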
You can see the demo in the recording of the keynote, starting at the 42:00 mark. Get excited, folks!
Then read what you are allowed to do in the capacity management guidance. Just don’t “flame” Microsoft.
It’s fantastic that they’ve given us support for limits we will have trouble reaching, and then demonstrated that they know SharePoint can do even more!