Greetings from London, and welcome to the last SharePoint Pro Update.
Thanks to the Mayans, we are duly forewarned that within a few hours, the world will end. Before it does, I’d like to return to the topic of shredded storage, to update you on news from the ISV and customer communities about this potentially high-impact new feature.
Where to Learn About Shredded Storage in SharePoint
In September, I worked with my colleagues Jeremy Thake and Randy Williams to craft an article, "Shredded Storage in SharePoint 2013 Preview," that has, according to our analytics and to comments from readers, become quite a useful resource. Our goal with that article was to be accurate with what we knew at that time, so that we could clarify some of the noise and confusion around the feature.
Since then, SharePoint has reached the RTM milestone and there have been several contributions to the community about shredded storage. I’d like to point out, in particular:
Bill Baer’s excellent blog post that details the goals and technical functionality of shredded storage
Jeremy Thake’s article previewing the results of testing performed by AvePoint and NetApp
In addition, organizations like NetApp and AvePoint, as well as customers who participated in the TAP (Microsoft’s early adopter program), have performed testing.
What You Need to Know About Shredded Storage
Unfortunately, as I’m learning from all of these players, the more you know about shredded storage, the less you wish you knew about shredded storage. It doesn’t seem to be a pretty picture. So this week, I’m going to share my current thoughts and guidance about shredded storage.
While I will certainly strive to be as accurate as possible, my goal this week will be to open up discussion with the community (and with Microsoft in particular) by revealing what seem to be some very big “question marks” about the feature. I’d like to solicit your experience and thoughts, and particularly any quantitative testing you’ve done. And I’d like to encourage you to help Microsoft understand what you need and expect from SharePoint’s storage tier.
What you need to know (in sum) about shredded storage is the following:
Shredded storage was designed by Microsoft to reduce the File I/O of updates to Office Documents (XML format: docx, pptx, etc.). It does that, successfully. That was the goal of shredded storage. It hit the mark. See Bill Baer’s post for details.
Shredded storage results in all documents (regardless of format) being shredded or “chunked” or “paged” into smaller bits. Documents are no longer stored as a single BLOB (binary large object), but are now stored as a set of chunks/pages/shreds/streams that are coalesced into a single document when retrieved. The size of the shreds is 64 KB by default and can be configured through the FileWriteChunkSize property (by using Windows PowerShell, for example). Testing shows that chunks are not exactly 64 KB, but range in size around that mark.
Shredded storage has the potential to reduce storage footprint because, as a document is updated, only the shreds that have changed are saved to the new version of the document. Technically it’s not the same, but conceptually it is similar to de-duplication or single instancing, applied only across multiple versions of a single document. There are no storage savings if a document is saved in more than one location, or if versioning is not enabled.
Similarly, if a user updates only metadata for an Office document, and not the document itself, there is a storage benefit. In SharePoint 2010, updating a column in a document library (which effectively creates a new version of the document) creates a copy of the document BLOB even if the document itself was not updated. With shredded storage, there is no change to the document, so there are no shreds added. Only metadata changes. That’s good.
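To make the versioning behavior described above concrete, here is a minimal conceptual sketch in Python. It is not SharePoint’s actual algorithm: it assumes fixed-size 64 KB shreds that are compared position by position with the previous version, and the names shred and DocumentVersions are hypothetical, invented only for illustration.

```python
CHUNK_SIZE = 64 * 1024  # default shred size cited above; real shreds vary around it


def shred(data, chunk_size=CHUNK_SIZE):
    """Split a document into fixed-size pieces (an assumption, not the real algorithm)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class DocumentVersions:
    """Hypothetical model of the version history of ONE document."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.versions = []  # each version is a list of shreds

    def save(self, data):
        """Save a new version and return the number of bytes newly written."""
        pieces = shred(data, self.chunk_size)
        previous = self.versions[-1] if self.versions else []
        stored = []
        written = 0
        for index, piece in enumerate(pieces):
            if index < len(previous) and previous[index] == piece:
                # Unchanged shred: reference the existing one, write nothing.
                stored.append(previous[index])
            else:
                # New or changed shred: it has to be written.
                stored.append(piece)
                written += len(piece)
        self.versions.append(stored)
        return written

    def read(self, version):
        """Coalesce the shreds of a version back into a single document."""
        return b"".join(self.versions[version])


doc = DocumentVersions()
first = bytes(range(256)) * 1024      # 256 KB document, four shreds
second = first[:-1] + b"\x00"         # same document with the last byte changed
print(doc.save(first))                # first version: every shred is written
print(doc.save(second))               # second version: only the last shred is written
print(doc.save(second))               # no content change: nothing is written
```

In this simplified model, the first save writes the whole document, and savings appear only on later versions of the same document. Because each DocumentVersions object tracks a single document, a copy saved in another location shares nothing, which mirrors the limitation described above.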
Concerns and Questions About Shredded Storage
Now, some of the concerns and issues about shredded storage that are being raised:
Shredded storage seems to increase storage footprint in many scenarios. Everyone who has shared stories with me reports that the total storage footprint of a document often increases, at least initially. The first version of a document takes more space, sometimes significantly more. Storage savings kick in on the second, third, and later versions of Office documents.
The shredding of non-Office XML document formats leaves a lot of question marks as well. Office XML documents seem to work as advertised. But every other format, including older Office formats (doc, ppt, xls), PDFs, images, audio, video… everything else… is questionable. I’m getting very mixed feedback as to whether the shredding is efficient, how much storage savings is being seen, whether new shreds are generated if only metadata is changed, etc.
Based on these concerns, you can certainly identify some scenarios where shredded storage warrants close scrutiny, and might in fact be detrimental to the ability of SharePoint to support those scenarios.
There are also concerns about shredded storage and Remote BLOB Storage (RBS). While the two features are technically compatible, they are not necessarily an optimal combination from a storage optimization perspective.
The storage reduction achieved by shredded storage is not optimal, compared with hardware-based de-duplication. Many customers use SANs for SharePoint storage. SANs and other storage platforms can offer de-duplication at the hardware layer, which SharePoint can leverage through RBS. Hardware-layer single instancing will perform faster, and will reduce storage footprints when documents (or similar documents) are saved in multiple locations, not only across versions. Shredded storage effectively negates the benefit of hardware-layer de-duplication.
Some customers use RBS to achieve hierarchical storage management (or “tiered” storage management) to support information lifecycle management. Shredded storage muddies the picture significantly.
And perhaps most worrying:
According to tests performed by NetApp and AvePoint, access time to documents increases, again sometimes significantly. These results are at odds with Microsoft’s claim that the cost of reconstituting the document shreds into a single document on retrieval is “negligible.” That is a big gap, and closing it requires more of the community to test and share their results.
Now here are the biggest doozies, in my opinion:
Documents are not shredded on upgrade. They stay in a single BLOB in the content database. They are shredded on write or update. That means you might upgrade today, and not see any ill effects of shredded storage, then, over time, performance could degrade and storage footprints could increase. It also means it’s much more difficult to test the effects of shredded storage on your environment. You basically have to touch every document in your upgrade lab to get them shredded.
In the RTM version of SharePoint, you cannot “disable” Shredded Storage. It’s on. Done deal. No choice. That means if you have scenarios for which shredded storage doesn’t make sense—or even penalizes your storage performance or footprint—you’re out of luck. It appears that you can mitigate the effects of shredded storage by setting the FileWriteChunkSize (the size of the “shred”) to something huge—like 1GB—so that documents only get one “shred”—but the feature is still on. More research left to be done there.
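The chunk-size trade-off mentioned above can be illustrated with some simple arithmetic. This is only a sketch under the assumption of fixed-size shreds (testing suggests real shreds vary around the configured size), and the function names shred_count and bytes_rewritten are hypothetical, not part of any SharePoint API.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def shred_count(document_bytes, chunk_size_bytes):
    """Number of fixed-size shreds needed to hold a document (ceiling division)."""
    if document_bytes == 0:
        return 0
    return -(-document_bytes // chunk_size_bytes)


def bytes_rewritten(document_bytes, chunk_size_bytes, changed_offsets):
    """Bytes that must be rewritten when the given byte offsets change.

    Every shred containing a changed offset is rewritten in full.
    Offsets are assumed to lie inside the document.
    """
    touched = {offset // chunk_size_bytes for offset in changed_offsets}
    total = 0
    for index in touched:
        start = index * chunk_size_bytes
        # The last shred may be shorter than the chunk size.
        total += min(chunk_size_bytes, document_bytes - start)
    return total


document = 10 * MB
for chunk_size in (64 * KB, 1 * GB):
    print(chunk_size,
          shred_count(document, chunk_size),
          bytes_rewritten(document, chunk_size, [0]))
```

With a 64 KB chunk size, the 10 MB document in this model becomes 160 shreds and a one-byte change rewrites a single 64 KB shred. With a 1 GB chunk size, the document is one shred and any change rewrites all 10 MB, which is effectively the pre-shredding behavior even though the feature is still on.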
Concerns About Configuration of the Crucial Storage Tier
I’m really concerned, based on what I’m hearing, that we now have a take-it-or-leave-it configuration of the most important tier for SharePoint performance: the storage tier. I see customers already fighting to figure out whether scenarios including video libraries, AutoCAD stores, image repositories, podcast libraries, and archives will be helped or hurt by shredded storage. And those customers who aren’t aware of the feature might discover “down the road” that it is not ideal for their scenarios.
At this point, I can say I “understand” Shredded Storage, from a technical perspective, but I’m not the expert in how it affects you in the “real world.” And it doesn’t seem that Microsoft has received (let alone released) the right kind of information either.
I really wish that more testing, or at least more testing results, had been published by Microsoft to assuage the concerns of customers about a feature that was touted as a “win” in the SharePoint Conference keynote. I cannot believe I’m the only one who is getting reports from TAP customers about storage footprints increasing in production, or concerns about slower access times being seen in labs. It makes me wonder whether customers’ diverse scenarios were considered, and whether the “right questions” were asked. That doubt is troublesome in itself.
Is Shredded Storage a Case of Unintended Consequences?
Shredded storage at this point seems to be a classic case of unintended consequences. It reduces file I/O for Office document formats. Great! I’m sure that means something in Office 365. But there are many, many customer scenarios where file I/O is not the bottleneck, and the feature that reduces it might introduce new and potentially significant problems into those scenarios.
Therefore, more than anything, I wish that a change in such a critical tier could be “opted in to.” The fact that the feature can’t be disabled seems like a bad decision or a big mistake—certainly not the outcome of a thoughtful consideration of customers’ requirements. And I’m a big fan of SharePoint and of the product team, too. They do great work. But it’s difficult to defend the decision to release the feature “always on.”
Test, Test, and Test--And Give Microsoft Feedback
I know I’m stirring the pot here, and I apologize for that, but I think what’s necessary now is for customers—particularly those largest customers that Microsoft listens to (the kind that managed to keep public folders in Exchange for a decade!)—to dig into this feature and test the heck out of it, and to provide feedback. We need to hear more from storage vendors and ISVs. We need to hear more from customers. What I’m hearing so far are credible concerns and initial results of testing. We really need more!
Microsoft’s team is smart and talented, but they have to prioritize the issues they tackle. So if the community really wants Microsoft to understand and, at a minimum, provide a way to turn off shredded storage for certain scenarios, we need to ask with many, loud (but constructive) voices.
But as important as SharePoint may be to us all, I’d like to close out this column by adding my voice to those that are mourning the loss of 26 innocent lives in Newtown, Connecticut last week. What happened at Sandy Hook Elementary is tragic beyond words, and it touches us all.
I grew up in Littleton, Colorado. Columbine High School was the closest high school to my home. Recent days have opened wounds from Columbine, from Aurora this summer, and from other senseless and incomprehensible acts of individuals, of groups, and of nations.
My heart is heavy, today, and my thoughts are with all of those who have been touched by these events. While it is challenging to be optimistic in times like these, I hope as we continue through the holiday season you are able to be with the ones you love—either physically or spiritually—and have the opportunity to be grateful for the gifts they give you every day.