If I’ve learned anything in the 10 or so years I have been faithfully working with SharePoint, it’s that it is all about the experience. It really has been one of the best solutions ever devised to connect people with content. So much so that companies have been connecting more people with more content than we ever thought possible, and that’s a good thing… sort of.
If you ask just about any IT manager what keeps them up at night, the first thing they mention after security is content growth. Given what I’ve just said about SharePoint, it’s plain to see that SharePoint may be one of the biggest drivers of sleep-aid sales to IT folk. That said, you can’t really argue that SharePoint isn’t a catalyst for content growth. But SharePoint itself isn’t the problem; it’s an enabler. The real issue arises when we don’t prepare ourselves to manage the deluge of content coming into it.
All content may be created equal, but it certainly doesn’t exist equally throughout its lifecycle. Some content is business critical, some is purely personal, some is connected to a series of workflows in a process, and some is destined to get pushed directly into an archive; some files have a large footprint and some are measured in bytes. In some cases, content can share a number of these attributes all at once. My point is that we need to think about the content we’re putting into SharePoint because, as some of the cases above suggest, SharePoint might not actually be the right place to store it.
If you are a larger organization leveraging SharePoint, with hundreds or thousands of users creating thousands of documents a month, you may want to familiarize yourself with two words: storage optimization. What these two little words can bring you is three-fold: performance, scalability, and cost savings. Did I mention cost savings?
If I’ve seen it once, I’ve seen it a hundred times: large organizations deploy SharePoint with little to no policy in place as to how many sites can be created, or who can put what content in those sites. Setting aside the glaring governance issues (which I’ll address in my next post), the biggest problems arise when the volume of content coming into SharePoint outpaces our ability to scale out the server farm to support it. What happens next?
A great example comes from an experience I had with one organization’s deployment, and it had to do with search. This company had thousands of large image files stored in SharePoint, and when it came time to access that content there were some challenges. Simple keyword queries took 30 to 45 seconds to return a result list. Remember 1997, when we all used 28.8 kbps modems? Same experience. That’s what happened to this company when they overloaded SharePoint with content without mapping out a plan for storage optimization. This same content-overload issue can also impact back-up times and your ability to meet service-level agreements.
Overcoming this challenge isn’t really all that complex; it just takes some planning and perhaps a little patience. The term I use is externalization, and this process simply takes a piece of content as it is being saved or stored within SharePoint and divides it into two pieces: metadata and the Binary Large Object (BLOB). Once you’ve done that, you can shuttle the BLOB off to another, more appropriate and perhaps more cost-effective storage environment. The best part of this process is that Microsoft itself has developed two APIs that help SharePoint users do this: the External BLOB Storage (EBS) and Remote BLOB Storage (RBS) APIs. These APIs, in combination with a third-party developed handler, can help you effectively redirect these BLOBs to a much better place while ensuring content transparency in SharePoint. At the end of the day you have lightened the load on the SQL Servers that support SharePoint, pushed up to 95% of the content’s mass (the BLOB) into a cheaper tier of storage, and ensured that end-users can access the content in milliseconds, not many seconds.
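To make the idea concrete, here is a minimal sketch in Python of what externalization does conceptually. This is not the actual EBS/RBS API — every name here (`externalize`, `fetch`, `BLOB_STORE`, `metadata_db`) is an illustrative stand-in — but it shows the split: the database keeps only lightweight metadata plus a pointer, while the heavy BLOB lands on a cheaper tier.

```python
import hashlib
import os
import tempfile

# Stand-in for the cheap storage tier (in reality: NAS, archive disk, etc.)
BLOB_STORE = os.path.join(tempfile.gettempdir(), "blob_store")
# Stand-in for the SQL content database that backs SharePoint
metadata_db = {}

def externalize(doc_name: str, content: bytes) -> str:
    """Write the BLOB to external storage; keep only metadata and a pointer."""
    os.makedirs(BLOB_STORE, exist_ok=True)
    blob_id = hashlib.sha256(content).hexdigest()  # content-addressed ID
    with open(os.path.join(BLOB_STORE, blob_id), "wb") as f:
        f.write(content)
    # The database row now carries only the lightweight part of the document.
    metadata_db[doc_name] = {"size": len(content), "blob_id": blob_id}
    return blob_id

def fetch(doc_name: str) -> bytes:
    """Transparent retrieval: the caller never sees where the BLOB lives."""
    blob_id = metadata_db[doc_name]["blob_id"]
    with open(os.path.join(BLOB_STORE, blob_id), "rb") as f:
        return f.read()

externalize("report.docx", b"large binary payload")
```

In the real world the handler does this redirection invisibly at the storage layer, which is what keeps the content fully transparent to SharePoint and its users.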
The reason I said patience was in consideration of the many folk who may already find themselves in the “too much content in SharePoint” pickle and will need to externalize all of their existing content. One approach is to run a full back-up of your data and, in doing so, drive all of the content that has yet to be externalized through the externalization process. There are MANY articles on how to do this out there, so just search for it or check out MSDN.
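The migration pass described above boils down to: walk the existing library and, for every document whose BLOB is still inline in the database, push it through externalization. A toy Python sketch of that sweep, with entirely hypothetical field names (no real SharePoint or RBS calls):

```python
# Simulated content database: one row per document. A blob_id of None
# means the BLOB still lives inline in SQL and has yet to be externalized.
library = {
    "old_report.docx": {"inline_blob": b"legacy bytes", "blob_id": None},
    "new_memo.docx":   {"inline_blob": None, "blob_id": "ext-new_memo.docx"},
}

def migrate(lib: dict) -> int:
    """One-time pass: externalize every document still stored inline."""
    moved = 0
    for name, row in lib.items():
        if row["blob_id"] is None:            # BLOB still inline in SQL
            row["blob_id"] = f"ext-{name}"    # pretend we shipped it out
            row["inline_blob"] = None         # and freed the database row
            moved += 1
    return moved

moved = migrate(library)  # only old_report.docx needs moving here
```

The patience comes in because a real library holds millions of rows, so a pass like this runs for hours or days and is best folded into an existing maintenance window, such as the back-up.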
Another challenge I like to point out in terms of storage optimization is dealing with old content. SharePoint is not an archive… I repeat… SharePoint is not an archive. If you are dealing with vast volumes of data, chances are anywhere from 25% to 30% of it is inactive, meaning it is just taking up some of the most costly storage space in your data center. SharePoint is an active content solution and as such requires high-I/O servers to support it and what it does. These servers are typically not the most cost-effective places to store content that is doing nothing… that’s why we have tiered storage. Moving older content out of SharePoint not only makes more room for the new, active stuff, but it also reduces the load on SQL and puts the content into a much more cost-effective storage environment.
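As a rough illustration of how you might identify that inactive 25% to 30%, here is a small Python sketch of an archival sweep: anything untouched for longer than a cutoff gets flagged for the cheaper tier. The one-year cutoff and the field names are my assumptions for the example, not a SharePoint API.

```python
from datetime import datetime, timedelta

# Assumed policy: anything idle for more than a year goes to the archive tier.
ARCHIVE_AFTER = timedelta(days=365)

documents = [
    {"name": "q1_budget.xlsx",   "last_accessed": datetime(2009, 1, 15)},
    {"name": "launch_plan.docx", "last_accessed": datetime(2010, 6, 1)},
]

def select_for_archive(docs: list, now: datetime) -> list:
    """Return the names of documents idle longer than the cutoff."""
    return [d["name"] for d in docs if now - d["last_accessed"] > ARCHIVE_AFTER]

stale = select_for_archive(documents, datetime(2010, 7, 1))
# q1_budget.xlsx has been idle well over a year, so it is the one selected.
```

In practice the selection criteria are usually richer (last modified, content type, retention policy), but the principle is the same: age out the inactive tail before it eats your most expensive disks.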
As an important side note, when you are thinking archival, do make sure that whatever solution you choose has a Web Part, or otherwise provides search into the archive through SharePoint, so that anyone (with permission, of course) who wants to access older content can still easily do so.
In a nutshell, storage optimization for SharePoint can bring performance, scalability, availability, and of course cost benefits to your organization, whether you leverage content externalization, archival, or my favorite: both.