Your SharePoint Content Corpus

“Storage is key to all SharePoint performance.” Russ Houberg is not the first person to say this, but you know to pay attention when he says it, not least because he’s a SharePoint 2010 Microsoft Certified Master (MCM).

“You have to plan for a whole lot more than just the size of your content.” Common sense, right? Seems a lot of people don’t, though.

“People don’t understand that different files affect search differently.” Search and storage are intertwined with content. Here are some things Houberg says you might want to know about your content:

Are documents collaborative or archive?

Are they Microsoft Office documents? Images? PDFs?

Do they need to be full-text indexed during a crawl?

What is the average file size of each type of document?

Are any documents larger than 16 MB? Documents larger than that are not crawled by default.

How many documents of each type?

How long did it take your organization to get to that amount?

Knowing about your content lets you create your organization’s corpus profile. Your organization’s content corpus is the size and shape of content stored in SharePoint or other content sources.
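As a rough illustration of the questions above, here is a minimal Python sketch that builds a per-file-type profile from a list of file names and sizes. It is not Houberg’s worksheet or any SharePoint API: the function name `corpus_profile`, the input format, and the sample data are all assumptions made for this example, and the 16 MB threshold is simply the default crawl limit mentioned above.

```python
from collections import defaultdict
from pathlib import PurePath

# Default crawl size limit cited above; adjust if your farm is configured differently.
CRAWL_LIMIT_BYTES = 16 * 1024 * 1024


def corpus_profile(files):
    """Summarize (filename, size_in_bytes) pairs by file extension.

    Hypothetical helper for illustration only.
    """
    stats = defaultdict(
        lambda: {"count": 0, "total_bytes": 0, "over_crawl_limit": 0}
    )
    for name, size in files:
        ext = PurePath(name).suffix.lower() or "(none)"
        entry = stats[ext]
        entry["count"] += 1
        entry["total_bytes"] += size
        if size > CRAWL_LIMIT_BYTES:
            entry["over_crawl_limit"] += 1

    # Add the average file size for each type.
    return {
        ext: {**entry, "avg_bytes": entry["total_bytes"] / entry["count"]}
        for ext, entry in stats.items()
    }


# Made-up sample inventory, not real data.
sample = [
    ("contract.docx", 250_000),
    ("scan.pdf", 20_000_000),
    ("photo.jpg", 3_000_000),
    ("report.pdf", 1_000_000),
]
profile = corpus_profile(sample)
for ext, entry in sorted(profile.items()):
    print(ext, entry)
```

In practice the inventory would come from an export of your document libraries, and you would probably also record whether each type is collaborative or archive content and how quickly it has grown.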

It’s different from other organizations’ because your organization is different, and it’s important because it will affect how much storage you need. A corpus profile captures what your content looks like and what it’s made up of.

Houberg showed one such corpus profile on a storage projection worksheet. You can see the worksheet in a zip file reachable from his blog post “Scaling SharePoint Records Centers #spc11 #spc382.”

Below is a snippet of the worksheet to give you some idea of its intricacy. With a sleight of hand and enough equations to make an English major faint, he showed how 3.5TB of content needs at least 7.5TB of storage.

[Figure 1: Snippet of Houberg’s storage projection worksheet]

Houberg has a post that helps to explain the latest storage guidelines the Microsoft SharePoint team blogged about at the Microsoft site. Read Houberg’s post “New Content Database and RBS Sizing Guidance” at his blog. Microsoft TechNet has a good article too, called “Capacity Planning for SharePoint 2010.” Houberg is coauthor of SharePoint Server 2010 Enterprise Content Management (Wrox).
