By Andrew Chapman
When SharePoint 2007 first started being widely adopted, the question of whether it could be used to replace existing ECM solutions was often asked. The general consensus was that SharePoint 2007 was not necessarily the best choice for the corporate ECM backbone. However, when SharePoint 2007 is coupled correctly with a traditional ECM solution, the two systems complement each other extremely well.
With the imminent arrival of SharePoint 2010 that same question is bound to arise – do we need SharePoint 2010 and our traditional ECM solutions? Let's walk through the content management-centric features of SharePoint 2010, see what works, and look for where there is still room for improvement.
SharePoint 2010’s ECM Focus
Not surprisingly, SharePoint 2010 builds out many of the areas where SharePoint 2007 saw success: enhancing developer-related functionality, building communities, integrating search, supporting business analytics and the ubiquitous Office integrations.
The Microsoft SharePoint development team also addressed areas related to content management. This article highlights the key content management-related features in SharePoint 2010 and questions how well they might address enterprise requirements.
Library Services: Managing Content in Large Scale Deployments
SharePoint 2010 does address some issues related to handling large volumes of data, for example by removing some list size limitations and supporting remote BLOB storage providers in SQL Server. However, rather than focusing primarily on the back-end scalability of the system, it seems that Microsoft took a more user-centric approach to this challenge.
It would appear that they asked the question, “If I have huge volumes of data, how might users interact with that data?” Many of the features highlighted in this article relate to tagging, finding, and interacting with specific content which supports this assertion.
One of the biggest challenges in enterprise content management has always been tagging content with relevant and accurate metadata. If you accurately tag content in your ECM system, there's a reasonable chance that you’ll be able to find it again later.
If you don’t, good luck wading through hundreds of thousands of documents looking for the one you wanted. Microsoft has focused a lot of effort on this one area in SharePoint 2010, introducing Enterprise Managed Metadata services, location-based metadata defaults, unique document IDs and content ratings.
Enterprise Managed Metadata Services
The Enterprise Managed Metadata (EMM) service allows you to create one or more centralized libraries containing content types, terms and managed keywords. These can be published to sites across your farms and also accessed via web services by third party applications.
For example, you can push content types out to all sites across your farms, where they can then be either used directly or inherited from locally as base types to create custom types. Note that keywords and terms don’t use the push model; they are retrieved on demand, so they’re always current.
Included in the managed metadata are taxonomies (a formally managed hierarchy of terms) and folksonomies (unmanaged user defined terms). Interestingly, users can not only create folksonomy terms but they can also have those terms pushed out to the owner of a specific taxonomy for potential inclusion in the managed list.
If I were a corporate librarian, I might find this pathologically annoying or amazingly useful – I’m not sure which. If you fill out a field that is related to a taxonomy, you get type-ahead assistance, which not only makes it easier to enter values but also cuts down on the number of erroneous entries that one tends to get in a folksonomy-based system.
Location-Based Metadata Defaults
This feature allows you to automatically set default values for metadata when an object is uploaded to a folder or a document set. For example, all documents in the ‘HR Contracts’ folder could have the ‘Department’ column set to ‘HR’ and the ‘Document Type’ set to ‘Contract’. These rules cascade down the folder structure which helps too.
Given that SharePoint 2010 is promoting a ‘navigate using metadata’ paradigm rather than a folder model, this feature is very useful. For now, users can continue to use folders as their ‘filing system’ while behind the scenes the content is being discreetly tagged.
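To make the cascading behaviour concrete, here is a minimal sketch in plain Python. It is not SharePoint's API: the rule table, the folder paths and the function names are all hypothetical, and it assumes that a value supplied by the user overrides a folder default.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: folder path -> default column values.
# The paths and column names are illustrative, not real SharePoint configuration.
FOLDER_DEFAULTS = {
    "/HR": {"Department": "HR"},
    "/HR/Contracts": {"Document Type": "Contract"},
}


def resolve_defaults(folder: str) -> dict:
    """Merge defaults from the root down to `folder`; deeper folders win."""
    path = PurePosixPath(folder)
    # `parents` lists the nearest ancestor first, so reverse it to walk top-down.
    chain = list(path.parents)[::-1] + [path]
    merged = {}
    for p in chain:
        merged.update(FOLDER_DEFAULTS.get(str(p), {}))
    return merged


def apply_defaults(folder: str, supplied: dict) -> dict:
    """Assumption: values the user entered take precedence over folder defaults."""
    return {**resolve_defaults(folder), **supplied}
```

In this sketch a document uploaded to `/HR/Contracts/2010` picks up both the ‘Department’ and the ‘Document Type’ defaults even though neither rule was defined on that folder itself.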
Unique Document IDs
The idea that SharePoint 2007 did not have unique document IDs seems beyond comprehension. How was SharePoint able to track which object was which without this fundamental identifier? It turns out that SharePoint 2007 did use unique identifiers but only internally.
SharePoint 2010 externalizes the use of unique IDs. Although you can enter these IDs directly into the UI to bring up a document, they are better suited to being used programmatically or embedded in hypertext links. Here’s an example of one, just so you understand why: 7K3W6YVEA2YC-17-1. You can define naming schemes so a document can be identified even if it moves between site collections.
One thing missing from the unique IDs is that each version of the document does not have its own ID. Ideally you would want to be able to address an individual document in the version stack not just the entire stack.
Content Ratings
With this feature, each document in a list has a number of stars shown alongside it and users can rate the content. Later you can sort the documents by popularity. At first sight, this feature looks a bit like "a gimmick looking for a home" rather than a true enterprise content management feature, but after using it I realized that it is actually quite useful. Assuming that the people you share content with are like-minded, the democratic approach to deciding which document is most appropriate is amazingly accurate.
Document Sets
Document Sets should be something to get excited about, but alas they are not quite everything that they should be in this first iteration. The general concept of a document set is that it allows you to group a set of documents together and treat them as a single object.
They share a single set of metadata, can be versioned as a group, downloaded as a single Zip file and routed through a workflow as a group. The set also includes a Welcome Page that users see when they select the set. This page allows you to add some context to the group of documents. So what’s missing?
* A document can only be in one document set at a time. The concept of related documents is often used to represent a compound document; in that case it would be reasonable to have a common piece of content shared across many document sets – a standard set of terms and conditions, for example.
* Documents do not have an explicit order within the set. Often the order of the component parts of the document in a collection is critical to whether the end result makes sense. You could order the documents implicitly by keeping the object names ordered alphabetically, but that’s a little 1970s for a product with 2010 in its name.
* If you perform a search and find a document you have no way of knowing that the document belongs to a document set. From a contextual and compliance perspective this might be undesirable because the document’s inclusion in the document set is almost certainly relevant.
* The ability to take a snapshot of the specific versions of documents in a document set and protect that snapshot is missing. This model allows you to freeze a copy of the documents as a record but then continue to update the documents in the live set and is often used in records management.
Advanced Content Routing
In SharePoint 2007, content routing was primarily a feature of the Records Center; SharePoint 2010 extends this functionality to all sites. With content routing you establish rules that determine where a document will be routed based on the metadata of that document.
This feature allows you to create "drop off" areas in your sites; content can be dropped into these locations and the rules will then file the content away in a predetermined location. Assuming your users tag their objects correctly, this feature can save a lot of mistakes.
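The routing idea can be pictured with a small rule table. The following is a conceptual sketch in plain Python, not the SharePoint object model: the `Rule` class, the rule names, the destinations and the first-match-wins ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over a document's metadata
    destination: str                 # where a matching document is filed


# Hypothetical rules; the column names reuse the HR example from earlier.
RULES = [
    Rule(
        "hr-contracts",
        lambda m: m.get("Department") == "HR" and m.get("Document Type") == "Contract",
        "/HR/Contracts",
    ),
    Rule("invoices", lambda m: m.get("Document Type") == "Invoice", "/Finance/Invoices"),
]


def route(metadata: dict, rules=RULES, fallback: str = "/DropOff") -> str:
    """Return the destination of the first matching rule (assumed ordering).

    Documents that match no rule stay in the drop-off area.
    """
    for rule in rules:
        if rule.matches(metadata):
            return rule.destination
    return fallback
```

The fallback branch shows why accurate tagging matters: a document with missing or wrong metadata simply never leaves the drop-off location in this model.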
We are probably all aware of the new ribbon-oriented UI in SharePoint 2010 but there are some other more subtle UI changes hidden away in SharePoint 2010 which directly affect how you will interact with unstructured data.
These changes include Metadata Driven Navigation, advanced targeting, better business data access, RSS feed monitoring, key performance indicators, summary links and searches. Let’s consider the first two of these as they are the most relevant to this article.
Metadata Driven Navigation
If there’s one feature of SharePoint 2010 that encapsulates the approach that Microsoft have taken to large scale ECM deployments it is Metadata Driven Navigation. Being able to store 50 million documents in a single SharePoint list is very impressive but unless your users can find the ‘one in 50 million’ document that they need then you’re not really being successful.
Metadata Driven Navigation allows managed metadata fields to appear in tree-view controls on the left-hand side of the navigation pane. Users can then select values for any field to constrain the results. For example, "show me only final versions of contracts."
Behind the scenes you are really adding constraints to the search predicate and seeing the results in real-time. I’ve heard people refer to this as "zero typing search," which makes sense. You can elect to add these managed fields to the index which seems like it would be a good idea.
So why is this feature so useful? There are two primary reasons. Firstly, it allows you to remove huge swaths of irrelevant documents and home in on the ones that you need. Secondly, it is familiar – most online shopping sites use this approach.
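For readers who want to see what "adding constraints to the search predicate" amounts to, here is a minimal sketch of that style of filtering in plain Python. The sample documents, field names and helper functions are invented for illustration and say nothing about how SharePoint implements the feature.

```python
from collections import Counter

# Illustrative sample data; in practice these values would come from managed metadata.
DOCUMENTS = [
    {"name": "acme-msa.docx", "Document Type": "Contract", "Status": "Final"},
    {"name": "acme-msa-draft.docx", "Document Type": "Contract", "Status": "Draft"},
    {"name": "q1-invoice.xlsx", "Document Type": "Invoice", "Status": "Final"},
]


def refine(documents, constraints):
    """Keep only the documents whose metadata matches every selected value."""
    return [
        d for d in documents
        if all(d.get(field) == value for field, value in constraints.items())
    ]


def facet_counts(documents, field):
    """Count how many documents carry each value of `field` (for the tree view)."""
    return Counter(d[field] for d in documents if field in d)
```

Selecting ‘Contract’ and then ‘Final’ corresponds to calling `refine` with both constraints, and `facet_counts` gives the kind of per-value totals that shopping sites show next to each filter.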
Advanced Targeting
Along the theme of getting access to content quickly and easily, SharePoint 2010 introduces the ability to customize page views based on membership of audiences. Within Central Admin you can set up rules which specify who will be a member of specific audiences.
You can then specify which Web Parts, items and links will be visible on a page based on those audiences. This allows you to better target specific content to specific users within your environment.
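As a rough mental model, audience membership is a set of rules over user attributes, and each page element lists the audiences allowed to see it. The sketch below is plain Python with hypothetical audience names, user attributes and Web Part titles; it also assumes that an element with no audience list is visible to everyone.

```python
# Hypothetical audience rules: audience name -> predicate over a user record.
AUDIENCES = {
    "HR Staff": lambda user: user.get("department") == "HR",
    "Managers": lambda user: user.get("is_manager", False),
}


def audiences_for(user: dict) -> set:
    """Return the names of every audience whose rule the user satisfies."""
    return {name for name, rule in AUDIENCES.items() if rule(user)}


def visible_parts(parts: list, user: dict) -> list:
    """Titles of the parts the user may see.

    Assumption: a part with an empty audience list is shown to all users.
    """
    member_of = audiences_for(user)
    return [
        p["title"] for p in parts
        if not p["audiences"] or member_of & set(p["audiences"])
    ]


PAGE_PARTS = [
    {"title": "Company News", "audiences": []},
    {"title": "Open Contracts", "audiences": ["HR Staff"]},
    {"title": "Budget Approvals", "audiences": ["Managers"]},
]
```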
SharePoint Workspace
SharePoint 2010 includes a rich desktop client called SharePoint Workspace (a replacement for the old Groove product). There are a few ways of looking at this client; its primary use case is to allow you to work offline by caching a copy of documents and data on your local machine.
Workspace can cache an entire site’s content including custom lists, line of business data and even InfoPath forms. If you edit content when you are offline Workspace will push those changes back up to the originating site when you re-connect.
Workspace can also be viewed in a couple of other ways. It gives an alternative view of the SharePoint site’s content even when online; some users might prefer this non-Web interface. It could also be utilized as a solution for accessing your SharePoint sites in environments that have connectivity challenges. The synchronization of content in both directions is asynchronous so you can make changes to documents and then let the systems synchronize those changes in their own sweet time.
Office Web Applications
Microsoft is making a lot of noise around the new web-based versions of Word, PowerPoint, Excel and OneNote. For sure, these new applications fall into the unstructured data camp, but right now the best use case that Microsoft seems able to come up with is interacting with Office documents “when at a shared terminal at a conference or at a coffee shop”. I’m sure that there are a multitude of better applications – if you know of a real one, contact me at my blog.
SharePoint List Item Limitation
In SharePoint 2010 you can have up to 50 million items in a SharePoint list. This is a really bad idea; seriously, you should probably look for a new system architect when the number of items in a list is six digits long, never mind eight!
That said, your lists will grow, and SharePoint includes some interesting ways of limiting how large a query can get so that users don’t take down the system by entering queries that are too broadly bounded. You can limit how many objects are returned in an end user’s query results, and you can then allow admins and processes to bypass that limit. You can also allow the limit to be breached only within certain times of the week.
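The three controls described above (a per-user limit, a bypass for admins and processes, and a time window in which the limit is lifted) combine roughly as in the sketch below. It is plain Python with made-up values: the limit of 5,000 items, the weekend-and-overnight window and the function names are assumptions, not SharePoint settings.

```python
from datetime import datetime

USER_LIMIT = 5000  # illustrative per-query item limit for ordinary users


def in_off_peak_window(now: datetime) -> bool:
    """Hypothetical window: all weekend, plus 22:00 to 06:00 on weekdays."""
    return now.weekday() >= 5 or now.hour >= 22 or now.hour < 6


def query_allowed(item_count: int, now: datetime, is_admin: bool = False) -> bool:
    """Decide whether a query touching `item_count` items may run."""
    if is_admin:
        return True  # admins and trusted processes bypass the limit
    if in_off_peak_window(now):
        return True  # the limit is lifted during the configured window
    return item_count <= USER_LIMIT
```

Under these assumptions an ordinary user running a million-item query on a weekday afternoon is refused, while the same query goes through for an admin or on a Saturday.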
Not Creating Unstructured Content in the First Place
Most of us tend to open Word when we want to type something, Excel when we want to calculate something and PowerPoint 10 minutes before we are about to present something. When you look at many of the Web 2.0 features of SharePoint 2010, you realize that perhaps we should give more thought to our choice of tools.
SharePoint 2010 has broad support for the wiki paradigm; it has native blog management, activity feeds and discussion boards. I wonder how often we should be using these (semi) structured tools instead of just defaulting to creating a monolithic piece of content. Just a thought…
There are a plethora (well, quite a few) of other content-related features that are not covered in this article. Some of them are worth further investigation. I suggest taking a look at SharePoint 2010’s records management, the new web content management capabilities, all of the Search features (especially ‘refiners’, which allow you to quickly filter result sets), Business Connectivity Services, RSS feed monitoring, key performance indicators, and summary links.
Andrew Chapman is the Senior Director of the SharePoint Technologies Group in EMC’s Content Management and Archiving division. He is responsible for setting the strategic direction and ensuring the timely delivery of products that facilitate the interoperability of SharePoint and EMC Documentum. He is the author of Never Talk when you can Nod, a guide to implementing ECM-based compliance systems and blogs about compliance, SharePoint, and ECM at http://www.nevertalkwhenyoucannod.com/.