Federated Search in Microsoft Office SharePoint Server (MOSS) 2007

In most corporate environments today, large and small, information is scattered across many systems. It lives in e-mail storage systems, file shares, desktop computers, laptops, mobile devices, databases, and more; the list goes on and on.

There was a point in time when the technology industry tried to drive information consolidation and centralization. Though the idea was good, it failed for many reasons, e.g. cost, culture, system integration issues, and network bandwidth. The industry has now accepted that information may be stored in disparate locations for good reason. The challenge has become how to find that information and present it to the user in a unified, useful way.

The term Federated Search is broadly used to describe a process that transforms a query and delivers it to multiple disparate information sources, aggregates the results, creates a single linear list of results (with minimal duplicates), and presents that list to the user. At a macro level, Microsoft Office SharePoint Server (MOSS) 2007 approaches this challenge with a slightly different method; however, it is still considered a form of Federated Search. First, MOSS is configured to crawl disparate content sources and store the indexed results. Then, when a user performs a search, they are actually searching the results of the last crawl. This method has several advantages, including:

Availability of a content source. If, for any reason, a content source becomes temporarily unavailable, search results are not affected, because queries run against the stored index rather than the source itself.

Network bandwidth. In most large corporations, content sources are located in various geographic areas; delivering a search query to each one (and waiting for the response) can be time consuming and place unnecessary load on network resources.
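
To make the crawl-then-search idea concrete, here is a minimal sketch in Python. It is not MOSS code and does not use any SharePoint API; the source names, URLs, and the crawl and search functions are all hypothetical stand-ins. It only illustrates why a query answered from a stored index keeps working when the sources are unreachable.

```python
from collections import defaultdict

# Hypothetical content sources: name -> {url: text}. These stand in for a
# file share and a web site; they are not MOSS objects.
SOURCES = {
    "file_share": {
        "//server/code/ajax_helper.js": "AJAX helper for internal portal",
    },
    "web_site": {
        "http://example.com/ajax": "AJAX tutorial and samples",
        "http://example.com/llc": "Forming an LLC",
    },
}


def crawl(sources):
    """Build an inverted index (term -> set of URLs) across every source."""
    index = defaultdict(set)
    for documents in sources.values():
        for url, text in documents.items():
            for term in text.lower().split():
                index[term].add(url)  # a set keeps each URL once per term
    return index


def search(index, term):
    """Answer a query from the stored index only; no source is contacted."""
    return sorted(index.get(term.lower(), set()))


index = crawl(SOURCES)
SOURCES.clear()  # simulate every source going offline after the crawl
results = search(index, "AJAX")
print(results)
```

Even with the sources emptied, the search still returns hits from both the file share and the web site, which is the availability benefit described above. The trade-off is that results are only as fresh as the last crawl.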

By default, when MOSS is initially installed and configured, a content source for the first Site Collection is added. However, you are not limited to simply indexing the SharePoint Site content. You can configure additional content sources including web sites, Exchange Public Folders, file shares and data from the Business Data Catalog (BDC).

The benefits of indexing web site content may not immediately come to mind. Take a law firm, for example. There are literally hundreds (if not thousands) of electronic law libraries available on the Internet today. Instead of forcing your users to go directly to each law library web site, you can configure MOSS to index the content and return results in a single, holistic view.

I went to the Internet and found a free law resource library site to demonstrate the power of web site content source crawling. I added the web site http://www.FindLaw.com as a content source, then started a full crawl. Once it completed, I performed a search in MOSS for 'LLC.' Clicking on either of the result links took me to the FindLaw web site, which contains a wealth of information regarding LLCs.

I recently had a customer, who will remain unnamed, ask if there was a way to index various source code repositories. One of these repositories was a code library maintained internally; the others were all external.

In MOSS, this was quite simple to configure. Their internal source code library was maintained on a file share, so it was added as a simple file share content source. A number of external web site source code libraries were also important to this customer, so I helped them configure additional content sources for those sites and then started the crawl process. Because we configured 5 web site sources, the crawl took over an hour to complete. However, because of the type of information involved, incremental updates can be performed during off hours, when server resources are readily available.
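
The reason incremental updates are cheaper than a full crawl can be sketched in a few lines of Python. This is an illustration of the general idea only, not how MOSS implements it; it assumes each document exposes some kind of last-modified stamp, and the incremental_crawl function and its data shapes are hypothetical.

```python
def incremental_crawl(index_state, source):
    """Re-index only documents that changed since the previous crawl.

    index_state: dict of url -> (modified, text) kept from the last crawl.
    source: dict of url -> (modified, text) as currently seen at the source.
    Returns the sorted list of URLs that were added, updated, or removed.
    """
    touched = []
    # Add new documents and refresh those whose modified stamp differs.
    for url, (modified, text) in source.items():
        previous = index_state.get(url)
        if previous is None or previous[0] != modified:
            index_state[url] = (modified, text)
            touched.append(url)
    # Drop documents that no longer exist at the source.
    for url in list(index_state):
        if url not in source:
            del index_state[url]
            touched.append(url)
    return sorted(touched)


state = {}
source = {"a": (1, "alpha"), "b": (1, "beta"), "c": (1, "gamma")}
first = incremental_crawl(state, source)   # empty state, so everything is indexed

source["b"] = (2, "beta v2")   # changed
source["d"] = (1, "delta")     # new
del source["c"]                # deleted
second = incremental_crawl(state, source)  # "a" is unchanged and skipped
print(first, second)
```

The first run touches every document, much like a full crawl. The second run skips the unchanged document and only processes the three that were modified, added, or removed, which is why it fits comfortably into an off-hours window.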

I took the time to duplicate a portion of the content sources in my MOSS Portal for the fictitious Litware Corporation. Once the crawl was complete, I was able to search my Portal for ‘AJAX.’

This search returned results from all three content sources, namely SourceForge.net, CodePlex and The JavaScript Source. This makes it quite simple for your users to search for information in a single location.

With the Internet resources available today, the possibilities for aggregating relevant information are limited only by your imagination. Since it is virtually impossible to centralize all information in an organization, we now have technologies that present it to our users as though it were centralized. This is one of the many benefits of using MOSS as a Knowledge Management solution.
