Last month, I went over some of the basics of Microsoft Index Server. I showed you how the default Web catalog is created, what it does, and how you can add and remove directories from its scope. The Microsoft Management Console (MMC) Index Server Services (ISS) snap-in lets you add directories to existing catalogs and create new catalogs. Index Server also has a set of HTML administration pages that give you a wide range of virtual root information and index statistics on the default Web catalog. Also, several example search pages exist that range from basic to complex Microsoft SQL Server ad hoc query builders that you can use as templates for building search pages.
This month, I build on the information in last month's article. I show you how to
- Build a new custom catalog for a Web domain of your choice.
- Add directories to the catalog that are outside that Web domain and exclude directories that are inside the Web domain.
- Modify one of Index Server's example search pages, Query.asp, to search the domain using your new catalog.
Building a Web Catalog
Before you begin building a Web catalog, you must first open the IIS and Index Server snap-ins. You can open the IIS snap-in in MMC by choosing Start, Programs, Windows NT 4.0 Option Pack, Microsoft Internet Information Server, Internet Service Manager. To open the Index Server snap-in, choose Start, Programs, Windows NT 4.0 Option Pack, Microsoft Index Server, Index Server Manager. The Index Server snap-in opens in MMC. Next, choose Stop from the Action menu to stop Index Server. (If you've upgraded to Index Server 2.0 and MMC 1.1, you administer Index Server from its instance in MMC, as I do in this article.) Index Server's short name in the system is Content Index (CI), so the snap-in is called Ciadmin. You'll see references to CI in your event logs every time Index Server updates an index.
To begin catalog building, right-click Index Server on Local Machine and select New, Catalog, as Figure 1 shows. The Add Catalog dialog box, which Figure 2 shows, appears. Enter a name for your new catalog, and select a directory location in which to store the catalog. This directory is important because the CI internal parameters use the directory, not the catalog name, for searches. When you enter the information and click OK, you see a message that the catalog will remain offline until you restart Index Server. After you restart Index Server, the catalog entry appears in Ciadmin. The problem is that by default, the catalog is pointing at (tracking) the default Web site on the server machine, just like the Web catalog.
To change the Web domain that the catalog is tracking, right-click the new catalog and select Properties. Choose the Web tab, which Figure 3 shows. By default, the Track Virtual Roots check box and the default Web site are both selected. Click the drop-down list box, and select the virtual root of your choice. The catalog Properties dialog box contains two other tabs: Location and Generation. The Location tab shows the catalog's location but doesn't let you change it. If you want to change the location, you must delete the catalog and recreate it.
The Generation tab, which Figure 4 shows, lets you specify whether you want to filter files with unknown extensions. If you select the Filter Files with unknown extensions check box, the Indexer attempts to filter (i.e., index) every file it finds in every directory in the scope, including files whose extensions have no registered filter. If this check box is clear, the Indexer ignores files that have unregistered extensions. The other option on this tab is Generate characterizations. Characterizations (also called abstracts) are the short blocks of text, up to a maximum size, that appear under the document title in the search results. Index Server draws this text from different places depending on the document filter you're using. For example, in an .html document, the filter populates the characterization from the Description metatag. If no Description metatag exists, the results can be unpredictable, depending on the actual HTML in the document. In general, if the document property or HTML element doesn't exist, the Indexer simply takes the maximum number of characters from the beginning of the document. Similarly, Index Server takes the title from the <title> element in the .html document or the Title property in a Microsoft Word document.
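To make the fallback behavior concrete, here is a minimal Python sketch of the rule as described above: use the Description metatag when it exists, otherwise take the first characters of the document text, and take the title from the <title> element. This is not Index Server's filter code. The names summarize and MAX_CHARS and the 320-character limit are assumptions made for illustration.

```python
from html.parser import HTMLParser

MAX_CHARS = 320  # hypothetical limit; the article only says "maximum size"


class _SummaryParser(HTMLParser):
    """Collects the <title>, the description metatag, and the remaining text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.text_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def summarize(html, max_chars=MAX_CHARS):
    """Return (title, abstract) using the fallback order described in the text."""
    parser = _SummaryParser()
    parser.feed(html)
    parser.close()
    abstract = parser.description
    if not abstract:
        # No usable description metatag: fall back to the start of the document.
        abstract = " ".join(parser.text_parts)
    return parser.title.strip(), abstract[:max_chars]
```

Because the fallback takes whatever text comes first, a page that opens with inline script or navigation boilerplate produces an abstract made of that material, which is why a Description metatag is worth the effort.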
Excluding and Adding Directories
To refine your catalog, you can exclude or add directories. You can exclude subdirectories and virtual directories from the catalog's scope in a couple of ways. The first way is through the IIS snap-in in MMC; the second way is through the Ciadmin snap-in.
Excluding a directory through the IIS snap-in. To exclude a directory through the IIS snap-in, select a directory under the virtual root for which you built the catalog but that you don't want indexed (e.g., Search). Right-click the directory, and select Properties. From the Directory tab of the Properties dialog box, clear the Index this directory check box. This method is effective, but it's sometimes tedious to test because you have to view the properties for each subdirectory to see whether it's excluded.
Excluding a directory through the Ciadmin snap-in. Click the plus sign (+) next to your new catalog name, then right-click the Directories folder. Select Add Directory to bring up the Add Directory dialog box, which Figure 5 shows.
In this example, I've added three directories to the catalog. The directory that Figure 5 shows is unique because it's an excluded directory. (Notice that I've selected the Exclude option under Type.) By default, Index Server would index the d:\Ideva\cybercash subdirectory as part of the Ideva domain, but Exclude tells CI not to index this subdirectory. As a result, I can go to one screen (the Ciadmin console, which Figure 6 shows) and view the entire scope. The other two directories in Figure 6 (i.e., g:\testers paradise and d:\TechArticles) are directories that are outside this Web domain. TechArticles is on the same drive in the server, but it isn't a subdirectory or virtual directory of the Ideva domain. Testersparadise, a Web domain on a different server, is identified by its Alias (Universal Naming Convention, or UNC) name. Alias (UNC), which is optional, is the server name and path for this directory (e.g., \\hudson\dtestersparadise). Index Server returns what you enter in the Alias field to any client whose query gets a hit from this directory. Alias (UNC) is useful in an intranet setting in which a user needs direct access to this directory. On the Internet, I consider this information private and don't include it. Don't worry: Users can't see documents or even references to documents that they don't have permission to see. (I'll talk more about security at the end of the article.)
When you're finished adding directories, you must stop and restart Index Server so that it can update all its indexes. After the restart, close the Ciadmin MMC console and reopen it to force it to refresh its directory listings. When everything is updated, you'll see your new catalog and the directories. The Ideva Web domain shows up as the folder with the globe in Figure 6.
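One way to reason about the resulting scope is as a list of include and exclude rules. The Python sketch below models the example directories from this section and assumes that the most specific (longest) matching directory decides whether a file is indexed. That precedence rule, and the names RULES and is_indexed, are assumptions for illustration rather than documented Index Server behavior.

```python
from pathlib import PureWindowsPath

# (directory, include?) pairs modeled on the example catalog in this section.
RULES = [
    (r"d:\Ideva", True),             # the tracked Web domain
    (r"d:\Ideva\cybercash", False),  # excluded subdirectory
    (r"d:\TechArticles", True),      # added directory outside the domain
    (r"g:\testers paradise", True),  # added directory on another drive
]


def _parts(path):
    """Split a Windows path into lowercase components for comparison."""
    return tuple(p.lower() for p in PureWindowsPath(path).parts)


def is_indexed(path, rules=RULES):
    """Return True if the longest matching rule for `path` is an include rule."""
    target = _parts(path)
    best_len, decision = -1, False  # paths matching no rule are out of scope
    for directory, include in rules:
        d = _parts(directory)
        if target[:len(d)] == d and len(d) > best_len:
            best_len, decision = len(d), include
    return decision
```

For example, is_indexed(r"d:\Ideva\cybercash\order.asp") is False under this model because the exclude rule is more specific than the include rule for d:\Ideva.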
Inheritance overrides. Here's something to note: In the IIS snap-in in MMC, I selected the Index this directory check box on the Home Directory tab of the Properties dialog box for the Ideva domain. If I hadn't selected this check box in the IIS snap-in, the Ideva site wouldn't show up in the directories list in the ideva_index catalog—only the directories that I added would appear. See "The Basics of Index Server," July 2000 for information about this setting and about the Inheritance Overrides dialog box, which Figure 7 shows, that opens in the IIS snap-in when you change this setting. Inheritance Overrides gives you the opportunity to change the default Excluded status on any specially marked directories in the domain, such as the Microsoft FrontPage private directory _vti_bin.
Index Server includes some nice sample search pages. You'll find several versions of these sample pages in the various sample sites that Microsoft has shipped (e.g., EXAir). The easiest way to find them is to look in the default Web site in the \iissamples\isssamples folder. The sample pages can be hard to understand because they rely on a server having only one Web domain (the default Web site), and that domain is all you can search with them. Virtually no documentation exists about how to set up one of the fancy search pages to use a specific catalog. However, I'll tell you what I found out.
In the beginning (Index Server 1.x, when Microsoft wrote the NT 4.0 Option Pack documentation), each server machine had only one Web domain on it. As a result, the default Web catalog and the Registry default values worked perfectly. But how do you tell Index Server to look at a different catalog? The magic missing parameter is called CICatalog. (See the sidebar, "Additional Resources," for more information about CI parameters and Registry settings.)
When Microsoft wrote the sample pages, Active Server Pages (ASP) was in its infancy, all script was VBScript, and people used primarily Internet Server API (ISAPI) to interact with the IIS server. At that time, using Internet Data Query (IDQ) files to send queries to the Index Server was the preferred method for searching a catalog. Next, you used an HTML Extension (.htx) file to format the output back to the user. Of course, Query.asp has no reference to the CICatalog parameter. Because everyone uses ASP these days, I decided to make the Query.asp example work in my domain with my new catalog.
As I mentioned previously, CICatalog doesn't specify the name you gave the catalog; it specifies the directory in which you put the catalog. Look again at Figure 2: You can't later move the directory that you name in the Location field of the Add Catalog dialog box. This behavior becomes important if you do a lot of custom work adding directories and exclusions to the catalog and then need to move the site. Another important parameter, CIScope, seemed at first glance to let you specify a catalog on the fly just by naming directories. (The Microsoft Indexing Service 3.0 documentation that I talk about in a moment clarifies this confusion.) In reality, CIScope lets you specify which directories and subdirectories to search, and how deeply, within what is already in the catalog.
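The idea that a scope only narrows what the catalog already holds can be sketched in a few lines of Python. The function below filters virtual paths by a scope directory, with a deep search covering subdirectories and a shallow search covering only the named directory. The name in_scope and the case-insensitive comparison are assumptions for illustration.

```python
from pathlib import PurePosixPath


def in_scope(vpath, scope, deep=True):
    """Return True if the document at `vpath` falls inside `scope`.

    deep=True  -> the scope directory and all of its subdirectories
    deep=False -> only documents directly inside the scope directory
    """
    doc_dir = PurePosixPath(vpath.lower()).parent
    root = PurePosixPath(scope.lower())
    if deep:
        return doc_dir == root or root in doc_dir.parents
    return doc_dir == root
```

A document that isn't in the catalog never reaches a filter like this one, so no choice of scope can make it appear in the results.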
So how did I figure out the magic ASP parameter name that tells the page to search the catalog of my choice? When I looked at the Query.asp file, I immediately noticed the code in Listing 1. My first instinct was simply to add one line:
Q.Catalog="the path to your catalog"
Adding this assignment statement worked, as the results in Figure 8 show. However, I wanted to know more about the process.
A search for ixsso.Query at http://www.microsoft.com/ took me to "Using Indexing Service APIs from Scripts" (http://www.msdn.microsoft.com/library/psdk/indexsrv/ixufilap_6ulv.htm). These pages are part of the Indexing Service 3.0 documentation, but everything I tried seemed to work perfectly for Index Server 2.0. I must mention, though, that the test machine I used has all the required service packs and ActiveX Data Objects (ADO) upgrades that Microsoft Site Server 3.0 requires. You can find the required upgrades and service packs for Site Server 3.0 at http://support.microsoft.com/support/siteserver/install/install_ss3.asp. You can find the parameters and methods for ixsso.Query at http://www.msdn.microsoft.com/library/psdk/indexsrv/ixrefobj_7kah.htm.
The easiest way to get search pages into your catalog's domain is to copy Query.asp and its attendant support files in the Samples and Oop subdirectories to a directory in your domain (e.g., d:\ideva\search). Depending on how many graphics and included footers you want to keep or lose, you can selectively remove the files you don't need. Figure 9 shows the files I copied. You need qfullhit.htw and qsumrhit.htw, which handle the full and summary highlighted search items. However, the other files you choose depend on how you modify Query.asp. I put the files in a subdirectory to keep them from leaking into the main development Web site and the index. Then, I just cleared the Index this directory check box in the IIS Properties dialog box for that directory. You can follow the examples in the Option Pack Index Server documentation's Getting Started section to create the look and feel of this page. Or you can dump the page into a FrontPage Web site the way I did in this example and just apply the Web theme of your choice.
To point Query.asp at your new catalog, open the file in your favorite ASP editor. Search for ixsso.Query, and add
Q.Catalog="the path to your catalog"
as Listing 1 shows.
A Few Words About Security
The Security section of the ISS documentation in the Option Pack says a user who doesn't have permissions to the file will never even know it's there. However, my test results didn't quite match the documentation. I found that when I locked a subweb in a domain so that only a few users or a group could use it, Index Server didn't return any hits from that subweb, no matter whom I logged on as. Even if I had used the same instance of the browser to log on to the protected subweb, I couldn't see any results from that site.
Problems with Housekeeping
Tim Huckaby, "Implementing Site Server Search Database Catalogs," July 2000, discusses using Site Server 3.0 to set up a catalog and search page. Index Server requires a bit more hands-on work than Site Server Search. But for those situations in which you don't have access to Site Server Search and you need more than the FrontPage Wide Area Information Server (WAIS) Search offers, or you can't use it, Index Server can do the job for you.
The biggest problem with adding a catalog and search page as I've done in this article is the cleanup that someone (probably the content author) has to do on the files on the virtual server and included directories. For example, the title on the second document in Figure 8 is missing.
I don't recommend using this method if your Web developers keep a lot of stuff lying around in their Web directories. If they do, users might see abstracts filled with strange characters and snatches of programming script, or worse. When I first tried this catalog, a search for ideva returned 120 hits, and only about 40 of them were valid. A major contributor to the bad hits was the collection of legacy private FrontPage files sitting in the virtual root. I had to identify, remove, move, or exclude them all.
Excluding entire directories doesn't solve all your presentation problems in the search results, either. You can use filters to exclude certain types of files and individual files from being searched. You can also use the CIRestriction property to cause the search results to not display certain types of hits. (See "Additional Resources" for information about this topic.)
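As a rough illustration of the restriction approach, the Python sketch below wraps the user's search string and appends one exclusion clause per unwanted virtual-path pattern. It assumes the Index Server query language accepts the AND NOT operator and a #vpath pattern match. Check that syntax against the resources cited above before relying on it. The function name build_restriction is hypothetical.

```python
def build_restriction(user_query, excluded_vpath_patterns):
    """Compose a restriction string that hides hits under unwanted paths.

    Assumed syntax (not confirmed by this article): "AND NOT" as the
    operator and "#vpath <pattern>" as a wildcard match on virtual path.
    """
    query = user_query.strip()
    if not query:
        raise ValueError("empty search string")
    restriction = f"({query})"
    for pattern in excluded_vpath_patterns:
        restriction += f" AND NOT #vpath {pattern}"
    return restriction
```

Calling build_restriction("ideva", [r"*\_vti_*"]) yields a string that, under these assumptions, would keep FrontPage's private _vti_ directories out of the results without touching the catalog itself.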
The bottom line is, not all my customers are willing to pay for Site Server, and Index Server is free with the Option Pack. Also, the search page Query.asp works on any virtual server in IIS, provided you've created a catalog for it. So, although it requires more setup, Index Server and a good set of search pages can produce rich search results pages that aren't limited to .html documents.
Next month, I'll show you how to use Index Server catalogs with FrontPage 2000 directly. The big advantage here is that you don't need to code ASP Search pages: FrontPage 2000 does that for you. Your Web authors can use the FrontPage Search components in the FrontPage client to create Search pages, eliminating the need to write or tweak the search form files. And FrontPage handles some file restrictions for you, cutting down on the number of inappropriate files returned when users search.