Cybergroup Selects dtSearch

dtSearch’s Text Retrieval Engine Powers Web-based Business Intelligence Mining Library


 

By Greg Bean

 

Cybergroup's client requested that Cybergroup develop a Web-based business intelligence mining library, including Web-based searching that seamlessly combines its structured SQL database and its separate document collection.

 

Project Requirements and Background

Cybergroup's client realized that database information, although critical to its business intelligence, represented only a small portion of all its corporate information. By the client's estimate, its corporate database contained a mere 20% of business-decision information, while the remaining 80% could be found in other sources: Web site pages, Microsoft Office documents, PDFs, etc. The client needed a single search to cover both the SQL database and the file repository and to return unified results from both sources.

 

To ensure that a search of the combined database and document repository would retrieve all relevant information, the client required not only basic search functionality, such as word and phrase searching, but also advanced search features: stemming, fuzzy searching to tolerate misspellings, and phonic searching. The client also wanted concept searching, including synonym expansion using both pre-defined thesaurus terms and a user-defined thesaurus/synonym list.

 

For sorting search results, the client wanted a variety of advanced relevancy ranking options. Finally, for ease of browsing search results, the client specified that the search must return retrieved SQL database entries and documents with highlighted hits (preferably with a WYSIWYG display of Web-ready formats such as HTML, PDF, and XML, along with the highlighted hits).

 

For ongoing digital library management, the client needed Cybergroup to develop a solution that lets multiple contributors upload documents to the Web library. Upon document check-in, the client also needed a mechanism to add metadata about the document to the client's main SQL database.

 

Solution Overview

To meet all of the above requirements for the project's search functionality, Cybergroup chose the dtSearch Text Retrieval Engine for Win & .NET by dtSearch Corp. (http://www.dtsearch.com). A single dtSearch index could include both the SQL database and the separate document repository, and could be searched with all of the advanced search features, ranking capabilities, and hit-highlighted display options described above.

 

To use these built-in capabilities, Cybergroup needed to write custom VB.NET code to carry along certain fields from the database that would be associated with each document and stored in the searchable index. Cybergroup also needed to write a custom ASP.NET-based server control using the dtSearch Engine APIs. Cybergroup called this control its dtResults Control; screenshots and a detailed description of the control follow.

 


Figure 1: Cybergroup's dtResults Control.

 


Figure 2: Cybergroup's dtResults Control.

 

Cybergroup's Description of dtResults Control

As with any .NET control, a developer can drag and drop the dtResults Control right into a development environment. Cybergroup implemented the dtResults Control by inheriting from the datagrid control, which offers built-in paging and a robust programming model.

 

The following code is from Cybergroup's sample application, and runs when the user enters a search term or phrase and clicks the Search button:

 

Private Sub GetResults()
    'Setting the location of the index
    SearchResultList1.IndexPath = "c:\dbconnectorindex"

    'Mapping virtual path of documents to physical path
    Dim rptd As New SearchResultList.SearchResultList.ResultPathTranslationDictionary
    rptd.Add("c:\testdocs", "./testdocs")

    'Setting various search settings
    SearchResultList1.RelativePathTranslations = rptd
    SearchResultList1.SortCaseInsensitive = cbCaseInsensitive.Checked
    SearchResultList1.SortAscending() = ddAscendingFlag.SelectedValue
    SearchResultList1.SearchType = ddSearchType.SelectedValue
    SearchResultList1.SortType = ddSort.SelectedValue
    SearchResultList1.Stemming = cbStemming.Checked
    If cbFuzzyness.Checked = True Then
        SearchResultList1.Fuzzy = True
        SearchResultList1.FuzzLevel = ddFuzzyness.SelectedValue
    Else
        SearchResultList1.Fuzzy = False
    End If
    SearchResultList1.Phonic = cbPhonic.Checked
    SearchResultList1.Synonyms = cbSynonyms.Checked

    'Defining dtSearch custom fields to be displayed
    Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}
    SearchResultList1.CustomFieldNames = cfn
    Dim cffn As String() = {"Supplier ID #", "Company Name"}
    SearchResultList1.CustomFieldFriendlyNames = cffn

    If chkSearchWithin.Checked = True Then
        SearchResultList1.SearchWithin = True
        SearchResultList1.PreviousSearchFilter = Session("psf")
    End If

    'Executing the search and binding the results
    SearchResultList1.GetResults(tbSearch.Text)

    'Storing the "previous search filter" to be used later if user clicks "Search Within Results"
    Session("psf") = SearchResultList1.PreviousSearchFilter

    Literal1.Text = "Search: " & tbSearch.Text & " returned: " & CType(SearchResultList1.DataSource, DataTable).Rows.Count & " results"
End Sub

 

The following gives a flavor of how Cybergroup built the dtResults Control and what the control can do.

 

Using the GetResults method of the dtResults Control, Cybergroup reduced the task of creating the search and results display to one line of code in the simplest case. We can execute a search and then display results by passing a search string input by the user on the search form, as in this example:

 

SearchResultList1.GetResults(tbSearch.Text) 'ONLY ONE LINE OF CODE

 

Of course, a developer can also leverage the power of the dtResults Control through its properties. Take, for example, the SortType property, which lets the developer sort the information in the results display. Let's say the developer wants the most recently modified documents to appear first in the results display. The developer would set the SortType property to "date" and the Ascending property to False; for example:

 

SearchResultList1.SortType = "date"

SearchResultList1.Ascending = false

 

Inside the control, the SortType string is checked against a canned set of strings such as "date", "hits", and "title", and the Ascending variable is checked. The control then combines the matching dtSearch sort flags into a single value to pass to its sort function. The bit manipulation is hidden from the developer, who can bind the properties, with single lines of code, to checkboxes or dropdown lists.

 

Here's the code in the dtResults Control for the SortType property:

 

Dim flags As New dtengine.SortType
If Not (sortf = 0) Then
    flags = sortf
ElseIf sortt = "hits" Then
    flags = dtengine.SortType.stSortByHits
ElseIf sortt = "index" Then
    flags = dtengine.SortType.stSortByIndex
ElseIf sortt = "date" Then
    flags = dtengine.SortType.stSortByDate
ElseIf sortt = "timeofday" Then
    flags = dtengine.SortType.stSortByTime
ElseIf sortt = "title" Then
    flags = dtengine.SortType.stSortByTitle
ElseIf sortt = "name" Then
    flags = dtengine.SortType.stSortByName
ElseIf sortt = "filetype" Then
    flags = dtengine.SortType.stSortByType
ElseIf sortt = "size" Then
    flags = dtengine.SortType.stSortBySize
Else
    flags = dtengine.SortType.stSortByUserField
End If

If sascend Then
    flags += dtengine.SortType.stSortAscending
End If

If cinsens Then
    flags += dtengine.SortType.stSortCaseInsensitive
End If

res.Sort(flags, sortt)
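
The flag-building logic above can be tried in isolation. The following is a minimal sketch rather than Cybergroup's code: the DemoSortFlags enum, its member names, and its numeric values are invented stand-ins for dtengine.SortType (whose real values this article does not show), and only four of the sort keywords are covered. It combines flags with Or instead of +=, on the assumption that the flags are independent bits.

```vbnet
Imports System

Public Module SortFlagSketch

    ' Stand-in for dtengine.SortType; names and values are hypothetical.
    <Flags()> Public Enum DemoSortFlags
        None = 0
        ByHits = 1
        ByDate = 2
        ByName = 4
        BySize = 8
        ByUserField = 16
        Ascending = 256
        CaseInsensitive = 512
    End Enum

    ' Turns a sort keyword and two options into one combined flag value.
    ' Any unrecognized keyword is treated as the name of a custom field.
    Public Function BuildSortFlags(sortKey As String, sortAscending As Boolean, ignoreCase As Boolean) As DemoSortFlags
        Dim flags As DemoSortFlags
        Select Case sortKey
            Case "hits"
                flags = DemoSortFlags.ByHits
            Case "date"
                flags = DemoSortFlags.ByDate
            Case "name"
                flags = DemoSortFlags.ByName
            Case "size"
                flags = DemoSortFlags.BySize
            Case Else
                flags = DemoSortFlags.ByUserField
        End Select

        ' Or sets a bit; unlike +=, applying it twice cannot corrupt the value.
        If sortAscending Then flags = flags Or DemoSortFlags.Ascending
        If ignoreCase Then flags = flags Or DemoSortFlags.CaseInsensitive
        Return flags
    End Function

End Module
```

With these stand-in values, BuildSortFlags("date", False, False) yields ByDate alone, which corresponds to the newest-first example given earlier.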

 

Critically important to our project is the ability to extract custom field data from the dtSearch index. Custom fields are columns that we have extracted from the database during the indexing process and now wish to present in a search results display.

 

Through the use of the CustomFieldNames and the CustomFieldFriendlyNames properties, a developer can easily and attractively display database information in the results display.

 

The CustomFieldNames property is a string array of the names of custom fields (i.e., database columns) in the index that the developer wishes to include in the results. The strings should appear exactly as they do in the index; for example, {"SupplierID", "CompanyName", "Region"}.

 

The CustomFieldFriendlyNames property is a string array of the names that the developer would like to have appear in the control. This provides for a high degree of customization in results presentation. Rather than display cryptic database column names, the developer can display understandable labels. These names are connected to actual custom fields by their position in the array, relative to the CustomFieldNames property above. If the array is longer than CustomFieldNames, the extra entries at the end are discarded. If it is shorter, the remaining custom fields are labeled with their actual names. For example, {"ID # of Supplier", "Supplier Name", "Supplier's Region"}.

 

To return the custom field information in the results display, the developer would simply set the properties as in the following example:

 

Dim cfn As String() = {"SupplierID", "CompanyName", "Region"}

SearchResultList1.CustomFieldNames = cfn

Dim cffn As String() = {"Supplier ID #", "Company Name"}

SearchResultList1.CustomFieldFriendlyNames = cffn
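
The positional matching between the two arrays can be sketched as a small helper. This is an illustration of the rule described above, not the control's actual code; the module and function names are hypothetical.

```vbnet
Imports System

Public Module FriendlyNameSketch

    ' Returns one display label per custom field.
    ' Extra friendly names are ignored; missing ones fall back to the field name.
    Public Function ResolveLabels(fieldNames As String(), friendlyNames As String()) As String()
        Dim labels(fieldNames.Length - 1) As String
        For i As Integer = 0 To fieldNames.Length - 1
            If i < friendlyNames.Length Then
                labels(i) = friendlyNames(i)
            Else
                labels(i) = fieldNames(i)
            End If
        Next
        Return labels
    End Function

End Module
```

Applied to the arrays in the example above, the first two columns would be labeled "Supplier ID #" and "Company Name", and the third would keep its index name, "Region".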

 

Following is a complete list of the dtResults Control properties and methods:

 

Ascending: If true, the results will be sorted in ascending order by whatever criterion is specified in SortType. If false, results are sorted in descending order. Defaults to false.

 

CustomFieldNames: This string array represents the names of custom fields in the index that the developer chooses to include in the results. The strings in it should appear exactly as they do in the index; for example, {"SupplierID", "CompanyName", "Region"}.

 

CustomFieldFriendlyNames: This string array represents the names of the fields that the developer wants to appear in the control. These names are connected to actual custom fields by their position in the array, relative to CustomFieldNames. If the array is longer than CustomFieldNames, the extra entries at the end are discarded. If it is shorter, the remaining custom fields are labeled with their actual names. For example, {"ID # of Supplier", "Supplier Name", "Supplier's Region"}.

 

Fuzzy and FuzzLevel: These control the tolerance of the search for misspellings; for example, searching for "alphabet" with Fuzzy = True and FuzzLevel = 1 would also find "alphaqet" or "albhabet". Searching for "alphabet" with Fuzzy on and FuzzLevel at 3 would also find "alpkaqet".

 

IndexPath: This is the location of the dtSearch index files to use for searching. If it is not set, then SearchResults will look for an IndexPath key in Web.config.

 

Phonic: Controls phonic searching; for example, with Phonic = True, searching for "Smith" would also find "Smythe".

 

PreviousSearchFilter: This allows the developer to create "Search Within Results" functionality, in conjunction with the SearchWithin property, described below. This property should be saved to a session variable after the initial search, and restored from it when the user triggers a "Search Within Results".

 

RelativePathTranslations: A SearchResultList.ResultPathTranslationDictionary that maps the absolute paths of documents stored in the dtSearch index to relative paths. This allows a URL to be generated for the link to the document, given only an absolute path on the server. For example, one might include the following in an initialization method:

 

Dim rptd As New SearchResultList.SearchResultList.ResultPathTranslationDictionary

rptd.Add("c:/Inetpub/website/search/documents", "documents")

rptd.Add("c:/Inetpub/website/tutorials", "../tutorials")

SearchResultList1.RelativePathTranslations = rptd
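
How such a dictionary might be applied to a single hit can be sketched as follows. This is a guess at the general technique (prefix replacement), not the control's actual implementation; the function name is hypothetical, a plain Dictionary stands in for ResultPathTranslationDictionary, and overlapping prefixes are not handled.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module PathTranslationSketch

    ' Converts an absolute document path into a relative URL using a mapping
    ' whose absolute prefix matches. Returns Nothing if no mapping applies.
    Public Function ToRelativeUrl(mappings As Dictionary(Of String, String), absolutePath As String) As String
        ' Normalize separators so "c:\docs" and "c:/docs" compare equal.
        Dim normalized As String = absolutePath.Replace("\"c, "/"c)
        For Each pair As KeyValuePair(Of String, String) In mappings
            Dim prefix As String = pair.Key.Replace("\"c, "/"c).TrimEnd("/"c)
            ' Require a separator after the prefix so "docs" does not match "docs2".
            If normalized.StartsWith(prefix & "/", StringComparison.OrdinalIgnoreCase) Then
                Return pair.Value.TrimEnd("/"c) & normalized.Substring(prefix.Length)
            End If
        Next
        Return Nothing
    End Function

End Module
```

Under the two mappings shown above, a file stored at c:\Inetpub\website\tutorials\intro\a.html would be linked as ../tutorials/intro/a.html.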

 

SearchType: A string. Valid values are "allwords", "anywords", "phrase", and "boolean". In the "allwords" setting, dtSearch will search for any document containing each word in the search, in any order or proximity. In the "anywords" setting, dtSearch will search for any documents containing any of the words in the search query, not necessarily all of them in the same document. In the "phrase" setting, dtSearch will treat the entire search query as a single unit, and search for documents containing the exact query. In the "boolean" setting, the user can use Boolean logic to specify a query. dtSearch provides the following guidance:

  • tart apple pie - the entire phrase must be present
  • apple pie and pear tart - both phrases must be present
  • apple pie or pear tart - either phrase must be present
  • apple pie and not pear tart - only apple pie must be present
  • apple w/5 pear - apple must occur within 5 words of pear
  • apple not w/27 pear - apple must not occur within 27 words of pear
  • subject contains apple pie - finds apple pie in a subject field
  • use parentheses if the query contains more than one connector
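
The difference between the first three search types can be illustrated with a toy matcher over a single piece of text. This is only a sketch of the semantics described above and says nothing about how dtSearch evaluates queries internally; the function name and the naive word splitting are assumptions, and the "boolean" type is left out because it would need a query parser.

```vbnet
Imports System

Public Module SearchTypeSketch

    ' Returns True if the text satisfies the query under the given search type.
    ' Matching is a naive, case-insensitive whole-word comparison.
    Public Function Matches(text As String, query As String, searchType As String) As Boolean
        Dim separators As Char() = {" "c, ","c, "."c, ";"c}
        Dim textWords As String() = text.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim queryWords As String() = query.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries)
        If queryWords.Length = 0 Then Return False

        Select Case searchType
            Case "allwords"
                ' Every query word must occur somewhere, in any order.
                For Each w As String In queryWords
                    If Array.IndexOf(textWords, w) < 0 Then Return False
                Next
                Return True
            Case "anywords"
                ' One matching word is enough.
                For Each w As String In queryWords
                    If Array.IndexOf(textWords, w) >= 0 Then Return True
                Next
                Return False
            Case "phrase"
                ' Words must appear consecutively and in the same order.
                Dim haystack As String = " " & String.Join(" ", textWords) & " "
                Dim needle As String = " " & String.Join(" ", queryWords) & " "
                Return haystack.Contains(needle)
            Case Else
                Throw New ArgumentException("Unsupported search type: " & searchType)
        End Select
    End Function

End Module
```

For the text "Tart apple pie, fresh." the query "pie apple" matches under "allwords" but not under "phrase", while "pie pear" matches only under "anywords".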

 

SearchWithin: If this property is set to True, and the PreviousSearchFilter property is set to a value obtained from it after a previous GetResults call, then the results of the current search will be a subset of the results of the previous search.
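
Conceptually, searching within results narrows the new hits to those that were also in the previous result set. The sketch below shows that idea with plain integer document IDs; it is an assumption-laden illustration, since the article does not describe what the PreviousSearchFilter object actually stores, and the function name is hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SearchWithinSketch

    ' Keeps only the current hits that were also part of the previous result set.
    ' Integer document IDs stand in for whatever the real filter object holds.
    Public Function NarrowToPrevious(currentHits As IEnumerable(Of Integer), previousFilter As HashSet(Of Integer)) As List(Of Integer)
        Dim narrowed As New List(Of Integer)
        For Each docId As Integer In currentHits
            If previousFilter.Contains(docId) Then narrowed.Add(docId)
        Next
        Return narrowed
    End Function

End Module
```

If the previous search returned documents 1, 2, 3, and 5 and the new query on its own would hit 2, 4, 5, and 6, the narrowed result contains only 2 and 5.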

 

SortType: A string. Meaningful values are "hits", "date", "name", and "size". If set to "hits", the documents containing the most occurrences of the search query, or the highest score, will appear on top. If set to "date", the most recently modified documents will appear on top. If set to "name", the documents will be sorted in alphabetical order of their title. If set to "size", the documents with the largest file sizes will appear on top. If the property has a different value than any of these, it is assumed to be the name of a custom field in the index by which to sort.

 

Stemming: Controls the word stemming capability of dtSearch. For example, if Stemming = True, searches for "apply", "applying", "applier", or "applies" are all equivalent.

 

Synonyms: Uses an English thesaurus to search for synonyms of the search query in addition to the search query itself.

 

GetResults(SearchText As String): This function runs a search on the query string passed, using the settings determined by the control's properties, and displays the results in a human-readable format, with 10 results per page and a pager control. Until this method is called, the control is invisible to the user.

 

Greg Bean is President of Cybergroup, Inc., a developer of advanced Internet and intranet developer search tools in Baltimore, MD. E-mail him at [email protected].

 

dtSearch

dtSearch offers over a decade of experience in text search and retrieval. Large enterprises typically use dtSearch products for general information retrieval, Internet and Intranet site searching, access to technical documentation, and embedding in applications for distribution. dtSearch is also on the US Government's GSA Schedule. The company has distributors worldwide, including coverage on six continents. For more information visit http://www.dtsearch.com.

 

 

 

 
