Search Google With ASP.NET

Use the Google Web Service and a proxy object to add Web-wide searching functionality to your Web apps quickly and easily.

XML has been one of the biggest marketing buzzwords in the technology world for the past few years. With the introduction of Web Services, though, XML has some competition in the buzzword category. The question is, with all of the hype surrounding XML and Web Services, are they really that useful?

The answer to this question depends upon how you use the technologies to solve real business problems, such as exchanging data between distributed sources. In this article, you'll see how you can use XML and Web Services to integrate data provided by Google.com into ASP.NET applications.

 

The Google Web Service API

If you haven't used Google.com before, it is a search site that lets you search through billions of Web documents and newsgroup posts. To make the data it archives available for others to consume, Google has developed a Web Service and provides a proxy object written in C# that can be used in .NET applications. The Web Services Description Language (WSDL) document used to create the proxy is located at http://api.google.com/search/GoogleSearch.wsdl.

If you haven't worked with Web Service proxy objects in .NET before, they act as the middlemen between your .NET application and a remote Web Service and can serialize and deserialize Simple Object Access Protocol (SOAP) messages for you automatically. FIGURE 1 shows an example of a typical Web Service architecture and identifies the proxy object.


FIGURE 1: A typical Web Service architecture can involve several different technologies, including WSDL; Universal Description, Discovery, and Integration (UDDI); the Web Service itself; and one or more proxy objects used to integrate with the Web Service and send or receive SOAP messages.

The proxy object Google provides allows you to tie into Google's Web Service through three methods, shown in FIGURE 2.

Method                   Description
doGoogleSearch()         Search through indexed Web content.
doGetCachedPage()        Access pages Google caches during its Web crawling process.
doSpellingSuggestion()   Provide spelling suggestions for specific text.

FIGURE 2: The methods exposed by the Google Web Service allow you to search through the Google Webpage index, access complete Web pages cached by Google bots as the pages are parsed, and obtain spelling suggestions for search words.

Although you can create the proxy that is used to access these methods yourself using Visual Studio .NET or the WSDL.exe command-line utility, the proxy class (named GoogleSearchService.cs) provided by Google is ready to use right out of the box and already contains the additional helper classes used to access the Web Service. FIGURE 3 shows these additional classes.

[System.Xml.Serialization.SoapTypeAttribute("GoogleSearchResult", "urn:GoogleSearch")]
public class GoogleSearchResult {
  public bool documentFiltering;
  public string searchComments;
  public int estimatedTotalResultsCount;
  public bool estimateIsExact;
  public ResultElement[] resultElements;
  public string searchQuery;
  public int startIndex;
  public int endIndex;
  public string searchTips;
  public DirectoryCategory[] directoryCategories;
  public System.Double searchTime;
}

[System.Xml.Serialization.SoapTypeAttribute("ResultElement", "urn:GoogleSearch")]
public class ResultElement {
  public string summary;
  public string URL;
  public string snippet;
  public string title;
  public string cachedSize;
  public bool relatedInformationPresent;
  public string hostName;
  public DirectoryCategory directoryCategory;
  public string directoryTitle;
}

[System.Xml.Serialization.SoapTypeAttribute("DirectoryCategory", "urn:GoogleSearch")]
public class DirectoryCategory {
  public string fullViewableName;
  public string specialEncoding;
}

FIGURE 3: Google provides a C# proxy class that can be used to tie into its Web Service. The proxy contains several custom classes, such as the GoogleSearchResult, ResultElement, and DirectoryCategory classes shown here that can be used to interact with the Web Service.

The complete proxy object is available with this article's downloadable code (see the end of the article for details about downloading code). You also can download the proxy object from http://www.google.com/apis. To access it through Google, simply register with Google at the aforementioned URL. After registering, you will receive a unique key that must be used when calling the Web Service.

 

Search Google

Now that you've seen some of the functionality the Google Web Service provides, I'll explain how you can use the proxy object to perform a Web search. The first step is to compile the supplied proxy object using the csc.exe command-line compiler utility (alternatively, you can use Visual Studio .NET):

csc.exe /t:library /out:GoogleSearchService.dll
 /r:System.Web.dll GoogleSearchService.cs

(Note: you need to enter the above on a single command line.) Once the proxy is compiled into a .NET assembly, a new ASP.NET page can be created. First, you'll want to add a text box and button to the ASP.NET page so users can specify a keyword or phrase for which to search. When the button is clicked, your code needs to create an instance of the proxy class and call its doGoogleSearch method. This method accepts several different parameters. FIGURE 4 shows the key parameters.

Parameter    Description
key          Subscription key used to access the Google Web Service API. Visit http://www.google.com/apis/ to obtain a key.
q            Query text sent to the Google Web Service, which is used to search Web pages in the index. The end user will supply this parameter value.
maxResults   Number of results to return. The Web Service currently limits the maximum number to 10.
start        Used to specify which record to start with when performing the search. By changing this number, paging functionality can be added.

FIGURE 4: The doGoogleSearch method accepts several different parameters that are used by the Web Service to search through the Google Webpage index.

The doGoogleSearch method returns a GoogleSearchResult object (the code for this object is in FIGURE 3). This object exposes a collection of ResultElement objects through its resultElements property, which can be bound directly to a Web server control, such as the DataList. Each ResultElement object has specific properties that allow you to obtain the URL for each item, title, directory category, and additional information. The code to call the Web Service's doGoogleSearch method and bind the resulting data is shown in FIGURE 5.

private void CallGoogleService(int record) {
    try {
        // Create a Google Search object
        GoogleSearchService s = new GoogleSearchService();

        // If you want to implement this service you
        // MUST get your own key from Google.
        // URL: http://www.google.com/apis/
        GoogleSearchResult r = s.doGoogleSearch(key,
            txtSearchText.Text, record, 10, false, "",
            false, "", "", "");

        // Make proper controls visible
        this.pnlResults.Visible = true;

        // Perform data binding
        dlResults.DataSource = r.resultElements;
        dlResults.DataBind();
        this.lblTotalRecords.Text =
            r.estimatedTotalResultsCount.ToString();
    }
    catch (Exception exp) {
        this.lblError.Text = exp.Message;
    }
}

FIGURE 5: The doGoogleSearch method accepts several different parameters (see FIGURE 4) that are used to search the Google Webpage index. It returns a GoogleSearchResult object whose resultElements collection can be bound to standard ASP.NET Web server controls, such as a DataList.

As each ResultElement is bound to the DataList, you must cast the bound DataItem to a ResultElement so you can access its members. The cast is needed because the proxy class exposes them as public fields, which DataBinder.Eval cannot resolve. I have performed this cast using the standard <%# %> data-binding syntax, as shown in FIGURE 6.



  

  

    
<%# ((ResultElement)Container.DataItem).title %>    ( Get Cached Page )

FIGURE 6: Binding the resultElements collection to a DataList control is accomplished by casting each data item in the collection that is being bound to a ResultElement object. Once the object is cast, its URL and title property can be accessed and bound to the HyperLink control.
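
Only a fragment of the FIGURE 6 markup appears above, so the following is a rough sketch of what such a DataList template might look like, not the author's original listing. The control IDs dlResults, hlLink, and lnkCachedURL and the dlResults_ItemCommand handler are taken from elsewhere in this article; the attribute layout and the OnItemCommand wiring are assumptions.

```aspx
<%-- Sketch only: layout and event wiring are assumed, not the original FIGURE 6 --%>
<asp:DataList id="dlResults" runat="server"
    OnItemCommand="dlResults_ItemCommand">
  <ItemTemplate>
    <%-- Cast the DataItem so the URL and title members can be read --%>
    <asp:HyperLink id="hlLink" runat="server"
        NavigateUrl='<%# ((ResultElement)Container.DataItem).URL %>'>
      <%# ((ResultElement)Container.DataItem).title %>
    </asp:HyperLink>
    ( <asp:LinkButton id="lnkCachedURL" runat="server">Get Cached Page</asp:LinkButton> )
  </ItemTemplate>
</asp:DataList>
```

In a layout like this, clicking the LinkButton would raise the DataList's ItemCommand event, which is the event handled in FIGURE 8.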

By casting the DataItem to a ResultElement, you can access the URL and title properties and bind them directly to the DataList control's child HyperLink control. The output generated by calling the doGoogleSearch method is shown in FIGURE 7.


FIGURE 7: This image shows the output generated by calling the Google Web Service search functionality using the doGoogleSearch method. Paging functionality has been added so an end user can page through multiple records.

Although I won't discuss the paging techniques I used in the ASP.NET page to allow the DataList to page through the Google results, the downloadable code for this article contains all of the necessary programming logic to accomplish this task.
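
As a rough illustration of the arithmetic behind paging (this is not the code from the download), the sketch below converts a page number into the start value passed to doGoogleSearch. It assumes that start is a zero-based index of the first result to return and that each page holds the 10-result maximum described in FIGURE 4. The GooglePaging class and its method names are hypothetical.

```csharp
using System;

// Hypothetical helper; not part of the Google proxy or the article's download.
public static class GooglePaging
{
    // Assumption: 10 results per page, the maximum the service allows.
    public const int PageSize = 10;

    // Converts a 1-based page number into a zero-based start index.
    public static int StartIndexForPage(int pageNumber)
    {
        if (pageNumber < 1)
        {
            throw new ArgumentOutOfRangeException("pageNumber");
        }
        return (pageNumber - 1) * PageSize;
    }

    // Number of pages needed to show the estimated result count.
    public static int PageCount(int estimatedTotalResults)
    {
        if (estimatedTotalResults <= 0)
        {
            return 0;
        }
        // Integer ceiling division
        return (estimatedTotalResults + PageSize - 1) / PageSize;
    }
}
```

With a helper like this, a "next page" link could call CallGoogleService(GooglePaging.StartIndexForPage(page)) from FIGURE 5, and estimatedTotalResultsCount could feed PageCount to decide how many page links to show. Because that count is only an estimate, the last pages may turn out to be empty.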

 

Access Google's Cached Pages

Google indexes different Web pages in its searchable collection by sending out Web crawlers to walk through pages and find specific keywords. As the crawlers do this, Google caches a snapshot of the page being indexed. These cached versions of pages also can be accessed through the Google Web Service as a byte array by calling the doGetCachedPage method. Then, the returned byte array can be converted to a character array by using the System.Text namespace, and the resulting character array can be converted to a string, which then can be written out. The code to accomplish this is shown in FIGURE 8.

public void dlResults_ItemCommand(Object sender,
  DataListCommandEventArgs e) {
  try {
    this.pnlResults.Visible = false;
    this.pnlCache.Visible = true;
    string url =
      ((HyperLink)e.Item.FindControl("hlLink")).NavigateUrl;
    System.Text.ASCIIEncoding enc =
      new System.Text.ASCIIEncoding();
    GoogleSearchService s = new GoogleSearchService();
    byte[] pageBytes = s.doGetCachedPage(key, url);
    char[] pageChars = enc.GetChars(pageBytes);
    this.lblCachedPage.Text = new String(pageChars);
  }
  catch (Exception exp) {
    this.lblCachedPage.Text = exp.Message;
  }
}

FIGURE 8: The ItemCommand event can be used to capture events raised by controls, such as a LinkButton that is nested within a DataList. When the LinkButton is clicked, the Web Service proxy object is instantiated, and the doGetCachedPage method is called. The returned byte array is converted to a string.

The code shown in FIGURE 8 is executed when the link button named lnkCachedURL within the DataList is clicked, causing the DataList control's ItemCommand event to be fired. The string that is created after converting the byte array is then written to a Label control named lblCachedPage in the ASP.NET page. FIGURE 9 shows an example of the output returned from calling the doGetCachedPage method.


FIGURE 9: Google caches pages through which it crawls and makes the data available through the doGetCachedPage() method. This figure shows a portion of a cached page.
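
The byte-to-string conversion in FIGURE 8 can also be written as a single call, because the Encoding class provides a GetString method. The sketch below is an alternative to the article's two-step conversion, not a replacement for it. The CachedPageDecoder class is a hypothetical name, and which encoding suits a given cached page is left as an assumption for the caller.

```csharp
using System;
using System.Text;

// Hypothetical helper; not part of the Google proxy.
public static class CachedPageDecoder
{
    // Decodes the bytes returned by doGetCachedPage in one step.
    public static string Decode(byte[] pageBytes, Encoding encoding)
    {
        if (pageBytes == null)
        {
            // Nothing was returned, so there is nothing to display
            return String.Empty;
        }
        return encoding.GetString(pageBytes);
    }
}
```

Calling CachedPageDecoder.Decode(pageBytes, Encoding.ASCII) should give the same kind of result as FIGURE 8. ASCII does not preserve characters outside the 7-bit range, so pages with accented or non-Latin text may display better with a different encoding, such as Encoding.UTF8, if the page actually uses it.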

 

Retrieve Spelling Suggestions

Aside from providing the ability to search the Google Webpage index and access cached pages, the Web Service also can make spelling suggestions for search keywords. This can be useful when an individual isn't exactly sure how to spell a particular word for which he or she wants to search.

The method within the proxy object that makes this possible is named doSpellingSuggestion and is easy to use. It takes two parameters: the Google key and the text for which you'd like spelling suggestions:

private void lnkSpelling_Click(object sender,
  System.EventArgs e) {
  try {
    GoogleSearchService s = new GoogleSearchService();
    string suggestion =
      s.doSpellingSuggestion(key, this.txtSearchText.Text);
    // Replace the search text only when a suggestion comes back
    if (suggestion != null &&
        suggestion != String.Empty) {
      this.txtSearchText.Text = suggestion;
    }
  }
  catch {
    // Ignore service errors and leave the search text unchanged
  }
}

For example, a user could type the text "humin," and the spelling suggestion service would return "human."

Google certainly could have exposed the functionality discussed in this article in other ways. But by using XML and Web Services, data consumers are not required to write large amounts of code or even understand much about Web Services aside from how to use the proxy object and its associated methods.

Because Web Services are platform-neutral, a variety of consumers also can access the functionality Google exposes, with virtually any type of programming language. This allows the exchange of data between distributed applications to occur without resorting to manual data feeds or more complex technologies, such as Distributed Component Object Model, Common Object Request Broker Architecture, or Java Remote Method Invocation.

Fortunately for ASP.NET developers, Web Services are built directly into the .NET Framework so we all have a powerful mechanism for building and consuming Web Services. For a live example of consuming the Google Web Service APIs from an ASP.NET page, visit http://www.xmlforasp.net/codeSection.aspx?csID=56.

The files referenced in this article are available for download.

 

Dan Wahlin is the president of Wahlin Consulting, and he founded the XML for ASP.NET Developers Web site (http://www.XMLforASP.NET), which focuses on using XML and Web services in Microsoft's .NET platform. He is also a corporate trainer and speaker, and he teaches XML and ASP.NET training courses around the United States. Dan co-authored Professional Windows DNA (Wrox) and ASP.NET Tips and Tricks (SAMS), and he authored XML for ASP.NET Developers (SAMS). Readers may reach Dan at mailto:[email protected].

 

