LANGUAGES: VB.NET | C#
ASP.NET VERSIONS: 2.x
Prepare to Be Searched
Get Your Site Noticed by the People Who Matter Most
By Steve C. Orr
If your Web site provides useful content, services, or products, there are people out there who want to know about it. But how do you get the word out? You could send out copious amounts of spam to get noticed, but that's not likely to earn the kind of reputation that most organizations crave. Other forms of marketing and advertising are likely to bring more positive results, but just because you don't have an advertising budget doesn't mean you're out of luck. Read on to find out free ways to maximize your Web site's status and get found by the people you're trying to reach.
Robots and Crawling Spiders
Sounds like the introduction to a sci-fi movie, doesn't it? Actually, robots, crawlers, and spiders are all names for custom software from search engines like Google, Yahoo, and MSN Search that investigates what's currently out on the Internet. If you have a public Web site, chances are it has already been visited, scanned, and thoroughly indexed by one of these ominous-sounding pieces of software. As intimidating as they sound, spiders can be your best friend if you take the time to understand them, because they hold the key to every Web site's search ranking. If your site sells discount toothpicks, then your site needs to appear near the top of the list when users search for "discount toothpicks," and the spiders hold the power to make that happen.
Functionally speaking, spiders do little more than record key pieces of your Web page's HTML and follow the hyperlinks to see where they lead. Conceptually, it's not very difficult to design a basic spider yourself. The .NET WebRequest class is all you really need to retrieve the HTML of a page so you can parse it, extract its hyperlinks, and then recursively parse the related Web pages they point to. Along the way, you can store important pieces of text in a database for querying. Sites like Google and Yahoo have become masters of this technique, and by understanding some details about how they do it, you can use their global dominance to advance your own agenda.
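To make that concrete, here's a minimal C# sketch of such a spider, assuming nothing beyond the framework's WebRequest, Uri, and Regex classes. The MiniSpider class, its method names, and the depth limit are hypothetical, and the regular expression is a deliberate simplification; a production spider would also honor robots.txt, throttle its requests, and use a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical minimal spider: fetch a page with WebRequest, extract its
// hyperlinks, and recurse to a fixed depth.
public static class MiniSpider
{
    // Captures the href value of <a ...> tags (simplified pattern).
    static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase);

    // Detects hrefs that carry their own scheme (http:, mailto:, ...).
    static readonly Regex HasScheme = new Regex("^[a-zA-Z][a-zA-Z0-9+.-]*:");

    // Resolves each href against the page's URL and keeps only http(s)
    // links, with any #fragment removed.
    public static List<Uri> ExtractLinks(string html, Uri pageUrl)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            string href = m.Groups[1].Value.Trim();
            Uri link = null;
            bool parsed;
            if (HasScheme.IsMatch(href))
            {
                parsed = Uri.TryCreate(href, UriKind.Absolute, out link);
            }
            else
            {
                // Parse as explicitly relative so "/path" is never mistaken
                // for a local file path, then resolve it against the page.
                Uri relative;
                parsed = Uri.TryCreate(href, UriKind.Relative, out relative) &&
                         Uri.TryCreate(pageUrl, relative, out link);
            }
            if (parsed && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps))
                links.Add(new Uri(link.GetLeftPart(UriPartial.Query)));
        }
        return links;
    }

    // Retrieves a page's HTML with WebRequest (marked obsolete in .NET 6+,
    // where HttpClient is preferred, but still functional).
    public static string Fetch(Uri url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Visits each URL at most once; 'store' receives the URL and HTML so the
    // caller can save key text to a database for querying.
    public static void Crawl(Uri url, int depth, HashSet<string> visited,
                             Action<Uri, string> store)
    {
        if (depth < 0 || !visited.Add(url.AbsoluteUri)) return;
        string html;
        try { html = Fetch(url); }
        catch (WebException) { return; }   // skip pages that fail to load
        store(url, html);
        foreach (Uri link in ExtractLinks(html, url))
            Crawl(link, depth - 1, visited, store);
    }
}
```

Pointing Crawl at a site you control, with a store callback that simply writes each URL to the console, is an easy way to see which pages a spider can actually reach by following links from your home page; a page nothing links to never shows up.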
A primary technique that spiders employ is to examine the words used most often in your Web pages. Therefore, the text content of your Web site is important for determining the ranking of your site in relation to specific words and phrases. It's not very feasible (or advisable) to make major changes to the content of a Web site just to increase search rankings. Instead, there are other techniques that are likely to give better results. For example, another extremely important item search engines examine is the title of a page. In a basic HTML page, the title would be defined like this:
<html>
<head>
<title>Discount Toothpicks</title>
</head>
<body>
</body>
</html>
When the page is viewed by a user, its title shows up in the title bar of the browser, as shown in Figure 1. As far as search engines are concerned, it is best to have the title consist of a good sentence or two filled with highly descriptive words about the page and/or Web site. This will help search engines understand the primary focus of the Web page, thereby increasing the site's ranking when people search for related topics.
Figure 1: A Web page's title shows up in the title bar of the user's browser. It's a key element that is examined by most major search engines to determine the subject matter of a Web page.
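A spider's first pass over a page might look something like the following sketch: grab the title, strip the tags, and tally the remaining words. The PageAnalyzer class and its tiny stop-word list are hypothetical, and real search engines weigh far more signals than raw word counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch of a spider's first look at a page: pull out the
// <title>, strip the tags, and count the remaining words.
public static class PageAnalyzer
{
    // A tiny, illustrative stop-word list; real engines use far larger ones.
    static readonly HashSet<string> StopWords = new HashSet<string>
        { "a", "an", "and", "the", "of", "to", "for", "is", "in" };

    public static string GetTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : "";
    }

    // Returns the 'count' most frequent words, most frequent first, with ties
    // broken alphabetically. Tags are stripped, but script and style contents
    // are not, which a real analyzer would have to handle.
    public static List<string> TopWords(string html, int count)
    {
        string text = Regex.Replace(html, @"<[^>]+>", " ").ToLowerInvariant();
        return Regex.Matches(text, @"[a-z']+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => !StopWords.Contains(w))
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.Ordinal)
            .Take(count)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Running TopWords against one of your own pages is a quick sanity check that the words you want to be found for actually dominate the text.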
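In ASP.NET 2.0, you're likely to be using a master page, so the simplest way to specify each page's title is the Title attribute of the @ Page directive (the master page itself is assigned with the directive's MasterPageFile attribute or in web.config):
<%@ Page TITLE="MY PAGE TITLE" Language="VB" %>
<asp:Content ContentPlaceHolderID="CPH1" Runat="Server">
Hello World
</asp:Content>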
This technique is fine for a small Web site, but for larger sites you're in for a major maintenance chore if you ever decide to change the titles of all the pages in your Web site. Luckily, ASP.NET 2.0 makes it easy to change a page's title programmatically from the page's (or master page's) code-behind file:
Page.Title = "Discount Toothpicks" 'VB 2005
Page.Title = "Discount Toothpicks"; //C# 2.0
Now all that's needed is a way to set the page title programmatically from some kind of data source. Conveniently, ASP.NET 2.0's site map features are perfect for this kind of thing. For more information about site maps, I suggest you read "Automate Navigation Chores." Once a site map is set up, it takes only a tidbit of code in the master page's code-behind to set the page title to the associated title specified in the site map:
'VB 2005
If SiteMap.CurrentNode IsNot Nothing Then
  Page.Title = SiteMap.CurrentNode.Title
End If

//C# 2.0
if (SiteMap.CurrentNode != null)
{
  Page.Title = SiteMap.CurrentNode.Title;
}
Descriptions, Keywords, and Meta Tags
Virtually all search engines make use of the page title, so making sure each page has a thorough, descriptive title pays off handsomely. However, there are other specific HTML elements that some search engines also value highly in their rankings. For example, Yahoo and MSN Search use the Description meta tag when present; Yahoo uses the Keywords meta tag as well. Here's a syntactically correct example of these meta tags in action:
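<html>
<head runat="server">
<meta name="description" runat="server"
content="Discount Toothpicks" id="description" />
<meta name="keywords" runat="server" id="keywords"
content="toothpicks, discount, teeth, cheap" />
</head>
<body>
Get yer cheap toothpicks here
</body>
</html>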
Technically, from an HTML perspective, the runat and id attributes are not required, but including them lets you adjust the tags' values from server-side code. For example, you can use a site map to set the Description meta tag in much the same way the page title was set in the previous example:
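'VB 2005
If SiteMap.CurrentNode IsNot Nothing Then
  Me.description.Content = SiteMap.CurrentNode.Description
  Me.keywords.Content = _
    SiteMap.CurrentNode("keywords") 'custom site map attribute
End If

//C# 2.0
if (SiteMap.CurrentNode != null)
{
  this.description.Content = SiteMap.CurrentNode.Description;
  this.keywords.Content = SiteMap.CurrentNode["keywords"]; //custom site map attribute
}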
While site maps don't officially support a keywords attribute, you can add one anyway; extra attributes are permitted on site map nodes and can be read programmatically using the indexer syntax shown above.
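If you'd rather write these tags into the page as text (for example, through a Literal control in the master page's head) instead of declaring them with runat="server", the detail to watch is encoding: a site map description containing quotes or ampersands would otherwise break the markup. The MetaTagBuilder class below is a hypothetical sketch of that approach. It uses WebUtility.HtmlEncode, which requires .NET 4 or later; on ASP.NET 2.0, HttpUtility.HtmlEncode from System.Web plays the same role.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper: renders description and keywords meta tags as
// markup, HTML-encoding each value so quotes, ampersands, or angle
// brackets in site map data can't break out of the content attribute.
public static class MetaTagBuilder
{
    public static string Build(string description, string keywords)
    {
        var sb = new StringBuilder();
        if (!string.IsNullOrEmpty(description))
            sb.AppendFormat("<meta name=\"description\" content=\"{0}\" />\n",
                WebUtility.HtmlEncode(description));
        if (!string.IsNullOrEmpty(keywords))
            sb.AppendFormat("<meta name=\"keywords\" content=\"{0}\" />\n",
                WebUtility.HtmlEncode(keywords));
        return sb.ToString();   // empty when neither value is available
    }
}
```

In the master page you would pass it SiteMap.CurrentNode.Description and SiteMap.CurrentNode["keywords"], guarding against a null CurrentNode just as the code above does.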
Get a Buzz
Another extremely important factor that search engines consider when ranking a site is how many other Web pages on the Internet link to it. For a Web site to be considered an authority on a particular topic, it needs a lot of related Web sites pointing to it, and the effect is greatest when those sites rank highly themselves (see Figure 2). Of course, the rhetorical question here is how to get other sites to link to yours. There is no single great answer, although it sure helps if you've got a lot of advertising dollars to spend. Otherwise, you're stuck with gradually building a reputation and getting other sites to link to yours via trading, begging, bartering, and hard work. Sometimes sharing content with other Web sites is a good way to get them to notice you and (more importantly) provide valuable hyperlinks back to your site.
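The idea that a link from a highly ranked site counts for more can be illustrated with a simplified, PageRank-style calculation. This is only an illustration, not any search engine's actual formula: the LinkRank class, the 0.85 damping factor, and the fixed iteration count are assumptions chosen for the example. On each pass, every page splits its current score among the pages it links to.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified, PageRank-style ranking (illustration only; search engines'
// real formulas aren't public). 'links' maps each page to the pages it
// links to. On every pass, each page splits its current score evenly among
// its outgoing links, so a link from a high-scoring page is worth more.
public static class LinkRank
{
    public static Dictionary<string, double> Compute(
        Dictionary<string, string[]> links, int iterations = 50, double damping = 0.85)
    {
        List<string> pages = links.Keys.Union(links.Values.SelectMany(v => v)).ToList();
        int n = pages.Count;
        var rank = pages.ToDictionary(p => p, p => 1.0 / n);

        for (int i = 0; i < iterations; i++)
        {
            // Every page starts each pass with a small baseline score...
            var next = pages.ToDictionary(p => p, p => (1 - damping) / n);
            foreach (string page in pages)
            {
                string[] targets;
                if (links.TryGetValue(page, out targets) && targets.Length > 0)
                {
                    // ...plus a share of the score of every page linking to it.
                    foreach (string target in targets)
                        next[target] += damping * rank[page] / targets.Length;
                }
                else
                {
                    // A page with no outgoing links spreads its score evenly.
                    foreach (string p in pages)
                        next[p] += damping * rank[page] / n;
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

In a tiny graph where three pages link to a toothpick site and the toothpick site links back to just one of them, that one page ends up ranked far above the pages nobody links to, even though it has only a single inbound link, because that link comes from the most popular page.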
Figure 2: The Google toolbar plug-in (available for Internet Explorer and Firefox) gives a good indication of a particular Web site's ranking. This ranking is based primarily on how many other Web sites link to the site.
Creating a buzz is a great way to launch a public site on the right foot. Get the word out. Make sure all the sites that should know about your pages are aware your site is online. Post in public forums frequently, and always include a hyperlink to the site in your signature or elsewhere in the posting. Get friends and coworkers to join in, too. If you're proud of your site, make a big deal about it and see who notices.
Through some investigation, you might find some link networks related to your industry. Basically, when you join such a network you agree to provide links to other related Web sites, and they agree to link to yours as well. Varying degrees of automation are generally involved to ensure participation among members. If you go with this approach, be sure to stay with link networks within your industry; straying into more general link farm networks will often have the opposite effect, watering down the focus of your Web site in the eyes of search engines and potentially making it more difficult to find.
Most major search engines provide a way to submit a site for indexing once you feel it's ready, which effectively queues a visit from a spider. To submit a site, visit the search engine's main page, click its help link, and look for the submission page. Submitting a site generally isn't necessary, because spiders will eventually find it on their own, but it can sometimes speed up the process. In fact, Google's spiders are so effective that manually submitting a site to Google is rarely worth the trouble. Don't worry if your site has already been indexed; spiders will visit again soon to investigate content revisions.
What Not To Do
While all the previous tips can help improve a site's search ranking, there are also some things you simply should not do. For example, most spiders are unable to analyze images, so you shouldn't hide critical search phrases inside an image unless they are duplicated in the image's ALT attribute.
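A quick way to catch this on an existing site is to scan your pages for images that lack ALT text. The AltTextAudit class below is a hypothetical, regex-based sketch rather than a real HTML parser; it reports the src of every img tag without a non-empty alt attribute.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical audit: report the src of each <img> tag that lacks a
// non-empty alt attribute. Note that alt="" is flagged too, even though it
// can be intentional for purely decorative images such as spacers.
public static class AltTextAudit
{
    static readonly Regex ImgTag = new Regex(@"<img\b[^>]*>", RegexOptions.IgnoreCase);
    static readonly Regex AltText = new Regex(
        @"\balt\s*=\s*[""']\s*[^""'\s][^""']*[""']", RegexOptions.IgnoreCase);
    static readonly Regex Src = new Regex(
        @"\bsrc\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);

    public static List<string> FindImagesWithoutAlt(string html)
    {
        var missing = new List<string>();
        foreach (Match img in ImgTag.Matches(html))
        {
            if (AltText.IsMatch(img.Value)) continue;   // has usable alt text
            Match src = Src.Match(img.Value);
            missing.Add(src.Success ? src.Groups[1].Value : img.Value);
        }
        return missing;
    }
}
```

Any image on the returned list that carries an important search phrase is a candidate for descriptive ALT text (in ASP.NET, the Image control's AlternateText property renders it).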
It's also advisable not to try tricking search engines into boosting a site's ranking. People have come up with all kinds of devious ways to hide extra keywords in HTML documents in an effort to boost profiles. Some people mistakenly think that injecting a wide variety of irrelevant words into a Web site will help it reach a wider audience. My advice: don't get cute like this. The major search engines have seen it all before. At best, these extra words will be ignored; at worst, your entire site could end up being ignored.
Generally speaking, the more Web sites that link to your site, the better. However, there is an important exception: Web sites infamous for undesirable content such as spam, warez, and other illegal activities might give your Web site a bad reputation in the eyes of search engines if they consistently link to your site. In other words, keep your nose clean so questionable sites will have little interest in linking to your content.
Complex QueryStrings can also confuse spiders. For example, does a page's URL output the same content with and without an ID QueryString appended to it?
The answer is, it depends. As a Web developer, you likely know that the ID QueryString tacked onto the end of the URL could be mostly irrelevant, or it could completely change the page that is displayed. Spiders understandably get confused by this kind of thing and don't know whether to index such URLs as separate pages. As a result, some spiders ignore such pages completely. Because complex QueryStrings confuse spiders, they are best avoided, especially for pages that are meant to be highly searchable. The Context.RewritePath method can be quite useful for providing spider-friendly URLs without heavily modifying a preexisting architecture that relies on QueryStrings.
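Here's a sketch of the mapping half of that approach, assuming a hypothetical URL scheme in which /Products/42.aspx stands in for /Products.aspx?ID=42 and the application runs at the site root. The FriendlyUrls class is hypothetical; only the Context.RewritePath call shown in the comment is the ASP.NET API mentioned above. Keeping the mapping in a plain method makes it easy to test outside of IIS.

```csharp
using System.Text.RegularExpressions;

// Hypothetical URL scheme: spiders and users see /Products/42.aspx, while the
// existing page keeps receiving /Products.aspx?ID=42. Assumes the application
// runs at the site root. In Global.asax you might call it from
// Application_BeginRequest like this:
//     string target = FriendlyUrls.Map(Request.Path);
//     if (target != null) Context.RewritePath(target);
public static class FriendlyUrls
{
    static readonly Regex ProductPath = new Regex(
        @"^/Products/(\d+)\.aspx$", RegexOptions.IgnoreCase);

    // Returns the internal QueryString-based path, or null when the request
    // doesn't use the friendly form (so it's processed unchanged).
    public static string Map(string requestPath)
    {
        Match m = ProductPath.Match(requestPath);
        return m.Success ? "/Products.aspx?ID=" + m.Groups[1].Value : null;
    }
}
```

Because the friendly URL still ends in .aspx, IIS typically hands the request to ASP.NET without extra configuration, and the page's existing Request.QueryString("ID") code keeps working after the rewrite. Links throughout the site should then use the friendly form so that's what spiders find.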
Perhaps there are parts of a Web site that should not be searched. Maybe they contain personal information or sensitive copyrighted content. The best solution is to use some kind of authentication, such as Forms Authentication or Windows Authentication. Because spiders don't have user accounts, they won't be able to access (or index) the information contained within. However, if a full-blown authentication system is overkill for your needs, there are some simple alternatives to keep specific pages away from prying spider eyes.
One solution is the ROBOTS meta tag. To prevent a page's content from being indexed, add the following meta tag to its HTML:
<meta name="ROBOTS" content="NOINDEX" />
To prevent spiders from following hyperlinks contained within the page, add this meta tag to the page's HTML:
<meta name="ROBOTS" content="NOFOLLOW" />
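On the other side of the fence, a well-behaved spider checks for these tags before indexing a page or following its links. The hypothetical RobotsMeta class below sketches roughly what that check involves. Directives can also be combined in a single tag, as in content="NOINDEX, NOFOLLOW", which is why the sketch splits on commas; its regular expression assumes the name attribute precedes content, which a real HTML parser wouldn't need to assume.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch of how a spider might honor the ROBOTS meta tag.
public static class RobotsMeta
{
    static readonly Regex MetaRobots = new Regex(
        @"<meta\s[^>]*name\s*=\s*[""']robots[""'][^>]*content\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Collects directives such as NOINDEX and NOFOLLOW, case-insensitively.
    public static HashSet<string> GetDirectives(string html)
    {
        var directives = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in MetaRobots.Matches(html))
            foreach (string d in m.Groups[1].Value.Split(','))
                directives.Add(d.Trim());
        return directives;
    }

    // True unless the page asks not to be indexed.
    public static bool MayIndex(string html)
    {
        return !GetDirectives(html).Contains("noindex");
    }

    // True unless the page asks spiders not to follow its links.
    public static bool MayFollow(string html)
    {
        return !GetDirectives(html).Contains("nofollow");
    }
}
```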
While this solution can be useful for protecting a page or two, it becomes less manageable as the number of pages grows. If entire directory trees need to be protected, creating a robots.txt file in the web root may be a better solution because it centralizes the management of such details. To prevent the entire Web site from being indexed, the robots.txt file should contain the following text:
User-agent: *
Disallow: /
This tells all (*) spiders to ignore pages starting at the root (/) of the Web site. It's easy to be more selective about which files to exclude, as in the following example, which denies only Google permission to index content in the web root's subdirectory named "secure," as well as the /data/logs subdirectory:
User-agent: Googlebot
Disallow: /secure/
Disallow: /data/logs/
It's also possible to grant different levels of access to spiders from different search engines, along with other advanced tricks that are beyond the scope of this article. For more information, see http://www.robotstxt.org/wc/faq.html.
Although there is currently no ratified standard that is guaranteed to ward off all search engines, most voluntarily comply with the techniques mentioned here.
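For a spider that wants to be among the compliant majority, honoring robots.txt mostly boils down to a prefix check. The RobotsTxt class below is a hypothetical, deliberately simplified sketch: it reads User-agent and Disallow lines, picks the group for the spider's own name (falling back to *), and ignores Allow lines, wildcards, and groups that list several user agents.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified robots.txt check: find the Disallow rules for this
// user agent (or "*"), then refuse any path that starts with one of them.
public static class RobotsTxt
{
    public static bool IsAllowed(string robotsTxt, string userAgent, string path)
    {
        var groups = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        List<string> current = null;
        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            string line = rawLine.Split('#')[0].Trim();   // drop comments and \r
            int colon = line.IndexOf(':');
            if (colon < 0) continue;
            string field = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();

            if (field.Equals("User-agent", StringComparison.OrdinalIgnoreCase))
            {
                // Start (or resume) the rule group for this user agent.
                if (!groups.TryGetValue(value, out current))
                {
                    current = new List<string>();
                    groups[value] = current;
                }
            }
            else if (field.Equals("Disallow", StringComparison.OrdinalIgnoreCase)
                     && current != null && value.Length > 0)
            {
                current.Add(value);   // an empty Disallow means "allow everything"
            }
        }

        List<string> rules;
        if (!groups.TryGetValue(userAgent, out rules) && !groups.TryGetValue("*", out rules))
            return true;   // no rules apply to this spider

        foreach (string prefix in rules)
            if (path.StartsWith(prefix, StringComparison.Ordinal))
                return false;
        return true;
    }
}
```

Fed the Googlebot example above, IsAllowed returns false for Googlebot requesting /secure/page.aspx, while any other spider is still let in because no group applies to it.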
Search Is King
Being easily found on the Internet is an important accomplishment for any public organization. Being able to find information can be just as important. For more details on how to retrieve and use search results programmatically, see "Search Box."
Obviously, the topic of searching and indexing the Web is far more complex than anyone could hope to cover in an article or two; otherwise, companies like Google and Yahoo wouldn't be able to rake in such enormous amounts of money from their expertise. Armed with the right knowledge, and building on the information you now have, maybe you too can scoot up to the table and grab yourself a piece of the pie.
Steve C. Orr is an MCSD and a Microsoft MVP in ASP.NET. He's been developing software solutions for leading companies in the Seattle area for more than a decade. When he's not busy designing software systems or writing about them, he can often be found loitering at local user groups and habitually lurking in the ASP.NET newsgroup. Find out more about him at http://SteveOrr.net.