Making Search Engines Better

Last month, I discussed how Internet search engines work, explaining that they download almost all the Web pages on the Internet into a huge database, then examine hyperlinks from one Web page to another to determine how to rank pages on a particular topic. (See my commentary, "Needed: A Search Engine Tune-Up," InstantDoc ID 96682.) Thus, the Web page about anadromous fish that attracts the most links from other Web pages will end up at or near the top of the search results for "anadromous fish." (I'll spare you the googling: Anadromous fish hatch in fresh water, live their lives in salt water, and return to spawn in fresh water. Salmon are currently the best-known example. But 150 years ago, most Americans were well acquainted with shad, a very large herring that provided a huge feast every spring as the fattened shad left the ocean for rivers.) This link-counting search worked fairly well in the old days, but as I noted last month, it seems to be falling apart now. To make it better, I suggest that identifiers be placed on blog pages and vendor content.
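For readers who like to see the idea in code, here is a minimal sketch of link-counting ranking. It is an illustration only, not how any real search engine is implemented: the function name rank_by_inlinks, the (source, target) pair format, and the example URLs are all assumptions made up for this example.

```python
from collections import Counter


def rank_by_inlinks(links, candidates):
    """Order candidate pages by how many distinct pages link to them.

    links: iterable of (source_url, target_url) pairs (hypothetical crawl data).
    candidates: pages already known to match the query.
    """
    inlinks = Counter()
    # set() counts each source page at most once per target.
    for source, target in set(links):
        # Ignore self-links and links to pages that do not match the query.
        if source != target and target in candidates:
            inlinks[target] += 1
    # Most-linked page first; ties are broken alphabetically for stable output.
    return sorted(candidates, key=lambda page: (-inlinks[page], page))


links = [
    ("a.example", "fish.example"),
    ("b.example", "fish.example"),
    ("a.example", "shad.example"),
    ("a.example", "fish.example"),  # duplicate link, counted once
]
print(rank_by_inlinks(links, ["shad.example", "fish.example", "lonely.example"]))
```

The sketch also shows the weakness: a page's rank depends only on how many pages point at it, not on whether those pages know anything about the topic.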

Searching on many technical Windows topics about Vista and Server 2008 almost invariably leads me to links to blogs. Some blogger mentions a software-related keyword--often a comment such as, "I can't make this work; it's junk," or "I was at a free Microsoft event, and I saw a demo of X,"--and this item, of no value to someone trying to research the software, makes it to the top of the search results. The "blogosphere" consists largely of systems that pretty much link every blog to every other blog, with the result that even the most quotidian blog ends up with tons of in-links and a high search rank. What to do? Perhaps a tag in the HTML could say, "This is a blog page," which would allow me to specify "no blog returns" in my query. Don't misunderstand me--blogs are wonderful tools of social interaction. But so is conversation, and when I used to research something in a library, conversation was usually frowned upon. I'm just suggesting that we keep the chatter in the online conversation venues and the technical content elsewhere.

I'm a photography enthusiast, but it's difficult to query anything even vaguely related to photographic equipment without netting results from online vendors whose "content" is nothing more than a glowing two-paragraph overview of the product. I love buying things on the Web, and I acquire most of my computing and shutterbug gear online. But when I'm looking up oranges in the library, I don't want someone to try to sell me a bushel of them as I retrieve a book on the topic. Again, why not some sort of identifier in the page saying, "This is a sales vendor's page," and the ability to block those pages on a Web search? It's not like people are going to avoid such pages, as evidenced by the continual growth of online sales. Furthermore, marking pages as vendor pages would offer the benefit that when I do want to find a good price on a product, I could confine my search to vendors, with no random non-sales pages included.
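The blog and vendor identifiers suggested above could work something like the following sketch. Everything specific here is an assumption for illustration: no such standard exists, so the meta tag name "page-type", its values "blog" and "vendor", and the function names are all hypothetical.

```python
from html.parser import HTMLParser


class PageTypeParser(HTMLParser):
    """Finds a hypothetical <meta name="page-type" content="..."> identifier."""

    def __init__(self):
        super().__init__()
        self.page_type = None

    def handle_starttag(self, tag, attrs):
        # Only the first non-empty page-type identifier is used.
        if tag != "meta" or self.page_type is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "page-type":
            content = (attributes.get("content") or "").strip().lower()
            self.page_type = content or None


def page_type(html):
    """Return the page's declared type, or None if it declares nothing."""
    parser = PageTypeParser()
    parser.feed(html)
    return parser.page_type


def filter_results(results, exclude=(), only=None):
    """results: list of (url, html) pairs, already in rank order.

    exclude: page types to drop ("no blog returns").
    only: if given, keep just these page types (a vendors-only price search).
    """
    kept = []
    for url, html in results:
        kind = page_type(html)
        if kind in exclude:
            continue
        if only is not None and kind not in only:
            continue
        kept.append(url)
    return kept


results = [
    ("blog.example", '<html><head><meta name="page-type" content="blog"></head></html>'),
    ("shop.example", '<meta name="page-type" content="vendor" />'),
    ("docs.example", "<html><body>reference material</body></html>"),
]
print(filter_results(results, exclude={"blog", "vendor"}))  # research query
print(filter_results(results, only={"vendor"}))             # shopping query
```

Note that unmarked pages pass an exclude filter, so a scheme like this would work only if blogs and vendors actually labeled their own pages.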

Ultimately, the answer to how to keep the Web useful lies in the same solution often cited for keeping it safe: reputation-based systems. Search engines should rank a Web page on a topic not according to the number of links to it but according to the reliability of its source. But until they do, check out my blog about my new video about how to get the most out of search engines . . .
