Lock Out Spambots

Author's note: Lindy White passed away shortly before this article was posted. I'm grateful to Lindy and his supervisor, Kevin LaBranche, for bringing Lindy's solution to my attention and arranging the interview. --ag

Lindy White, Systems Specialist, Coconino County, Arizona

Businesses with public websites face a trade-off between giving legitimate users unfettered access and blocking security threats such as hackers and bots. Local “.org” websites, such as those of governments and school districts, often publish employees’ contact information, but posting that information also makes the site a prime target for spambots that comb the Internet for email addresses to collect, or reap. Coconino County (Arizona) employees, whose contact information is published on department pages on the county website (www.coconino.az.gov), noticed a steep increase in spam early this year, despite the use of a spam-filtering product. County systems specialist Lindy White solved the problem by writing an ASP.NET 2.0 HTTP module that intercepts county email addresses as they leave the county’s Microsoft IIS web server, then redirects legitimate users to a contact form. I spoke with Lindy about how he developed his innovative solution and how it has drastically reduced the spam in Coconino County employees’ mailboxes.

Q: Let’s start by talking about the county site and what made it a target for spambots.

A: On our public site, all our departments have a home page, and some have several additional pages. Department employees administer the content on those pages using a content management system (CMS). They’re very reliable and responsible about the kind of information that they’re publishing. Because we want our services to be reachable, [the employees] all make sure there are plenty of email addresses on these department pages.

At the start of this year, we were filtering out roughly 400,000 emails a month, which isn’t atypical for an organization. But then we started seeing a straight-line increase of maybe 50,000 spam messages a month. I wondered whether our county website was contributing to this increasing load on our spam filter, given the number of email addresses we were exposing to web crawlers, web-bots, and spambots. You want Google and Yahoo! to crawl your site, but you don’t want the crawlers that are there specifically to reap email addresses.

Then I took off my white hat and put on my black hat. I wrote my own spambot, turned it loose against the county site, and came up with almost 600 unique county email addresses. That told me everything I needed to know. We needed to stop handing those [email addresses] out to spambots while still making those addresses available to people.
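The kind of audit Lindy describes can be approximated in a few lines. What follows is a minimal sketch in Python, not his actual tool: it scans page text that has already been fetched for anything shaped like an email address and collects the unique addresses at one domain. The function name `harvest_addresses`, the simplified regular expression, and the sample pages are all assumptions made for illustration.

```python
import re

# Simplified pattern for illustration; real address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest_addresses(pages, domain):
    """Return the sorted, unique addresses at `domain` found in `pages`.

    `pages` is an iterable of HTML or text strings that were already fetched.
    """
    suffix = "@" + domain.lower()
    found = set()
    for text in pages:
        for match in EMAIL_RE.findall(text):
            address = match.lower()  # treat JDoe@ and jdoe@ as one address
            if address.endswith(suffix):
                found.add(address)
    return sorted(found)

# Hypothetical sample pages; the addresses are made up.
sample_pages = [
    '<a href="mailto:jdoe@example.gov">Jane Doe</a>',
    "Contact JDoe@example.gov or clerk@example.gov",
    "Vendor: sales@example.com",
]
unique = harvest_addresses(sample_pages, "example.gov")
print(len(unique), unique)
```

Running a scan like this against your own published pages gives a rough count of how many addresses a reaping bot could collect.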

Q: How did you solve the spambot problem?

A: I proposed several solutions and pitched the best one to Kevin LaBranche, my division manager. The Microsoft .NET Framework lets you write some very low-level hook-ins to the IIS web server. So I decided to write an HTTP module that sits in the web server’s memory and basically looks for email addresses that are leaving the web server to go to somebody’s computer. At that point, I chose to substitute a form with CAPTCHA, so that my program could tell whether a person or a computer was accessing an email address. The email form hides the email address, but automated spammers can still fill out the form and submit it. The CAPTCHA test is a second level of security directed at preventing that. The module is all callbacks; it’s not linear programming at all. It’s all event driven.
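The interception step can be pictured with a rough Python analogue. This is a sketch under stated assumptions, not the county's ASP.NET module: it uses WSGI middleware in place of an IIS HTTP module, and the class name `EmailMaskingMiddleware`, the placeholder text, and the simplified regular expression are invented for illustration. It buffers the outgoing response and replaces anything shaped like an email address before the page leaves the server.

```python
import re

# Simplified pattern for illustration; it works on the raw response bytes.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class EmailMaskingMiddleware:
    """Wrap a WSGI app and mask email addresses in its HTML responses."""

    def __init__(self, app, placeholder=b"[use the contact form]"):
        self.app = app
        self.placeholder = placeholder

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers until the body has been rewritten.
            captured["status"] = status
            captured["headers"] = list(headers)
            # Apps that use the legacy write() callable are not supported here.
            return lambda data: None

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        if content_type.startswith("text/html"):
            body = EMAIL_RE.sub(self.placeholder, body)
            # The body length changed, so the old Content-Length is replaced.
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers)
        return [body]

# Hypothetical application and a stand-in for the server's start_response.
def demo_app(environ, start_response):
    body = b"<p>Email jdoe@example.gov for permits.</p>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]

sent = {}
def fake_start_response(status, headers, exc_info=None):
    sent["status"], sent["headers"] = status, headers

masked = b"".join(EmailMaskingMiddleware(demo_app)({}, fake_start_response))
print(masked.decode())
```

The middleware is driven by the server calling it for each request, which loosely mirrors the callback-based design described above.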

When the HTTP module snags an email address, the module connects to a database and checks a list of email addresses maintained there. If that email address isn’t on the list, [the module] adds it and assigns it a unique number. If the email address is on the list, [the module] just reads that number and substitutes it for the email address, so that your random web-bot will never see it.
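The lookup-and-substitute step described here can be sketched as follows. This is an illustrative Python version, not the module's actual code: the table name `contacts`, the functions `contact_id_for` and `mask_addresses`, the `/contact?id=` URL, and the simplified regular expression are all assumptions. An in-memory SQLite database stands in for whatever database the county used.

```python
import re
import sqlite3

# Simplified pattern for illustration; real address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def open_registry():
    """Create an in-memory registry mapping each address to a unique number."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email TEXT UNIQUE NOT NULL)"
    )
    return db

def contact_id_for(db, email):
    """Return the number for `email`, adding the address if it is new."""
    email = email.lower()
    row = db.execute("SELECT id FROM contacts WHERE email = ?", (email,)).fetchone()
    if row is None:
        cursor = db.execute("INSERT INTO contacts (email) VALUES (?)", (email,))
        db.commit()
        return cursor.lastrowid
    return row[0]

def mask_addresses(db, html):
    """Replace each address in `html` with a hypothetical contact-form URL."""
    def substitute(match):
        return "/contact?id=%d" % contact_id_for(db, match.group(0))
    return EMAIL_RE.sub(substitute, html)

db = open_registry()
page = "Call or write jdoe@example.gov. Clerk: clerk@example.gov. Again: JDoe@example.gov"
masked = mask_addresses(db, page)
print(masked)
```

In this sketch, a repeated address reuses the number it was given the first time, so the contact form can map the number back to the right mailbox without the address ever appearing in the page.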

Q: How complex was the solution to develop?

A: The complexity came in because the CMS editors needed to see the actual email addresses, not the contact ID of the form. I think I did what was probably pioneering work in how to selectively make exceptions for certain pages that you might classify as administrative pages and display email addresses to the employees who needed them.
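One way to picture such an exception is a check that runs before any masking happens. The interview does not say how the module recognized administrative pages, so this Python sketch simply assumes they can be identified by URL path prefix; the prefixes, the function names `should_mask` and `render`, and the placeholder text are all hypothetical.

```python
import re

# Simplified pattern for illustration; real address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical path prefixes for administrative (CMS editing) pages.
EXEMPT_PREFIXES = ("/admin/", "/cms/edit/")

def should_mask(path, exempt_prefixes=EXEMPT_PREFIXES):
    """Return True when addresses on the page at `path` should be hidden."""
    return not path.lower().startswith(tuple(exempt_prefixes))

def render(path, html, placeholder="[contact form]"):
    """Mask addresses on public pages; leave exempt pages untouched."""
    if should_mask(path):
        return EMAIL_RE.sub(placeholder, html)
    return html

html = "<p>Parks director: jdoe@example.gov</p>"
public_view = render("/departments/parks", html)
editor_view = render("/cms/edit/parks", html)
print(public_view)
print(editor_view)
```

A path check alone does not prove who is asking, so a sketch like this only makes sense when the exempt pages are already restricted to signed-in employees.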

Q: When you started using the HTTP module, what happened to the amount of spam employees were receiving?

A: I brought the solution online and put it in production in late February. In March, the number of spam messages caught in the filter was still going up in that same straight line, 50,000 a month. But in the March–April timeframe, we saw the first drop that we had ever seen. That curve dropped off by maybe 44,000 spam [messages].

Q: You’re primarily a system- and server-level scripter and programmer and don’t work with end users much. Nevertheless, you solved a big end-user problem. Did you get any recognition within your organization for your solution?

A: Yes, I was absolutely astonished to learn that I’d been nominated for a county award because of the solution. Nobody cares about the behind-the-scenes programming that I usually do. But whole departments were coming up to me and saying how they were so tired of all the spam they were getting on their public email addresses and thanking me for my hard work.
