2003: The Year of Spam

Apple Computer CEO Steve Jobs recently described 2003 as "the year of the notebook" because he sees notebook sales starting to eclipse sales of desktop PCs. Various open-source pundits have described 2003 as the year of Linux because that OS is widely expected to continue eating into Microsoft's market share and technological lead. But as far as I'm concerned, 2003 is the year of spam. And I'm not going to take it anymore. I suggest you do likewise.

Spam is an escalating problem. According to a recent Harris Interactive poll, more than 40 percent of all email is spam, up from just 13 percent a year earlier. By the end of this year, spam will account for more than half of all email; some enterprises are reporting that spam is already more than 80 percent of their incoming email. In America, almost half of all spam received comes from overseas. Most alarmingly, spam is evolving from a nuisance to a business threat, with volume email creating a Denial of Service (DoS)-style attack, bringing email servers to their virtual knees.

Finally, however, various groups are looking to end this plague. After years of ineffectual responses to spam, Congress appears poised to pass a law that would require spammers to let consumers opt out of email lists. And the Federal Trade Commission (FTC) recently shut down six online marketers for selling fake international driver's licenses. Interestingly, the FTC has asked US consumers to forward spam to the agency for analysis and potential prosecution, and the agency collects roughly 75,000 spam messages a day. However, because spam isn't technically illegal in the United States, as it should be, few cases are brought against spammers.

Government action will likely be an important component in the fight against spam going forward, but we need more immediate action. Recently, two groups of geeks met—one in Hawaii and one in Cambridge, Massachusetts—to discuss ways to battle spam. Last week in Hawaii, the Global Internet Project (GIP) held a workshop to discuss how enterprises are trying to battle spam, noting that US companies lose millions of dollars a year as employees spend work time opening and deleting spam. Also last week, at the Massachusetts Institute of Technology (MIT) in Cambridge, more than 500 programmers, hackers, IT administrators, and researchers gathered for a technical look at the concerns facing spam fighters. The problem, presenters noted, is that spammers are constantly adapting their techniques to overcome established spam-fighting techniques.

MIT's spam conference was originally expected to draw fewer than 70 people, and its swollen ranks indicate that the techie crowd is finally waking up to the challenge. Attendees concluded that to defeat spam, they must destroy the spam business model. And to do so, programmers are working on a spam filter so effective that spammers would receive no responses and thus give up their efforts. Currently, the best spam filters are based on a scheme called Bayesian filtering, which assigns statistical probabilities to words to determine whether an email message is spam. Spam filters based on Bayesian filtering are often more than 99 percent effective, and this technology is currently in use in alpha versions of Mozilla Mail, Apple's OS X Mail.app, and MSN 8 email. During a recent trip to Las Vegas, Nevada, for the Consumer Electronics Show (CES), I used Mail.app for email on an Apple iBook, and it eliminated almost 500 spam messages over 7 days, without any training. When I returned home and opened Microsoft Outlook, which I've outfitted with a commercial, non-Bayesian spam filter, the spam returned in droves; I received more than 50 spam messages the first morning alone.
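To make the idea of assigning probabilities to words concrete, here is a minimal sketch of a naive Bayes spam scorer in Python. It is an illustration only, not the code used by Mozilla Mail, Mail.app, or MSN 8; the class name `BayesFilter`, the tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z0-9$!']+", text.lower())


class BayesFilter:
    """Hypothetical naive Bayes filter: tracks how many spam and ham messages contain each word."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct word once per message.
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def word_prob(self, word, label):
        # Add-one smoothing so an unseen word never yields a zero probability.
        return (self.counts[label][word] + 1) / (self.messages[label] + 2)

    def spam_probability(self, text):
        total = self.messages["spam"] + self.messages["ham"]
        # Smoothed priors, so an untrained filter returns 0.5 instead of failing.
        log_spam = math.log((self.messages["spam"] + 1) / (total + 2))
        log_ham = math.log((self.messages["ham"] + 1) / (total + 2))
        # Combine per-word evidence in log space, using only words present in the message.
        for word in set(tokenize(text)):
            log_spam += math.log(self.word_prob(word, "spam"))
            log_ham += math.log(self.word_prob(word, "ham"))
        diff = log_ham - log_spam
        if diff > 700:  # avoid overflow in math.exp for overwhelmingly ham-like text
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = BayesFilter()
    f.train("Win free money now", "spam")
    f.train("Free prize, click now", "spam")
    f.train("Meeting notes for Monday", "ham")
    f.train("Lunch on Monday?", "ham")
    print(f.spam_probability("free money"))
    print(f.spam_probability("Monday meeting"))
```

A real filter would train on thousands of messages and would typically keep learning as the user marks mail as spam or not spam; the toy training set here only shows the mechanics.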

However, Bayesian filtering might ultimately be replaced by more effective technologies that can adapt, much as spam adapts. "Current systems all have one hole that spammers love to sneak through: They can't adapt," says Jason Rennie, who represented the MIT AI Lab at the conference. "Hand-crafted rule-based classifiers have static rules. Bayesian approaches use static pre-processing that ignores [characters such as] '!!!!' and/or [Far East text]. We need a new approach: a way to dynamically learn patterns that can identify spam." Rennie presented a compression-based algorithm that might offer an intriguing solution to this problem.
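The details of Rennie's algorithm aren't given here, so the following Python sketch is not his method. It only illustrates the general idea behind compression-based classification: a message that resembles a body of known spam adds fewer bytes when compressed together with that spam than when compressed together with legitimate mail. The function names and the choice of zlib are assumptions made for the example.

```python
import zlib


def compressed_size(text):
    """Number of bytes zlib needs for the text at maximum compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(message, corpora):
    """Return the label of the corpus that absorbs the message most cheaply.

    corpora maps a label (for example "spam" or "ham") to sample text for that class.
    The cost of a label is how many extra compressed bytes the message adds
    when appended to that label's sample text.
    """
    best_label, best_cost = None, None
    for label, corpus in corpora.items():
        cost = compressed_size(corpus + "\n" + message) - compressed_size(corpus)
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


if __name__ == "__main__":
    corpora = {
        "spam": "BUY CHEAP PILLS NOW!!! CLICK HERE FOR FREE MONEY!!! " * 5,
        "ham": "The project meeting is moved to Tuesday afternoon. "
               "Please review the attached notes. " * 5,
    }
    print(classify("CLICK HERE FOR FREE MONEY!!!", corpora))
    print(classify("Please review the attached notes.", corpora))
```

Because the compressor works on raw characters, runs such as '!!!!' or non-English text count as evidence rather than being discarded by a fixed pre-processing step, which is the kind of adaptability the quote above calls for.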

As exciting as the legal and technical possibilities are, more can be done to fight spam. As the maker of the most widely used messaging server solution in the world, Exchange Server, Microsoft isn't doing enough to battle spam. This year, the company will release a new Exchange version, Exchange Server 2003 (formerly code-named Titanium) and an all-new Outlook 11 client. Given that Microsoft has rolled out Bayesian filtering in its consumer-oriented MSN service, you might assume that the company is meeting the spam challenge head-on with its new enterprise messaging products. But Microsoft is doing virtually nothing to battle spam in these products. On the Outlook side, Outlook 11 includes no new spam-fighting tools—only the manual, user-oriented filtering scheme from previous versions. (Outlook 11, however, won't load images in HTML email, by default, which could help prevent viruses; I don't consider this an effective spam tool in any way.) With Exchange 2003, Microsoft is adding a weak blacklist feature that you must constantly update for it to be even remotely effective. I find it insulting that Microsoft would even consider offering new messaging solutions that include such meager antispam technologies.

I'm in Redmond this week, and I'll be meeting with representatives of the Exchange team. I'll use this opportunity to let them know how I feel about spam. If you're tired of the spam deluge and think Microsoft needs to do more to fight it, let me know. I'll forward your feedback to Microsoft.

Resources
MIT Spam Conference
FTC: E-Commerce and the Internet