Not Ready for an Antispam Solution?

Slowly the flood begins: one email message, then another, and another. The help desk logs a growing number of incidents from people who are receiving email advertisements that aren't addressed to them, and users are complaining about receiving pornography and other offensive ads. You spend hours sifting through log files and tracking messages and finally go to the powers that be to propose that the company implement an antispam solution. You have a handful of good reasons why an antispam solution would benefit the company, but ultimately, the decision makers say no.

5 Spam Assumptions
In my experience, five assumptions are behind most decision makers' denial of the need for an antispam solution. After you understand what these assumptions are and what factors cause the assumptions, you can present the decision makers with the facts, allay their fears, and mitigate potential risks.

1. We don't get that much spam.
If you walk down a row of cubicles, you'll get varying responses when you ask how much spam each person receives each day. Some say none, some say just two or three messages, and others reply with an emphatic "Too much!" Executives and managers who don't receive much junk email often believe that spam isn't a problem because they don't hear many complaints from users. (One reason that executives don't receive spam is that their email addresses typically aren't exposed on the Internet. Spammers harvest addresses from Web postings, newsgroups, and other places that executives often don't visit.)

Spam might not seem to be a serious problem for your organization now, but spam is on the rise. Gartner estimates that by 2004, 50 percent of email traffic will be spam. At the beginning of 2003, Ferris Research estimated that at least 30 percent of the email that an organization receives is spam. In June 2003, the Federal Trade Commission (FTC) estimated the current volume of spam at 40 percent of an organization's email. HP estimates that at times, as much as 70 percent of all its incoming mail is spam.

If your company's management believes that users don't receive much spam, you need to provide up-to-date, credible statistics from organizations such as those I just mentioned. If management believes that spam is a problem that only other companies face, you need to gather evidence directly related to your organization. One way you can do this is to establish a mailbox or Exchange Public Folder into which people can forward or copy spam. You can use this approach to track the amount of spam users receive and to see what types of spam are flowing in. Keep in mind that if you choose this route, you need a great deal of user participation, which might be difficult to achieve.

First, you must rely on users to sort through their email and forward spam, which can be a time-consuming process. Second, people might be too embarrassed or fearful to forward messages that contain pornographic or inappropriate content. They don't want you to assume that they've somehow done something to attract this type of spam. Others might have the perception that they'll get in trouble because a prohibited type of message is in their mailbox and forwarding it will create a direct link back to them. If you don't get enough user participation or users forward only some spam, you won't get accurate statistics. You can mitigate these problems by providing assurances that people won't be judged or penalized. You can also use a smaller group that represents your total user population, then develop projections according to the group's feedback, the total amount of email it receives, and the amount of email that your organization as a whole receives.
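The projection from a representative group can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and all of the counts are hypothetical, and the sketch assumes that the sample group's spam rate is representative of the whole organization.

```python
def project_org_spam(sample_spam, sample_total, org_total):
    """Extrapolate organization-wide spam volume from a sample group's counts.

    sample_spam  -- spam messages the sample group reported (hypothetical)
    sample_total -- all messages the sample group received
    org_total    -- all messages the organization received in the same period
    """
    if sample_total <= 0:
        raise ValueError("sample_total must be positive")
    spam_rate = sample_spam / sample_total
    # Multiply before dividing so whole-number inputs stay exact where possible.
    projected = round(sample_spam * org_total / sample_total)
    return spam_rate, projected


# Made-up numbers: the sample group reported 420 spam messages out of 1,400
# received, and the organization received 50,000 messages in the same period.
rate, projected = project_org_spam(sample_spam=420, sample_total=1400, org_total=50000)
print(f"Sample spam rate: {rate:.0%}; projected spam messages: {projected}")
```

If users forward only some of their spam, the sample rate is an undercount, so treat the projection as a lower bound.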

Another factor you need to consider if you rely on users to detect spam is that simply opening a piece of spam can have negative consequences. Most spam contains spam beacons or Web bugs, which are bits of HTML code that try to retrieve image files from an Internet-based Web server. The HTML code links to the message addressee (usually through a database). The spammer uses Web server logs to track who's reading the spam messages, when they're reading the messages, and what types of spam they read. (For more information about spam beacons, see "Spam Beacons," September 2003, http://www.winnetmag.com/microsoftexchangeoutlook, InstantDoc ID 39501.) Users might need to open messages to determine whether they're spam, but doing so might trigger beacons that will likely result in more spam, so ultimately the data-gathering effort could end up compounding the spam problem.
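To make the beacon mechanism concrete, the following Python sketch scans an HTML message body for image tags that point at an Internet server, without fetching anything. It is a simplified illustration, not how any particular product works: the class and function names and the sample URL are hypothetical, and real beacons can hide in other tags and attributes that this sketch doesn't check.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> tag that points at an HTTP(S) server."""

    def __init__(self):
        super().__init__()
        self.remote_sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if src.lower().startswith(("http://", "https://")):
            self.remote_sources.append(src)


def find_remote_images(html_body):
    """Return remote image URLs found in html_body; nothing is downloaded."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_sources


# Hypothetical spam body: the id parameter is what ties the image request
# back to the message addressee in the spammer's database.
sample = ('<p>Great deals!</p>'
          '<img src="http://ads.example.com/open.gif?id=12345" width="1" height="1">')
print(find_remote_images(sample))
```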

Also remember that the people who review the messages that users forward must be cautious. If they open a message, they can trigger beacons within the message. Therefore, they should open the messages only when disconnected from Internet access, or they should take steps to ensure that they don't trigger any spam beacons. One way that you can avoid this problem is to put bogus proxy server information into Microsoft Internet Explorer's (IE's) connection settings. Outlook uses IE's proxy settings when it opens connections to process HTML-encoded messages. If an incorrect proxy is specified, when Outlook tries to process the spam beacons, the program won't be able to make the connection to the Internet server. If you're using Microsoft Office Outlook 2003, you won't need to use proxy-setting trickery because this version has new antispam features that can help end users fight spam, as well as safety mechanisms that prevent users from triggering the beacons. For more information about this topic, see "Suppressing Spam," October 2003, http://www.winnetmag.com, InstantDoc ID 40469.

A better way to determine how much spam is flowing into your organization is to install spam-filtering software but run it only in detection mode. Doing so engages the software's spam-identification features but prevents the messages from being modified, quarantined, or deleted. All the antispam products that I've used or reviewed offer this capability. Look for packages that have reporting features that are granular enough for your needs. For example, if you're concerned about spam that contains attachments or spam that's offensive or pornographic, you'll need reporting tools that distinguish these characteristics.

2. We don't need a spam filter; we already use a Realtime Blackhole List.
A Realtime Blackhole List (RBL), also known as a blacklist, is a mechanism that lets a mail server reject email from a system that has been used to send spam. The blacklist stores the IP addresses of systems that are known to send (or are suspected of sending) spam. When a mail server receives an SMTP connection, it can look up the connecting server's IP address in the blacklist. If the IP address is on the list, the receiving server can either refuse the connection or accept it and silently delete the message. If the IP address doesn't appear on the blacklist, the receiving server forwards the message to the intended recipient.
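The lookup step can be sketched in Python. Many blacklists are published through DNS: the receiving server reverses the octets of the connecting IPv4 address, appends the blacklist's zone name, and treats a successful lookup as "listed." That DNS detail isn't described above, and the zone name dnsbl.example.org and the function names are placeholders rather than a real service or API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an address record for ip.

    Any lookup failure (including a network problem) is treated as
    "not listed" in this sketch; a production server would distinguish them.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Only the name construction runs here; is_listed() needs network access.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```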

One problem with a blacklist is that it's primarily a reactive mechanism. A system usually must send spam before ending up on a blacklist. More conservative blacklists try to confirm that a system is being used repeatedly to send spam before including it on the blacklist; more aggressive blacklists block messages based on what might seem like circumstantial evidence. Sometimes blacklists block entire subnets of IP addresses (e.g., whole subnets used to connect cable and DSL subscribers). In these cases, the mail server will reject email originating from such an IP address even if the message isn't spam.

With the high-speed connection potential of broadband Internet and the huge number of ISP options, a spammer can set up an operation from home and theoretically send a flood of spam by using different IP addresses each time, sometimes changing IP addresses or domain names on a daily basis. In such cases, a blacklist is ineffective, and spam originating from these sources will likely penetrate your defenses.

A blacklist can help reduce the amount of spam users receive, but as your only spam defense, it can fall short of expectations. However, when you use a blacklist in conjunction with a spam-filtering solution that detects spam characteristics in email messages, you can significantly reduce the amount of junk email that ends up in your users' Inboxes.

3. A spam filter will block too much legitimate email.
The fear of false positives (i.e., blocking legitimate email) is probably the biggest reason companies hesitate to implement spam-filtering technology. Spam filters use heuristic tests or statistical systems such as Bayesian classification to assess and assign a score to an email message. If the score is high enough, the filter labels the message as spam.

Simple spam filters work by looking for specific keywords or flag words that occur with some frequency or in some combination. Users worry that legitimate messages will contain keywords or attributes that make them appear to be spam and that these messages won't reach their Inboxes. Take, for example, a clothing retailer who gets a message with order-tracking information from an undergarment supplier. A simple spam filter might detect the high occurrence and frequency of words such as sexy, thong, bikini, and red and flag the message as spam.

In reality, today's spam filters do much more than simply search for keywords. Filters also look for characteristics such as sender domains that don't match the originating system, URL links (especially ones that repeat), and unsubscribe references and links. Some filters check messages against databases that contain known spam or the URLs that spam messages often contain. These mechanisms work much the same as an RBL. The filter checks a database for a copy of the message (actually a hashed representation of the message) or a URL link from positively identified spam. If the filter finds a match, it increases the message's spam score. Some implementations perform filtering after the RBL check has passed the message; others use the RBL lookup results as an input to the overall spam score.
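The known-spam database check might look something like the following Python sketch. It assumes an exact SHA-256 hash of a whitespace- and case-normalized body, which is a simplification chosen for illustration; commercial filters are likely to use fuzzier fingerprints, and the sample text, names, and point value are all hypothetical.

```python
import hashlib


def fingerprint(body):
    """Hash a normalized copy of the body (lowercased, whitespace collapsed)."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Stand-in for a database of fingerprints taken from positively identified spam.
KNOWN_SPAM_FINGERPRINTS = {fingerprint("Guaranteed millions! Click here now")}


def hash_score(body, points=4.0):
    """Return points to add to the spam score if the body matches known spam."""
    return points if fingerprint(body) in KNOWN_SPAM_FINGERPRINTS else 0.0


print(hash_score("guaranteed   MILLIONS! click here now"))
print(hash_score("Lunch on Friday?"))
```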

Depending on how a filter tests and scores a message, the filter might be able to determine the type of spam. By ordering and weighting the tests, you can place more or less importance on certain message characteristics. Early in the evaluation, tests might check for HTML formatting, whether the sender is on a blacklist, and whether any URLs match those in a known spam database. The cumulative score of these tests might be enough to classify the message as spam. Further tests (e.g., looking for vulgar words) could then classify the message as offensive spam.
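A minimal Python sketch of this two-stage evaluation follows. The weights, threshold, word list, URL, and message fields are all invented for illustration; real products define their own tests and scoring scales.

```python
from dataclasses import dataclass
from typing import Callable

SPAM_THRESHOLD = 5.0                                 # hypothetical cutoff
KNOWN_SPAM_URLS = {"http://pills.example.com/buy"}   # stand-in URL database
VULGAR_WORDS = {"xxx"}                               # stand-in word list


@dataclass
class Rule:
    name: str
    weight: float
    check: Callable[[dict], bool]


# Early tests: each one that fires adds its weight to the cumulative score.
RULES = [
    Rule("html_formatting", 1.0, lambda m: m["is_html"]),
    Rule("sender_on_blacklist", 3.0, lambda m: m["sender_blacklisted"]),
    Rule("known_spam_url", 3.0,
         lambda m: any(url in KNOWN_SPAM_URLS for url in m["urls"])),
]


def classify(message):
    """Return (score, label); the vulgar-word test runs only on spam."""
    score = sum(rule.weight for rule in RULES if rule.check(message))
    if score < SPAM_THRESHOLD:
        return score, "not spam"
    words = set(message["body"].lower().split())
    if words & VULGAR_WORDS:
        return score, "offensive spam"
    return score, "spam"


message = {
    "is_html": True,
    "sender_blacklisted": True,
    "urls": ["http://pills.example.com/buy"],
    "body": "Buy now and save",
}
print(classify(message))
```

Raising or lowering a rule's weight is the tuning knob: a characteristic that is common in your legitimate mail gets a smaller weight so that it can't push a message over the threshold by itself.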

No filter can completely eliminate false positives because some legitimate messages will have enough spamlike attributes to earn a spam classification. You can mitigate the risk of false positives by tuning the filter rules to account for your organization's message profiles. For example, a pharmaceutical company might need to configure tests so that the filter doesn't look at drug names or to ensure that drug names don't contribute significantly to the overall spam score.

Another false-positive concern is the desire to receive messages from specific groups. Some organizations might be partnered with companies that use direct-email marketing, and these organizations might want or even need to receive messages that the rest of the world might consider spam. In these cases, you can create a whitelist of approved sender and system addresses. A whitelist tells the spam filter to pass the message unchecked because you don't consider the sender a spammer.
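In code, a whitelist check is a short-circuit that runs before any scoring. The Python sketch below assumes the whitelist holds full sender addresses and sender domains; the entries and the function name are hypothetical, and a real filter might also match on the sending system's IP address.

```python
APPROVED_SENDERS = {"offers@marketing.example.net"}   # hypothetical entries
APPROVED_DOMAINS = {"partner.example.com"}


def is_whitelisted(sender):
    """Return True if the sender address or its domain is on the whitelist."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]
    return address in APPROVED_SENDERS or domain in APPROVED_DOMAINS


# A filter would deliver whitelisted mail without running its spam tests.
for sender in ("News@Partner.example.com", "promo@unknown.example.org"):
    print(sender, "->", "deliver unchecked" if is_whitelisted(sender) else "run spam tests")
```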

Organizations that implement antispam mechanisms need to use a combination of whitelists and filter tuning to reduce the number of false positives that they see. In my experience, most of the messages that are incorrectly classified as spam are newsletters, bulletins, or newsgroup posts. These messages often end up incorrectly classified because they have attributes such as HTML formatting or advertisements, and in some cases, are sent by the same software that spammers use. However, you can easily identify the sources of these messages and add them to a whitelist.

Another misconception about false positives is that the spam filter deletes these messages. All spam-filtering software that merits use provides you with at least three disposition options, as Table 1 shows. Except for messages that rate a high spam score or those that are rejected because of a blacklist, most organizations don't delete email (at least not initially).

As with any new system, you should conduct a pilot implementation before you make a production deployment. During the pilot, perform tuning and build most of your whitelist entries. After the pilot, use a tag-and-deliver option for your production deployment. This method doesn't eliminate the spam from users' mailboxes, but depending on your implementation, it can make identifying spam easier.

Tagging typically adds a text prefix, such as SPAM:, to a message's subject line (e.g., a subject of Very good news becomes SPAM: Very good news). Users can then configure Outlook rules to perform some action when a tagged message arrives in the Inbox. As I explain in the sidebar "Using Rules to Handle Spam," the most common action is to move the message to a separate folder. Tagging the subject line also lets you easily see which messages are spam even if you're not using a rule to move them into another folder, which is a significant plus if you're using a client or system that doesn't have rules capability.
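The tag-and-deliver flow can be sketched in Python as two small functions: one that the filter would apply and one that stands in for a client-side rule. The function names and the folder name are hypothetical, and the sketch isn't Outlook's rules engine.

```python
SPAM_PREFIX = "SPAM:"


def tag_subject(subject, prefix=SPAM_PREFIX):
    """Prefix the subject once; an already-tagged subject is left alone."""
    if subject.startswith(prefix):
        return subject
    return f"{prefix} {subject}"


def choose_folder(subject, prefix=SPAM_PREFIX):
    """Stand-in for a client rule: tagged mail goes to a junk mail folder."""
    return "Junk Mail" if subject.startswith(prefix) else "Inbox"


tagged = tag_subject("Very good news")
print(tagged, "->", choose_folder(tagged))
```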

Although the tag-and-deliver option essentially defeats what you expect a spam filter to do (get the spam out of your users' mailboxes), at least users can easily move the suspect messages out of their Inbox to a junk mail folder. When moved to a junk mail folder, the messages are no longer intermingled with other email, which helps reduce the risk that users will miss an important message. Don't forget, though, that everyone needs to review (and empty) their junk mail folders periodically to make sure the folders contain no false positives. Later, when people become comfortable that the filters aren't flagging vital messages as spam, they can configure rules to delete messages instead of moving them into a junk mail folder. When the organization as a whole becomes comfortable with the effects and benefits of spam filtering, IT administrators usually receive permission to start deleting email with a high spam score instead of delivering it.

4. Putting SPAM on our subject line causes too many problems.
Some people are concerned about using filters to prefix certain messages' subject lines with the word SPAM. The primary concern is that the filter will tag a legitimate message, and someone will reply or forward the message with the prefix intact. Leaving the word SPAM in the subject can have two negative consequences. First, if you add SPAM to the subject and reply to the message sender, that person might be offended that you flagged the message as spam. Second, in a reply or forwarded message, the recipient might see the prefix and treat the message as spam, possibly deleting it.

Regarding the first concern, my opinion is that senders probably want to know that their messages are getting flagged as spam so that they can take steps to correct whatever is triggering the filter. If this is a concern in your situation, however, you have other options. For example, you can use a different prefix, such as _SUSPECT, _ADVERT, or _ADVERTISEMENT. These words don't evoke the same feeling as SPAM and might be easier on the egos of the senders while still conveying that the message needs some extra care. The underscores prefixing the words help to ensure that tagged subject lines are distinguished from legitimate subject lines that happen to contain these words. For example, a spam-handling rule would ignore the subject line Suspect seen at 10:30, but would process a message with the subject line _SUSPECT: Guaranteed Millions.
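The effect of the underscore can be shown with a short Python check. The exact tag strings (including the trailing colon) and the case-sensitive match are assumptions made for this sketch.

```python
# Hypothetical tags a filter might write at the start of the subject line.
TAG_PREFIXES = ("_SUSPECT:", "_ADVERT:", "_ADVERTISEMENT:")


def is_tagged(subject):
    """True only when the subject starts with one of the underscore tags."""
    return subject.startswith(TAG_PREFIXES)


for subject in ("Suspect seen at 10:30", "_SUSPECT: Guaranteed Millions"):
    print(repr(subject), "->", "process" if is_tagged(subject) else "ignore")
```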

As for the second concern, yes, you might lose legitimate messages because of the word SPAM in the subject line. One way around this dilemma is to tag spam by inserting a reference into the message's SMTP header instead of prefixing the message subject. Some spam-filtering packages can insert a header extension field or X-header tag into a message as the message moves through an SMTP transport.

Figure 1 shows an example of some X-header tags that you might find in your messages even if you aren't running an antispam package. This example shows two X-header tags that Exchange 2000 Server uses to request a return receipt or to flag a message as important. Some antispam packages use X-headers in a similar way. When such a program scores a message, the program can write a notation in the header. For example, it might insert an X-Spam-79% tag or an X-Spam-Offensive tag to specify the message's spam score or classification. Rules can evaluate these header tags and act on the message. With this method of tagging, rules can handle spam, but the risk of a spam tag propagating through forwarded messages is eliminated.
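Header tagging can be sketched with Python's standard email package. The header names X-Spam-Score and X-Spam-Classification, the threshold, and the folder names are hypothetical; check your antispam package's documentation for the headers it actually writes.

```python
from email.message import EmailMessage


def tag_headers(msg, score, offensive=False):
    """Record the filter's verdict in X-headers; the subject is not modified."""
    msg["X-Spam-Score"] = f"{score:d}%"
    if offensive:
        msg["X-Spam-Classification"] = "Offensive"


def choose_folder(msg, threshold=75):
    """Stand-in for a rule that reads the header tag and acts on the message."""
    raw = msg.get("X-Spam-Score")
    if raw is None:
        return "Inbox"
    score = int(str(raw).rstrip("%"))
    return "Junk Mail" if score >= threshold else "Inbox"


msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["Subject"] = "Very good news"
msg.set_content("Guaranteed millions!")

tag_headers(msg, 79)
print(msg["Subject"], "->", choose_folder(msg))
```

Because the subject line is untouched, a reply to or forward of this message carries no visible spam tag.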

Header tagging can also be useful when you're piloting antispam packages. Because message headers are usually hidden from view, only pilot participants would know about the tags and be able to use them to process spam. After the pilot project is over, you can let everyone know that header tags exist so that they too can start using the tags in their rules.

5. Spam isn't a threat.
Many people don't view spam in the same way they do virus-infected email; they consider spam simply a nuisance and don't realize its harmful side effects. Spammers are constantly changing tactics to deliver their junk mail. One such tactic is called a dictionary attack. The idea is that a spammer picks a domain for which he or she has an idea of the naming convention (e.g., firstname.lastname). The spammer uses a dictionary of first and last names to build email addresses using every combination of names and then sends messages to those addresses. The mail server will reject most of the combinations, but will accept some. This type of spamming places a high burden on Message Transfer Agents (MTAs) and directories. The receiving server must accept the mail, check the directory, and generate the nondelivery reports (NDRs) for the nonexisting accounts. Depending on how many thousands of messages the server receives, this processing can cause a significant delay in sending or receiving legitimate email. In addition, spammers use these dictionary attacks to build a directory of the names in your organization (aka directory harvesting). The messages arrive as spam, but the sender keeps track of which messages are returned as undeliverable. They can use this information later for social engineering (i.e., using a piece of information to convince someone that you're connected in some way to an organization) or other attacks against your systems.

Another threat from spam is virus delivery, and the two problems feed each other. The recent SoBig worm, for example, was designed in part to turn infected systems into a platform that people could use to send spam. Many spam messages contain links to Web sites for products or links to unsubscribe. These links can provide an entry point for a virus.

And if the technology reasons aren't enough to convince organizations to implement an antispam solution, they should consider the lawsuits and hostile-workplace complaints users are filing because they're receiving offensive spam. Some people feel that companies aren't doing enough to stop these messages from arriving and feel compelled to file a complaint. But it isn't just the individual complaints that you need to think about. You need to consider how these suits might affect your organization's reputation.

The Days of Simple Email Are Gone
Today, email isn't just for casual communication—it's a vital part of most organizations' day-to-day operations and business processes. The people who make decisions for an organization know this and need to take it into consideration when addressing the spam problem. But these people also need real facts about spam and antispam solutions. Antispam technology has its risks, but you can mitigate those risks by providing information and best practices. The payoff will be worth the effort because spam is definitely more than just a nuisance.
