How To Nip A Little More Spam In The Bud - 21 Feb 2006 | ITPro Today

Most spam filtering systems do a good job of tagging spam, but many can be tweaked for better detection rates and better performance.

I ran a test on more than 254,000 email messages to see which filters work best, and then reordered the filters so that the most effective ones run first. This reduces processor overhead, because a message caught by an early filter never has to pass through the later ones. My tests were conducted against live incoming email on a legitimate mail server.
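The ordering idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the filter functions, their names, and the `(name, filter, hit_rate)` tuple layout are hypothetical and do not describe the mail server software used in the test.

```python
from typing import Callable, List, Optional, Tuple

# A filter takes the raw message text and returns True if it flags the message as spam.
SpamFilter = Callable[[str], bool]
FilterEntry = Tuple[str, SpamFilter, float]  # (name, filter, measured hit rate in %)


def order_by_hit_rate(filters: List[FilterEntry]) -> List[FilterEntry]:
    """Return the filters sorted so that the highest measured hit rate runs first."""
    return sorted(filters, key=lambda entry: entry[2], reverse=True)


def classify(message: str, ordered_filters: List[FilterEntry]) -> Optional[str]:
    """Return the name of the first filter that flags the message, or None."""
    for name, spam_filter, _rate in ordered_filters:
        if spam_filter(message):
            return name  # stop here: later filters never run for this message
    return None
```

With the hit rates reported in this article (48.02, 39.82, and 9.57), `order_by_hit_rate` would place the foreign language filter first, the DNS blacklist filter second, and the Bayesian filter third, so most spam is rejected before the costlier checks run.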

What I found is quite revealing. The most effective filter on this particular mail system is a simple foreign language filter that detects 48.02% of all spam. This filter works by eliminating all email written in any language other than the specific languages that are defined as valid for the enterprise.
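The article does not say how the language filter decides what counts as foreign. As a rough sketch only, the heuristic below checks the writing script rather than the language itself: it assumes, purely for illustration, that the enterprise's valid languages all use the Latin alphabet. The function names and the 0.5 threshold are hypothetical. A real filter would need proper language identification, since this approach cannot tell English from French, for example.

```python
import unicodedata


def non_latin_letter_ratio(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name does not start with 'LATIN'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(
        1 for ch in letters
        if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters)


def looks_foreign(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose letters are mostly from non-Latin scripts (assumed threshold)."""
    return non_latin_letter_ratio(text) > threshold
```

A check like this is cheap because it needs only one pass over the message text, which fits the idea of running inexpensive, high-yield filters first.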

The second most effective filter is a DNS blacklist filter, which checks mail headers to see if a message came through a mail server that is a known source of spam. This particular filter eliminates another 39.82% of all spam.
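A DNS blacklist lookup is conventionally done by reversing the octets of the sending server's IPv4 address, appending the blacklist's zone name, and asking DNS for an address record. The sketch below assumes that convention. The zone `dnsbl.example.org` used in the usage note is a placeholder, not a real list, and extracting the IP address from the mail headers is left out.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name for an IPv4 address: reversed octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone resolves a record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record found. Note that a DNS outage also lands here,
        # so this sketch treats lookup failures as "not listed".
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` yields `99.2.0.192.dnsbl.example.org`, which is the name `is_listed` would resolve.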

Together, the foreign language filter and DNS blacklist filter eliminate 87.84% of all spam coming into the mail server. The third most effective filter is a simple word filter that implements a Bayesian filter. The Bayes filter eliminates another 9.57% of spam, and with this filter in place the total spam elimination rate reaches 97.41%. Not too shabby, but still not perfect.
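For readers unfamiliar with Bayesian word filtering, here is a minimal naive Bayes sketch. It is not the filter used on the tested server: the class name, the whitespace tokenizer, and the add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy word-based classifier trained on messages labeled 'spam' or 'ham'."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one labeled message."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words: List[str], label: str) -> float:
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        # Log prior for the class, smoothed so an untrained class is never log(0).
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for word in words:
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            score += math.log((counts[word] + 1) / (total_words + vocab + 1))
        return score

    def is_spam(self, text: str) -> bool:
        """Compare the log scores of the two classes for this message."""
        words = tokenize(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")
```

Calling `train` again on a message that slipped through is how a filter of this kind adapts over time: the words of the missed message raise the spam score of similar messages in the future.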

Nine more filters run after these three to catch the remaining 2.59% of spam. They include tricks such as filtering character patterns, checking mail headers for RFC compliance, filtering particular character sets, and looking for odd forms of spelling.
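The percentages quoted for the first three filters can be checked with a short calculation. The sketch below uses only the figures reported in this article. `Decimal` is used instead of floats so the sums compare exactly, and the variable and function names are illustrative.

```python
from decimal import Decimal

# Per-filter share of all spam caught, as reported in the article.
stage_rates = [
    ("foreign language filter", Decimal("48.02")),
    ("DNS blacklist filter", Decimal("39.82")),
    ("Bayesian word filter", Decimal("9.57")),
]


def cumulative_rates(stages):
    """Return (name, running total) pairs for the stages in order."""
    running = Decimal("0")
    result = []
    for name, rate in stages:
        running += rate
        result.append((name, running))
    return result


totals = cumulative_rates(stage_rates)
remaining = Decimal("100") - totals[-1][1]  # share left for the other nine filters
```

The running totals come out to 48.02, 87.84, and 97.41, which leaves 2.59% of spam for the nine remaining filters, matching the figures in the text.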

With these filters in place, this particular enterprise finds that only one or two messages per month slip past the filters, and in many months not a single spam message makes it through. Even when one does slip past, such messages are easily learned by the spam filters so that similar messages do not make it through in the future.
