An Email Filtering Script

Downloads
42108.zip

Last year, the SoBig virus humbled me. My team uses Microsoft Exchange Server to manage our internal email system and host several closed external discussion lists. We use client-side spam software to filter out noise from our Inboxes, but we had nothing in place to filter email on the server.

After SoBig hit, each member of my team manually filtered roughly 1GB of email a day. Most of the volume was due to the virus and the inevitable virus-alert messages that found their way to our server. Our server not only had to process each offensive email message but also had to devote disk space to storing it and network bandwidth to transferring it to the user's Microsoft Outlook Inbox. Then, the user's Outlook filters took time and consumed resources to delete the message from the Inbox and notify the server to delete the message from the user's Exchange mailbox store. To mitigate the problems, we decided to write a Perl script to run on our Exchange server and filter out on arrival all incoming email that might be a virus.

Writing the script was remarkably easy, thanks to Microsoft Collaboration Data Objects (CDO). The code runs on any Windows machine that implements the Microsoft SMTP service, including Windows Server 2003 and Windows XP. Because Exchange Server 2003 and Exchange 2000 Server both rely on the SMTP service, the solution works just as well with them.

The Windows SMTP Service
Windows 2000 and later OSs include an SMTP (email) service, which Web pages and applications use to send email. When the SMTP service is running on your machine, Web pages and applications can simply submit outgoing email to the service, which then routes the messages to other email servers. CDO is a collection of COM objects that provide an easy way to access Windows messaging services, including SMTP mail. CDO abstracts many of the complexities of working with communication technologies, simplifying how programmers interact with these services. For details about CDO, see http://msdn.microsoft.com/library/en-us/cdo/html/_olemsg_overview_of_cdo.asp.

CDO Transport Event Sinks
The SMTP service is a fairly complex system of components that talk with one another. Upon receiving an email message, the SMTP service processes it by handing it sequentially to several components, each of which can examine and modify the message. One of those components is the CDO transport event sink.

Transport event sinks can perform an action based on a message's content. The SMTP transport event sink handles events that occur when the SMTP service receives an email message. When an incoming message arrives at the service and triggers the OnArrival() event, the transport event sink can load a Windows Script Host (WSH) script and call the script's ISMTPOnArrival_OnArrival() subroutine, if one exists. Further details exceed the scope of this article, but you can learn more at http://msdn.microsoft.com/library/en-us/cdosys/html/_cdosys_smtp_nntp_transport_event_sinks_with_cdo.asp.

How Transport Event Sink Scripts Work
Listing 1 shows the MessageFilter.pl transport event sink script. You must put the script's code between XML tags that tell the transport event sink which scripting language engine to use. The code at callout A and callout J in Listing 1 shows the tags that specify the PerlScript engine. Because the tags identify the engine, your script can use any file extension (or none at all), a flexibility that many administrators find attractive. The Perl comment character (#) at the beginning of each tag line lets you run the script both from the event sink and from a command line (e.g., to test the script) without triggering compile errors.

When the transport event sink executes the script, it first executes all the code in the default or main:: namespace. The event sink then searches the script for a subroutine called ISMTPOnArrival_OnArrival() and calls the subroutine, if it exists. The subroutine receives two parameters, as the code at callout E shows: $Message (a CDO message object) and $EventStatus (an event status value). The script can query the message object's properties to obtain information such as the message's sender, recipient list, and subject. For a list of all message object properties, go to http://msdn.microsoft.com/library/en-us/cdosys/html/_cdosys_imessage_interface.asp.
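The overall shape of such a script can be sketched as follows. This is a minimal illustration, not Listing 1: the exact tag text is an assumption (check callouts A and J for the real tags), the $CALLS counter is hypothetical, and a plain hash stands in for the CDO message object so that the sketch runs from a command line.

```perl
#<SCRIPT LANGUAGE="PerlScript">
# (Assumed tag text; the leading "#" hides the tag from Perl on a command line.)
use strict;
use warnings;

# Code in the main:: namespace runs before the event sink calls the subroutine.
our $cdoRunNextSink = 0;    # default event status: let later sinks run
our $CALLS          = 0;    # hypothetical counter, for illustration only

# The event sink looks for this subroutine by name and passes two parameters.
sub ISMTPOnArrival_OnArrival {
    my ($Message, $EventStatus) = @_;
    $CALLS++;
    # Properties are read with hash syntax, as with Win32::OLE objects.
    my $subject = $Message->{Subject};
    return "saw: $subject";
}

# Command-line check only: a plain hash stands in for the CDO message object.
# Remove this call before registering the script as a sink.
print ISMTPOnArrival_OnArrival({ Subject => 'hello' }, $cdoRunNextSink), "\n";
#</SCRIPT>
```

Run from a command line, the sketch prints "saw: hello".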

The script is supposed to set the CdoEventStatus value to a value that indicates the status of the script's message processing. For example, if the script determines that the message contains a virus and should be discarded, the script should be able to set the CdoEventStatus value to cdoSkipRemainingSinks (value of 1) to indicate that subsequent sinks don't need to process the message. By default, the event status value is cdoRunNextSink (value of 0), which indicates that the message is fine and that subsequent sinks should process it.

I say that the script is supposed to set the event status because that isn't what happens with Perl. In Perl, the value is passed in only as a value, not as a reference or an object. If you're running only one event sink and using a script as your CDO transport event sink event handler, not setting the CdoEventStatus value won't impact overall performance because you don't have any subsequent sinks. However, if you need to run multiple event sinks, not setting the event status causes subsequent sinks to process all messages, even those scheduled for removal, thereby creating unnecessary overhead. Thus, if you have multiple sinks and you're using Perl, you should consolidate all the sinks' logic into one monolithic event sink handler to reduce the overhead of running sinks unnecessarily.

After loading a script, the CDO transport event sink caches the resulting WSH object in memory. To optimize performance, the transport event sink uses the cached object whenever it needs to call the script. In other words, the Perl script is compiled only once but executed multiple times. If you modify the script, the CDO transport event sink detects the file's changed modification date and reloads the script. As a result, you get the performance of script caching along with the main benefit of reading the script from disk on every use: changes to the script take effect automatically. For information about using scripting languages to implement event sinks and how an event sink caches scripts, see http://msdn.microsoft.com/library/en-us/cdosys/html/_cdosys_implementing_sinks_with_scripting_languages.asp.

MessageFilter.pl
MessageFilter.pl weeds out email that looks like well-known viruses, Trojan horses, and other malicious code. The script looks for specific phrases in a message's Subject line and for message attachments that have specific file extensions.

At the code at callout B, MessageFilter.pl sets the $LOG_PATH variable, which supplies the path to the log files that store the date, time, subject, and sender for each discarded email message. The code at callout C supplies a list of undesirable message subjects and file extensions. When a message has a subject that's in the list or includes an attachment that has a listed extension, the script rejects the message. You can add your own subjects and extensions to the appropriate array.

To create the @UNWANTED_SUBJECTS_REGEX array, the script uses Perl's map command to process each string before adding it to the array. The processing uses the qr// regular expression quote operator, which produces a compiled regular expression. Using a precompiled regular expression not only simplifies the script but also boosts performance slightly. Because the subjects listed in the @UNWANTED_SUBJECTS_REGEX array are substrings, MessageFilter.pl discards messages that contain a listed string anywhere in the subject.
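The map and qr// technique can be sketched as follows. The sample phrases and the @UNWANTED_SUBJECTS source array are hypothetical, and the use of \Q...\E and the /i flag are assumptions rather than details taken from Listing 1.

```perl
use strict;
use warnings;

# Hypothetical sample phrases; the real list lives at callout C in Listing 1.
my @UNWANTED_SUBJECTS = ('Your details', 'Wicked screensaver');

# map runs each phrase through qr// once, so later matches reuse the
# compiled pattern. \Q...\E (an assumption) keeps punctuation literal,
# and /i (also an assumption) ignores case.
my @UNWANTED_SUBJECTS_REGEX = map { qr/\Q$_\E/i } @UNWANTED_SUBJECTS;

# A pattern matches anywhere in the subject because it has no anchors.
my $subject = 'Re: Your details';
my $hits = grep { $subject =~ $_ } @UNWANTED_SUBJECTS_REGEX;
print $hits ? "discard\n" : "keep\n";
```

Because "Your details" appears inside the sample subject, the sketch prints "discard".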

The @UNWANTED_EXTENSIONS array contains a list of file extensions that the script compares with every file that's attached to an incoming email message. If an attached file's extension matches a listed extension, the script discards the message. Unlike the unwanted-subjects array, @UNWANTED_EXTENSIONS is a typical array of strings. The script uses the pipe character (|), which represents the OR operator in a regular expression, to join all the array elements. Then, the script uses the qr// quote operator to create an $UNWANTED_EXTENSIONS_REGEX variable that contains a compiled regular expression consisting of the joined file extensions.
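Joining the extensions into a single compiled pattern might look like the following sketch. The sample extensions are hypothetical, and anchoring the pattern with a leading dot and a trailing $ is an assumption about how the extensions are matched.

```perl
use strict;
use warnings;

# Hypothetical sample extensions; edit the list to suit your site.
my @UNWANTED_EXTENSIONS = ('pif', 'scr', 'exe');

# Join the extensions with "|" (regex OR), then compile once with qr//.
# The leading "\." and trailing "$" (assumptions) tie the match to the end
# of the file name; quotemeta guards against stray metacharacters.
my $alternatives = join '|', map { quotemeta($_) } @UNWANTED_EXTENSIONS;
my $UNWANTED_EXTENSIONS_REGEX = qr/\.(?:$alternatives)$/i;

for my $name ('photo.jpg', 'details.PIF', 'setup.exe.txt') {
    printf "%-14s %s\n", $name,
        $name =~ $UNWANTED_EXTENSIONS_REGEX ? 'reject' : 'accept';
}
```

With the anchoring assumed here, only a file name that ends in a listed extension is rejected, so setup.exe.txt is accepted.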

For every email message that Exchange receives, the transport event sink first executes the code from callout A through callout D. (The code at callout D sets some CDO constants that the script uses later.) Then, the transport event sink calls the ISMTPOnArrival_OnArrival() subroutine. All global variables created in the default namespace remain available.

The Subroutines
The ISMTPOnArrival_OnArrival() subroutine first sets the $Message and $EventStatus variables to the values that the transport event sink passed to the script. The $Message variable is a COM object that provides access to the IMessage interface. This COM object contains all available information about the email message (e.g., sender, time of receipt). The subroutine always sets the $EventStatus variable to 0 (the equivalent of the $cdoRunNextSink variable). However, for the reasons I described earlier, you can ignore this variable.

The code at callout F is the real engine of this script. This code first calls the IsSubjectExcluded() and IsExtensionExcluded() subroutines, which together determine whether the message should be rejected. If the message should be discarded, the script sets the messagestatus field of the EnvelopeFields collection object to tell the SMTP service to discard the message.

The IMessage interface has a property called EnvelopeFields, which returns the EnvelopeFields collection object. The EnvelopeFields collection defines a set of fields that describe the message while it's traveling through the SMTP service. You can use the messagestatus field to indicate whether the message should go to the user's mailbox, be discarded outright, or be put into a "bad mail" mailbox.

MessageFilter.pl sets the messagestatus field to discard the message. To do so, the script refers to the http://schemas.microsoft.com/cdo/smtpenvelope/messagestatus Uniform Resource Identifier (URI). As callout F shows, the script creates a $Fields variable, assigns the EnvelopeFields collection's messagestatus field to that variable, and sets the field's value to $cdoStatAbortDelivery (a value of 2). This is an awkward way to set this value, but the script has to do it only once.

Next, the code at callout F calls the EnvelopeFields collection's Update() method to ensure that the message status has been updated. CDO caches the EnvelopeFields collection to increase performance, and calling Update() commits changes from the cached object to the actual message. The script then sets the result code ($Result) to $cdoSkipRemainingSinks (value of 1). Finally, the script logs a note that the message was discarded.
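The discard sequence described above can be sketched as follows. This is an illustration under assumptions, not the code at callout F: AbortDelivery is a hypothetical helper name, and the MockFields and MockMessage packages are stand-ins that let the sketch run from a command line without a real CDO message object.

```perl
use strict;
use warnings;

my $cdoStatAbortDelivery  = 2;   # messagestatus value: discard the message
my $cdoSkipRemainingSinks = 1;   # result code used after a discard
my $MESSAGE_STATUS_URI =
    'http://schemas.microsoft.com/cdo/smtpenvelope/messagestatus';

# Mark a message for discard. With a real CDO object, EnvelopeFields, Item,
# and Update come from COM; property values use hash syntax.
sub AbortDelivery {
    my ($Message) = @_;
    my $Fields = $Message->EnvelopeFields;
    $Fields->Item($MESSAGE_STATUS_URI)->{Value} = $cdoStatAbortDelivery;
    $Fields->Update();               # commit the cached change
    return $cdoSkipRemainingSinks;
}

# Hypothetical stand-ins so that the sketch runs from a command line.
{
    package MockFields;
    sub new    { return bless { pending => {}, saved => {} }, shift }
    sub Item   { my ($self, $uri) = @_; return $self->{pending}{$uri} ||= {} }
    sub Update {
        my ($self) = @_;
        $self->{saved}{$_} = $self->{pending}{$_}{Value}
            for keys %{ $self->{pending} };
    }
    package MockMessage;
    sub new            { return bless { fields => MockFields->new }, shift }
    sub EnvelopeFields { return $_[0]{fields} }
}

my $Message = MockMessage->new;
my $Result  = AbortDelivery($Message);
print "status=$Message->{fields}{saved}{$MESSAGE_STATUS_URI} result=$Result\n";
```

The mock records a value only when Update() is called, which mirrors the point made above: without Update(), the change stays in the cached collection.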

The IsSubjectExcluded() subroutine does just one task: It checks whether the email message's subject contains an unwanted subject. The code at callout G queries the CDO message object's subject and compares it with each substring in the unwanted-subjects array. If the message contains an unwanted subject, the routine returns the value 1 to reject the message.
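A subject check along these lines might look like the following sketch. The sample phrases are hypothetical, the return value of 0 for an acceptable message is an assumption, and plain hashes stand in for CDO message objects.

```perl
use strict;
use warnings;

# Hypothetical sample list; see callout C in Listing 1 for the real one.
my @UNWANTED_SUBJECTS_REGEX = map { qr/\Q$_\E/i } ('Your details', 'Thank you!');

# Return 1 when the subject contains any unwanted phrase, otherwise 0.
sub IsSubjectExcluded {
    my ($Message) = @_;
    my $subject = $Message->{Subject};
    return 0 unless defined $subject;
    for my $regex (@UNWANTED_SUBJECTS_REGEX) {
        return 1 if $subject =~ $regex;
    }
    return 0;
}

# Plain hashes stand in for CDO message objects in this command-line check.
print IsSubjectExcluded({ Subject => 'Re: Thank you!' }), "\n";
print IsSubjectExcluded({ Subject => 'Lunch on Friday?' }), "\n";
```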

The IsExtensionExcluded() subroutine returns the value 1 if the message should be rejected based on an attached file's extension. However, as the code at callout H shows, this subroutine uses different logic. The subroutine checks the CDO message object's Attachments collection property. When the message has attachments, the subroutine scrutinizes each attachment and compares the attached file's name with the precompiled regular expression $UNWANTED_EXTENSIONS_REGEX. When the subroutine finds an attached file that has one of the extensions listed in the unwanted-extensions array, it returns a value of 1, indicating that the message should be discarded.
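The attachment check can be sketched in the same style. The sample pattern and the MockAttachments package are hypothetical, and the sketch assumes that the collection exposes Count, a 1-based Item method, and a FileName property on each attachment.

```perl
use strict;
use warnings;

my $UNWANTED_EXTENSIONS_REGEX = qr/\.(?:pif|scr|exe)$/i;   # sample list

# Return 1 when any attachment's file name ends in an unwanted extension.
sub IsExtensionExcluded {
    my ($Message) = @_;
    my $Attachments = $Message->{Attachments};
    return 0 unless $Attachments && $Attachments->{Count};
    # Assumption: the collection is indexed from 1.
    for my $index (1 .. $Attachments->{Count}) {
        my $name = $Attachments->Item($index)->{FileName};
        return 1 if defined $name && $name =~ $UNWANTED_EXTENSIONS_REGEX;
    }
    return 0;
}

# Hypothetical stand-in for the Attachments collection.
{
    package MockAttachments;
    sub new {
        my ($class, @names) = @_;
        return bless { Count => scalar @names, names => [@names] }, $class;
    }
    sub Item {
        my ($self, $index) = @_;
        return +{ FileName => $self->{names}[$index - 1] };
    }
}

my $clean    = { Attachments => MockAttachments->new('notes.txt') };
my $infected = { Attachments => MockAttachments->new('notes.txt', 'movie.pif') };
print IsExtensionExcluded($clean), IsExtensionExcluded($infected), "\n";
```

The subroutine returns as soon as it finds one offending attachment, so it never inspects more attachments than it needs to.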

Finally, when the script sends data to the log file, it executes the code at callout I. This code first checks whether the LOG filehandle is valid to determine whether the log file is open. If the filehandle isn't valid, the subroutine tries to open the log file and set the autoflush flag to 1 (i.e., enabled). The log is opened here instead of earlier in the script to increase performance; if nothing is logged, there's no point in opening the log file. I enabled the log file's autoflush flag to make managing the logs easier during testing. When you move the script to production, you can set the autoflush flag to 0 (i.e., disabled). If you write to the log only once, there's no substantial performance difference between setting the flag to 0 and setting it to 1.
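The open-on-first-use logging approach might be sketched as follows. The Log subroutine name, the log line's layout, and the scratch file in the temporary directory are assumptions for illustration; the article's script takes its path from $LOG_PATH at callout B.

```perl
use strict;
use warnings;
use IO::Handle;      # provides the autoflush method on filehandles
use File::Spec;

# Assumption: a scratch file in the temp directory, so the sketch runs anywhere.
my $LOG_PATH = File::Spec->catfile(File::Spec->tmpdir, 'MessageFilter.log');

# Append one line to the log, opening the file on first use only.
sub Log {
    my ($text) = @_;
    unless (defined fileno(LOG)) {
        open(LOG, '>>', $LOG_PATH) or return 0;   # give up quietly on failure
        LOG->autoflush(1);                         # set to 0 for production
    }
    print LOG scalar(localtime), " $text\n";
    return 1;
}

Log('Discarded: subject="Your details" from=someone@example.com');
```

Because the filehandle stays open after the first call, later calls skip the open step and go straight to the print statement.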

Installing and Configuring the SMTP Service
To install the SMTP service, open the Control Panel Add/Remove Programs applet. Select Add/Remove Windows Components in the left pane, select the Internet Information Services (IIS) component, and click Details. Select SMTP Service, click OK, then click Next, and Windows will install the service.

You can use the Microsoft Management Console (MMC) IIS snap-in to configure the SMTP service and monitor connected users. By default, the SMTP service routes email to other SMTP servers and accepts email only for the local machine's full DNS name. Thus, the service accepts all incoming email whose address references the computer's full domain name. For example, if my computer name is MyMachine and my domain name is roth.net, the SMTP service accepts messages to any address that ends with @MyMachine.roth.net. Your DNS server must contain an entry that maps your computer's full DNS name to its IP address, or the DNS server must at least have an MX record.

Installing PerlScript
For MessageFilter.pl to execute properly, you must register the PerlScript WSH engine. When you properly install ActiveState's ActivePerl, ActivePerl registers the WSH engine. To check whether PerlScript is registered, type the following at a command line:

cscript /e:PerlScript C:\AnyFilename

where C:\AnyFilename is literally any pathname; the file doesn't even have to exist. (When a command appears on several lines in this article, enter it on one line in the command-shell window.) If PerlScript isn't registered, the command will return an error saying that it can't find the script engine 'PerlScript'. When PerlScript is registered, the command tries to load the specified file. If the pathname points to a valid Perl file, the file runs; otherwise, you'll see an error message. If PerlScript isn't registered, register it by locating Perl's bin directory (where the perl.exe file is located) and entering the following at a command line:

regsvr32 PerlSE.dll

Installing the Script
Installing MessageFilter.pl is probably the most difficult part of using it. The details about how to install a transport event sink script are lengthy and exceed the scope of this article; however, a simple Microsoft tool called SMTPReg.vbs makes the job much easier. You can download SMTPReg.vbs at http://msdn.microsoft.com/library/en-us/smtpevt/html/_smtpevt_smtpreg_vbs_event_management_script.asp. For information about the tool, see http://msdn.microsoft.com/library/en-us/cdosys/html/_cdosys_enabling_or_disabling_a_binding.asp.

To install MessageFilter.pl, you must first bind to the CDO transport event sink to create a mapping between a particular event and a particular name. In this case, the event is OnArrival(), which is triggered when a message arrives at the SMTP service. The name is an arbitrary name you provide to identify the mapping. I suggest using a name that describes the script, because you'll use this name if you ever disable or remove the script. You specify several parameters when you create a binding, but typically the name parameter is the only one you'll need to change; you can use defaults for the others. For every script you add (i.e., every binding you create), you need to specify a different name. To create a binding called MessageFilter, you'd type the following at a command line:

cscript.exe smtpreg.vbs
  /add 1 onarrival
  MessageFilter
  CDO.SS_SMTPOnArrivalSink
  "MAIL FROM=*"

Next, you configure this binding to point to the script:

cscript.exe smtpreg.vbs
  /setprop 1 onarrival
  MessageFilter Sink
  ScriptName
  "C:\MessageFilter.pl"

Now, any new message that the SMTP service receives will trigger MessageFilter.pl. All you need to do is ensure that the SMTP service is running.

To remove the script, type

cscript.exe smtpreg.vbs /remove
  1 onarrival MessageFilter

You can add more scripts by changing the binding's name. You can add and remove CDO transport event sink scripts without having to restart the SMTP service because the bindings are created and removed dynamically. You can also use the command

cscript.exe smtpreg.vbs /enum

to display the CDO transport event sinks that are registered with your system.

Testing and Debugging
To test the CDO transport event sink script, register it and send an email message to the computer that runs the SMTP service. For example, you could run Outlook Express and send a message to any user at the LocalHost domain (e.g., test@localhost). To configure the service to have a LocalHost domain, open the IIS snap-in. Expand the Default SMTP Server tree and select the Domains node. Right-click the node and select New, Domain. The New SMTP Domain Wizard will ask whether you want to create a Remote domain or an Alias domain. Create an Alias domain named localhost. Then, you can use that domain to send messages from the computer to itself.
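As an alternative to a mail client, a short Perl script can submit the test message. The following sketch uses the Net::SMTP module; the subroutine names, addresses, and subject are hypothetical, and it assumes that the SMTP service is listening on the local machine and accepts mail for the localhost alias domain.

```perl
use strict;
use warnings;

# Build a small test message. The addresses and subject are hypothetical.
sub BuildTestMessage {
    my ($from, $to, $subject) = @_;
    return join "\r\n",
        "From: $from", "To: $to", "Subject: $subject", '',
        'Test message for the transport event sink.', '';
}

# Deliver the message to an SMTP service; returns 1 on success, 0 on failure.
sub SendTestMessage {
    my ($host, $from, $to, $subject) = @_;
    require Net::SMTP;               # loaded only when a message is sent
    my $smtp = Net::SMTP->new($host, Timeout => 10) or return 0;
    my $ok = $smtp->mail($from)
          && $smtp->to($to)
          && $smtp->data(BuildTestMessage($from, $to, $subject));
    $smtp->quit;
    return $ok ? 1 : 0;
}

# Pass "send" on the command line to contact the server.
if (@ARGV && $ARGV[0] eq 'send') {
    my $sent = SendTestMessage('localhost', 'tester@localhost',
                               'test@localhost', 'Your details');
    print $sent ? "sent\n" : "send failed\n";
}
```

Choosing a subject that appears in your unwanted-subjects list lets you confirm that the sink discards the message and writes a log entry.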

Because the SMTP service has no UI that lets you watch the Perl script execute, debugging the script can be difficult. The easiest way to debug the script is to write log data to a log file, which lets you print the values of variables at different points in the script.

However, managing an ever-growing log file can be burdensome. A syslog-like script, such as the one in "Converting Perl Scripts to Win32 Perl Services," May 2003, InstantDoc ID 38404, can be quite useful. You can download syslogd.pl from the Windows Scripting Solutions Web site and use it as your log file. Run syslogd.pl in a command-line window and modify MessageFilter.pl to open the named pipe that syslogd.pl creates. To do so, modify the $LOG_PATH variable at callout B in Listing 1 to

$LOG_PATH = "\\\\.\\pipe\\syslog";

MessageFilter.pl will then open the syslog script's named pipe and print all logging information there. The information will appear on screen as it's logged.

A Winning Solution
In my experience, Perl 5.8 and Perl 5.6 both work well. However, Perl 5.5 seems to cause IIS to crash repeatedly. I suggest that you use the latest version of ActivePerl for MessageFilter.pl.

Using scripts for CDO transport event sinks can be a winning solution for your Exchange server. Because sinks are easy to code and quick to modify and prototype, you can have a message filtering solution running quickly.

MessageFilter.pl is in production on my network and has exhibited quite palatable performance while filtering out tens of thousands of messages sent by the MyDoom virus. You can easily modify MessageFilter.pl to better meet your needs. For example, instead of discarding email messages that have an offending attachment, you could remove the attachment, add an explanatory note to the message body, then send the message to its intended destination.

You could also write some interesting variations of MessageFilter.pl. For example, you could add scripts to do spam filtering (freeware and even Perl libraries are available for this work), intelligent message routing, virus checking, removing HTML code from email (or stripping out references to graphics and links), and censoring sensitive data. And when the next big virus du jour appears, you can quickly mitigate it on your Exchange server simply by adding a couple of lines to MessageFilter.pl.
