Yale University

ITS Linux Systems Design & Support

Yale ITS Home Accounts

Help Desk
203.432.9000
203.785.3200

ITS Office
Yale University
175 Whitney Avenue
P.O. Box 208276
New Haven, CT
06520-8276
USA

SPAMASSASSIN

1. Introduction

SpamAssassin is a free tool for detecting email spam. A typical setup has SpamAssassin "mark" spam as such by adding headers to the message. Other programs, including procmail and most email clients, can then detect these headers and file the marked messages into a Spam folder.
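
To see this marking in practice, the sketch below checks one saved message for the "X-Spam-Flag: YES" header that SpamAssassin adds to mail it classifies as spam. It assumes SpamAssassin's default header names and a message saved as a single plain-text file with Unix line endings. The function name is_marked_spam is hypothetical and not part of SpamAssassin.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): exit 0 if a saved
# message carries the "X-Spam-Flag: YES" header that SpamAssassin adds
# to spam. Assumes default header names and one message per file.
is_marked_spam() {
    # sed prints lines up to and including the first empty line (the end
    # of the header block) and then quits, so a matching line in the
    # message body is never examined.
    sed '/^$/q' "$1" | grep -qi '^X-Spam-Flag: *YES'
}

# Example: is_marked_spam saved-message.txt && echo "marked as spam"
```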

2. Configuration

For Computer Science users who receive mail on netra, two scripts will help you get started with SpamAssassin. They are located on cyndra in /usr/local/bin and are called "spamassassin-install" and "spamassassin-uninstall". "spamassassin-install" creates a .procmailrc file, while "spamassassin-uninstall" moves the .procmailrc file to .procmailrc-off. Once you have run the install script, edit the .procmailrc file to decide what happens to mail marked as spam. A sample procmail rule is included and can simply be uncommented. As an alternative to the scripts in /usr/local/bin, a template .procmailrc file is also available at /usr/local/spamassassin/.procmailrc.
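
For reference, a commonly used procmail recipe pair for SpamAssassin looks like the output of the sketch below: the first recipe filters each message through the spamassassin command, which adds its headers, and the second files anything marked as spam into a folder. This is an illustration, not the contents of the template in /usr/local/spamassassin; the function name print_spam_recipe and the default folder name Spam are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print a procmail recipe pair that (1) pipes each
# incoming message through the spamassassin command, which adds its
# X-Spam-* headers, and (2) files messages marked as spam into a folder.
# The default folder name "Spam" is an assumption.
print_spam_recipe() {
    folder="${1:-Spam}"
    cat <<EOF
# Filter messages smaller than 256000 bytes through SpamAssassin,
# using a lock file so only one copy runs at a time.
:0fw: spamassassin.lock
* < 256000
| spamassassin

# File messages SpamAssassin marked as spam into \$MAILDIR/$folder.
:0:
* ^X-Spam-Status: Yes
$folder
EOF
}
```

Running print_spam_recipe Junk prints the same rules with the folder named Junk. Compare the output with the sample rule in your .procmailrc rather than appending it, since a second copy would run every message through SpamAssassin twice.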

3. Training the Bayesian Filter

Current versions of SpamAssassin include a Bayesian filter that can improve SpamAssassin's ability to distinguish spam from ham (legitimate, non-spam mail). SpamAssassin will sometimes mark spam as ham or vice versa. When this happens, you can tell the Bayesian filter to "relearn" a message as spam or ham.

On netra, use the sa-learn command to train the Bayesian filter. Here are some examples:

sa-learn --forget --ham --mbox /var/spool/mail/netid
   

This tells SpamAssassin that none of the messages in your inbox is spam. The --forget flag tells it to forget a message if it has learned it before; without this flag, it will ignore messages that it has already learned as spam or ham.

sa-learn --forget --spam --mbox /home/netid/mail/NewSpam
   

This tells SpamAssassin to relearn all messages in the NewSpam folder as spam. If you moved messages from your inbox into this folder because SpamAssassin did not catch them as spam, this command will train the Bayesian filter accordingly.
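
If you retrain often, the two sa-learn commands above can be wrapped in a small script. The sketch below uses only the flags shown in those examples and checks that the mailbox file exists first. The function name relearn and the DRY_RUN variable are hypothetical, and the paths in the usage line follow the netid placeholder used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the sa-learn commands shown above.
# Usage: relearn ham|spam MAILBOX
# Set DRY_RUN=1 to print the sa-learn command instead of running it.
relearn() {
    mode="$1"
    mbox="$2"
    # Only "ham" and "spam" are accepted as learning modes.
    case "$mode" in
        ham|spam) ;;
        *) echo "usage: relearn ham|spam MAILBOX" >&2; return 2 ;;
    esac
    # Refuse to run on a mailbox file that does not exist.
    if [ ! -f "$mbox" ]; then
        echo "relearn: no such mailbox: $mbox" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo sa-learn --forget "--$mode" --mbox "$mbox"
    else
        sa-learn --forget "--$mode" --mbox "$mbox"
    fi
}
```

For example, relearn spam /home/netid/mail/NewSpam trains on the NewSpam folder, and DRY_RUN=1 relearn ham /var/spool/mail/netid shows the command without running it.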

From the sa-learn manpage:

NAME
       sa-learn - train SpamAssassin's Bayesian classifier

SYNOPSIS
       sa-learn [options] --file message

       sa-learn [options] --mbox mailbox

       sa-learn [options] --dir directory

       sa-learn [options] --single < message

       Options:

        --ham                             Learn messages as ham (non-spam)
        --spam                            Learn messages as spam
        --forget                          Forget a message
        --rebuild                         Rebuild the database if needed
        --force-expire                    Force an expiry run, rebuild every time
        -f file, --folders=file           Read list of files/directories from file
        --dir                             Learn a directory of RFC 822 files
        --file                            Learn a file in RFC 822 format
        --mbox                            Learn a file in mbox format
        --showdots                        Show progress using dots
        --no-rebuild                      Skip building databases after scan
        -L, --local                       Operate locally, no network accesses
        -C file, --config-file=file       Path to standard configuration dir
        -p prefs, --prefs-file=file       Set user preferences file
        -D, --debug-level                 Print debugging messages
        -V, --version                     Print version
        -h, --help                        Print usage message

See the sa-learn manpage on netra for more information.

4. Further configuration

In your ~/.spamassassin directory there is a file called user_prefs that holds your SpamAssassin preferences. It is well documented and lets you set your spam threshold, decide whether SpamAssassin modifies the Subject field, and choose whether SpamAssassin puts a full report in the headers. Whitelists and blacklists can also be stored there. Note that after a certain number of non-spam messages from a particular sender, that sender is automatically whitelisted.
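
The settings described above can be sketched as lines in user_prefs. The helper below only prints example lines. It assumes SpamAssassin 3.x setting names (older 2.x releases use required_hits instead of required_score, and rewrite_subject instead of rewrite_header); the function name print_user_prefs and the example addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print example user_prefs lines.
# Setting names assume SpamAssassin 3.x; the addresses are placeholders.
print_user_prefs() {
    threshold="${1:-5.0}"
    # Reject empty values, a lone dot, non-numeric characters,
    # and values with more than one dot.
    case "$threshold" in
        ''|.|*[!0-9.]*|*.*.*) echo "bad threshold: $threshold" >&2; return 2 ;;
    esac
    cat <<EOF
# Score at or above which a message is marked as spam.
required_score $threshold
# Prefix the Subject: line of spam with a tag.
rewrite_header Subject *****SPAM*****
# Always treat mail from these addresses as ham or spam.
whitelist_from friend@example.org
blacklist_from *@spammer.example.com
EOF
}
```

For example, print_user_prefs 6.5 shows the lines for a threshold of 6.5. Since user_prefs already contains documented versions of these settings, editing the existing lines is usually better than appending new ones.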

Complete documentation for SpamAssassin is available at http://www.spamassassin.org.

5. Gotchas

SpamAssassin isn't perfect. It may mark legitimate mail as spam, or fail to mark spam as such. It is therefore highly recommended that you do NOT use procmail or your mail reader to automatically delete everything SpamAssassin marks as spam. Instead, have procmail or your mail reader move everything marked as spam to a "Spam" folder. You can then skim that folder, glancing at the From and Subject fields, and, the vast majority of the time, select all messages and delete them.

SpamAssassin assigns a score to each message, and each user can decide where to set the threshold score at which a message is marked as spam. The lower the threshold, the more likely you are to have legitimate messages marked as spam; the higher the threshold, the more likely spam is to sneak through the filter.
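
Skimming a Spam folder is easier with only the sender and subject of each message in view. The sketch below prints the From: and Subject: header lines of every message in an mbox-format folder. It assumes standard mbox separators (lines beginning with "From ") and does not join folded headers; the function name skim_mbox and the folder path in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the From: and Subject: header lines of each
# message in an mbox file, skipping message bodies. Assumes standard mbox
# format and does not join folded (multi-line) headers.
skim_mbox() {
    awk '
        /^From / { in_headers = 1; next }              # separator: a new message starts
        in_headers && /^$/ { in_headers = 0; next }    # blank line: headers end
        in_headers && /^(From|Subject):/ { print }     # keep only these two headers
    ' "$1"
}
```

For example, skim_mbox /home/netid/mail/Spam | less lists every sender and subject for a quick review before deleting.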



Last modified: Friday, 22-Feb-2008 10:15:20 EST. (jj)