SPAMASSASSIN
1. Introduction
SpamAssassin is a free tool for detecting email spam. A typical application is to have SpamAssassin "mark" spam as such by adding headers to the email. Other applications, including procmail and most email clients, can then detect these headers and file the marked messages into a Spam folder.
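As a rough illustration of how another tool can detect the mark, the sketch below checks one saved message for the X-Spam-Flag: YES header, which SpamAssassin adds to mail it classifies as spam. The function name and the message path are hypothetical, and the setup on netra may rely on a different header.

```shell
#!/bin/sh
# Hypothetical helper: succeed if one saved message carries the spam flag.
is_marked_spam() {
    # Only the header block is inspected. It ends at the first blank line,
    # so a header quoted in the message body is not mistaken for a real one.
    awk '
        /^$/                { exit }
        /^X-Spam-Flag: YES/ { found = 1; exit }
        END                 { exit !found }
    ' "$1"
}

# Usage (the path is a placeholder):
#   is_marked_spam "$HOME/message.txt" && echo "marked as spam"
```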
2. Configuration
For Computer Science users who receive mail on netra, two scripts will help you get started with SpamAssassin. They are located on cyndra in /usr/local/bin and are called "spamassassin-install" and "spamassassin-uninstall". "spamassassin-install" creates a .procmailrc file, while "spamassassin-uninstall" moves the .procmailrc file to .procmailrc-off. Once the install script has been run, you'll need to edit the .procmailrc file to decide what happens to mail marked as spam. A sample procmail rule is provided and can simply be uncommented. As an alternative to the scripts in /usr/local/bin, a template .procmailrc file is available at /usr/local/spamassassin/.procmailrc.
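For orientation, a rule that files marked mail into a folder typically looks like the recipe printed by the sketch below. This is not the sample rule shipped in the template: the function name, the X-Spam-Flag test, and the folder name Spam are assumptions for illustration. The script only prints the recipe so that it can be compared with the rule in your own .procmailrc. It does not modify any file.

```shell
#!/bin/sh
# Hypothetical helper: print a candidate procmail recipe for review.
# Nothing is written to ~/.procmailrc.
print_spam_recipe() {
    folder="${1:-Spam}"   # placeholder folder name, relative to procmail's MAILDIR
    printf '%s\n' \
        '# File mail that SpamAssassin has flagged into a separate folder.' \
        ':0:' \
        '* ^X-Spam-Flag: YES' \
        "$folder"
}

print_spam_recipe Spam
```

The trailing colon in ":0:" asks procmail to use a lock file while it appends to the folder, which matters when several messages arrive at once.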
3. Training the Bayesian Filter
The latest version of SpamAssassin has a Bayesian filter that can improve SpamAssassin's ability to distinguish the spam from the ham. SpamAssassin will sometimes mark spam as ham or vice versa. When this happens, you can tell SpamAssassin's Bayesian filter to "relearn" a mail as spam or ham.
On netra, use the sa-learn command to train the Bayesian filter. Here are some examples:
sa-learn --forget --ham --mbox /var/spool/mail/netid
This tells SpamAssassin that all messages in your inbox are not spam (replace netid with your own login name). The --forget flag tells it to forget a message if it has learned it before. Without this flag, it will ignore messages that it has already learned as spam or ham.
sa-learn --forget --spam --mbox /home/netid/mail/NewSpam
This tells SpamAssassin that all messages in the NewSpam folder should be relearned as spam. If you moved messages from your inbox to this folder because SpamAssassin failed to flag them as spam, this command will train the Bayesian filter accordingly.
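The two examples above can be wrapped in a small helper. The sketch below is a hypothetical script, not something installed on netra. It takes a conservative approach and runs the forget and learn steps as two separate sa-learn invocations, on the assumption that this behaves at least as safely as combining the flags. By default it only prints the commands. Set RUN=1 to execute them.

```shell
#!/bin/sh
# Hypothetical helper: forget, then relearn, every message in an mbox file.
# "netid" below is a placeholder for your own login name.
relearn() {
    kind="$1"   # "ham" or "spam"
    mbox="$2"   # path to an mbox file
    for step in "--forget" "--$kind"; do
        if [ "${RUN:-0}" = "1" ]; then
            sa-learn "$step" --mbox "$mbox"
        else
            # Dry run: show the command without touching the Bayes database.
            printf 'sa-learn %s --mbox %s\n' "$step" "$mbox"
        fi
    done
}

relearn ham  /var/spool/mail/netid
relearn spam /home/netid/mail/NewSpam
```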
From the sa-learn manpage:
NAME
    sa-learn - train SpamAssassin's Bayesian classifier
SYNOPSIS
    sa-learn [options] --file message
    sa-learn [options] --mbox mailbox
    sa-learn [options] --dir directory
    sa-learn [options] --single < message
Options:
    --ham                        Learn messages as ham (non-spam)
    --spam                       Learn messages as spam
    --forget                     Forget a message
    --rebuild                    Rebuild the database if needed
    --force-expire               Force an expiry run, rebuild every time
    -f file, --folders=file      Read list of files/directories from file
    --dir                        Learn a directory of RFC 822 files
    --file                       Learn a file in RFC 822 format
    --mbox                       Learn a file in mbox format
    --showdots                   Show progress using dots
    --no-rebuild                 Skip building databases after scan
    -L, --local                  Operate locally, no network accesses
    -C file, --config-file=file  Path to standard configuration dir
    -p prefs, --prefs-file=file  Set user preferences file
    -D, --debug-level            Print debugging messages
    -V, --version                Print version
    -h, --help                   Print usage message
See the sa-learn manpage on netra for more information.
4. Further configuration
In your ~/.spamassassin directory, there is a file called user_prefs that contains your preferences for SpamAssassin. It is very well documented, and lets you set your threshold, decide whether SpamAssassin modifies the subject field, and choose whether SpamAssassin puts a full report in the headers. Whitelists and blacklists can also be stored here. Note that after a certain number of non-spam messages from a particular sender, that sender is automatically whitelisted.
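Because user_prefs is mostly comments, it can be hard to see which settings are actually in effect. The sketch below is a hypothetical helper that prints only the lines that are neither blank nor comments. The function name is an assumption, and the default path is the usual per-user location.

```shell
#!/bin/sh
# Hypothetical helper: list the settings that are actually active,
# i.e. every line of user_prefs that is neither blank nor a comment.
active_prefs() {
    prefs="${1:-$HOME/.spamassassin/user_prefs}"
    grep -Ev '^[[:space:]]*(#|$)' "$prefs"
}

# Usage:
#   active_prefs                      # inspect your own preferences
#   active_prefs /path/to/other_file  # inspect a different file (placeholder path)
```

Option names differ between SpamAssassin releases, so the comments in your own user_prefs are the best guide to what each setting does.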
Complete documentation for SpamAssassin is available at http://www.spamassassin.org.
5. Gotchas
SpamAssassin isn't perfect. There's a chance that it will mark legitimate mail as spam, or fail to mark spam as such. It is therefore highly recommended that you do NOT use procmail or your mail reader to automatically delete everything that SpamAssassin marks as spam. Instead, have procmail or your mail reader move everything marked as spam to a "Spam" folder. You can then run through that folder, glancing at the From and Subject fields, and, the vast majority of the time, select all messages and delete them. SpamAssassin assigns a score to each message, and each user can decide how low to set the threshold that defines a message as spam. The lower the threshold, the more likely you are to have legitimate messages marked as spam. The higher the threshold, the more likely you are to have spam sneak through the filter.
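One way to get a feel for this trade-off is to count how many messages in a folder would be flagged at different thresholds. The sketch below is a hypothetical helper. It assumes each message carries an X-Spam-Status header with the score written as score=N (or hits=N in older releases), and it treats a message as flagged when its score is at or above the threshold. It scans the whole file, so a header quoted inside a message body would also be counted.

```shell
#!/bin/sh
# Hypothetical helper: count messages in an mbox file whose recorded
# SpamAssassin score is at or above a given threshold.
count_flagged() {
    threshold="$1"   # candidate threshold, e.g. 5
    file="$2"        # path to an mbox file
    awk -v t="$threshold" '
        /^X-Spam-Status:/ {
            # Pull out "score=7.2" or "hits=7.2", then drop the "name=" part.
            if (match($0, /(score|hits)=-?[0-9.]+/)) {
                s = substr($0, RSTART, RLENGTH)
                sub(/^[a-z]+=/, "", s)
                total++
                if (s + 0 >= t + 0) flagged++
            }
        }
        END { printf "%d of %d messages at or above %s\n", flagged, total, t }
    ' "$file"
}

# Usage (the folder path is a placeholder):
#   count_flagged 5 "$HOME/mail/Spam"
#   count_flagged 3 "$HOME/mail/Spam"
```

Running it against a folder of known legitimate mail with a few candidate thresholds shows how many of those messages a lower setting would have caught by mistake.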