DIY Spam Filter For Your Office or ISP Email

You arrive at work with dread, knowing that when you open your email client you’ll have to begin the daily ritual of sorting through all the spam.  You consider the collective time wasted by every employee deleting the same email and wonder why there isn’t more investment in spam filters.  Or why the spam filters in place aren’t better (I’m looking at you, Postini).

You don’t have control over the mail servers or spam filters at your company or ISP.  Client-side spam filters such as Outlook plugins are a possibility.  But what if you’re using a Mac or a Linux workstation that doesn’t support the plugin?  What if you check your mail on your phone or tablet?  The spam will still be there until your client-side filter handles it.

Better solution:  isbg  (IMAP Spam-Be-Gone)

isbg logs into your mail server with IMAP and searches your inbox for spam using SpamAssassin.  When a spam message is found, it can move the message to a spam folder, delete it, or flag it.  It can even learn what your spam looks like to tune itself.

The one major requirement of this solution is a computer that is always on.  For some this isn’t possible, but if you’re the kind of engineer who would implement this anyway, you probably have a few computers under your desk that you’ve wondered what to do with.  So pull one out, dust it off, and let’s get started.

Disclaimer:  This will produce a lot of extra IMAP traffic to your company or ISP mail server from a single host.  I take no responsibility if you get red flagged by your corporate InfoSec team.

Step 0.  Check your IMAP server

Before you waste time installing all this, you need to know that your mail server supports IMAP and what your folders are called.  If your company uses Microsoft Exchange, then chances are you can log in to it with IMAP.  Most other mail servers support IMAP as well.  You can verify by using telnet to access port 143 on the mail server.

$ telnet mail.example.com 143
Trying 10.0.0.50...
Connected to mail.example.com.
Escape character is '^]'.
* OK The Microsoft Exchange IMAP4 service is ready.

Congratulations, you’ve got IMAP!
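If you'd rather script this check, here is a minimal Python sketch of the same idea.  It assumes your server also accepts IMAP over TLS on port 993 (not shown above, and not true of every server), and the function names are my own invention for illustration.  It only reads the greeting line and never sends a password.

```python
import socket
import ssl


def classify_greeting(line: bytes) -> str:
    """Classify the first line a server sends after connect (IMAP greeting)."""
    text = line.decode("ascii", errors="replace").strip().upper()
    if text.startswith("* OK"):
        return "ready"             # normal greeting, server expects a LOGIN
    if text.startswith("* PREAUTH"):
        return "preauthenticated"  # connection is already authenticated
    if text.startswith("* BYE"):
        return "refused"           # server is rejecting the connection
    return "not-imap"              # something else is listening on the port


def read_greeting(host: str, port: int = 993, timeout: float = 10.0) -> bytes:
    """Open a TLS connection and return the server's greeting line.

    Port 993 is an assumption; the telnet example uses cleartext port 143.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.makefile("rb").readline()
```

Calling classify_greeting(read_greeting("mail.example.com")) should give "ready" for a server that greets you like the Exchange example above.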

If you don’t mind your password going cleartext over the network you can list the directories as well, though I don’t recommend this for security reasons.  You can usually discern the directory structure from your mail client.  Here is an example from a Microsoft Exchange server.

A1 LOGIN <username> <password>
A1 OK LOGIN completed.
A1 LIST "" *
* LIST (\HasNoChildren) "/" Calendar
* LIST (\HasNoChildren) "/" Contacts
* LIST (\HasChildren) "/" "Deleted Items"
* LIST (\HasNoChildren) "/" "Deleted Items/Untitled Folder"
* LIST (\HasNoChildren) "/" Drafts
* LIST (\Marked \HasChildren) "/" INBOX
* LIST (\HasNoChildren) "/" "INBOX/Very Important"
* LIST (\HasNoChildren) "/" Journal
* LIST (\HasNoChildren) "/" "Junk E-Mail"
* LIST (\HasNoChildren) "/" Notes
* LIST (\HasNoChildren) "/" Outbox
* LIST (\HasNoChildren) "/" "Sent Items"
* LIST (\HasNoChildren) "/" Tasks
A1 OK LIST completed.

The two directories we work with later are INBOX and “Junk E-Mail”.  Notice that INBOX is always in all-caps, while other directories match the case used in the mail client.  Also notice that “Junk E-Mail” requires quotes around it because there is a space in the directory name.
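To make the quoting rule concrete, here is a rough Python sketch that pulls folder names out of a LIST reply like the one above.  It is deliberately simplified: it assumes a quoted one-character delimiter and plain or double-quoted names, and it ignores cases such as a NIL delimiter or escaped quotes that a real IMAP parser has to handle.  The function name is hypothetical.

```python
import re

# Matches untagged replies such as:  * LIST (\HasNoChildren) "/" "Junk E-Mail"
LIST_LINE = re.compile(
    r'^\* LIST \((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" (?P<name>.+)$'
)


def parse_list_response(text):
    """Return (folder_name, flags) pairs from the text of a LIST reply."""
    folders = []
    for line in text.splitlines():
        match = LIST_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and the tagged "A1 OK" completion line
        name = match.group("name")
        # Names containing spaces arrive wrapped in double quotes; strip them.
        if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
            name = name[1:-1]
        folders.append((name, match.group("flags").split()))
    return folders


SAMPLE = r'''
* LIST (\HasNoChildren) "/" Drafts
* LIST (\Marked \HasChildren) "/" INBOX
* LIST (\HasNoChildren) "/" "INBOX/Very Important"
* LIST (\HasNoChildren) "/" "Junk E-Mail"
A1 OK LIST completed.
'''

for name, flags in parse_list_response(SAMPLE):
    # A name with a space needs quoting when passed on a command line.
    print(f"{name!r:28} needs quotes: {' ' in name}")
```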

Step 1. Find a Computer

Any computer will do.  I did this with a single core Athlon and 512 MB of memory.  Nothing is stored locally so minimal disk space is needed.  You can also do this on your regular PC or Mac, but the point is to leave this system on all the time.  So if you do this on your laptop, the spam will collect until you turn the laptop back on, then isbg will have to catch up again to get ahead of the spam.  It’s better to do this on an always-on secondary computer.

If you don’t have a computer, you could use a shared Linux server where you don’t have root, as long as it has SpamAssassin installed or you can find a way to install it in your user directory.  However, this tutorial assumes you have root access.

Step 2. Choose and install an OS

I chose Ubuntu Linux 12.04 LTS (Precise Pangolin) because it’s supported, it’s easy, and it has all the right versions of the software we need.  You can also do this with 10.04 LTS (Lucid Lynx) but I recommend using backports to get the best versions of the software, specifically pdns-recursor.

You can also use a RHEL or CentOS distro, but I don’t have instructions for installing SpamAssassin on these.  If you choose one of these distros, make sure you are using at least SpamAssassin 3.3.x and pdns-recursor 3.3.x or you may run into some serious bugs.

Step 3.  Install SpamAssassin

You’ll want several software packages.  SpamAssassin is the only software you need, but the other software improves SpamAssassin’s efficiency and accuracy:

pdns-recursor – SpamAssassin will generate a LOT of DNS requests to verify hostnames during email analysis.  This may be slow, and taxing on company or ISP DNS servers.  pdns-recursor acts as a local DNS cache so repeated requests for the same addresses can be avoided.

razor and pyzor – These packages increase fidelity of spam detection using a shared hash library of known spam.

$ sudo apt-get update
$ sudo apt-get install spamassassin
$ sudo apt-get install pdns-recursor razor pyzor

The software in the last line is optional, but helpful.

Step 4. Install isbg

I use git to install isbg to the /opt directory, so any user can use it, and I always have the latest version (0.99 as of this writing).  You can also install it via pip or easy_install, or by downloading it manually from GitHub.  It’s a single file (isbg.py), so you can put it anywhere.

$ sudo apt-get install git
$ cd /opt
$ sudo git clone https://github.com/ook/isbg.git

Step 5.  Configure SpamAssassin

SpamAssassin works pretty well out of the box.  But there are a few things you may want to tune about it.  SpamAssassin configuration can be placed in /etc/spamassassin/local.cf or ~/.spamassassin/user_prefs.  I prefer user_prefs so that each user can customize the settings for themselves.

$ cd ~
$ mkdir .spamassassin && cd .spamassassin/
$ vi user_prefs   (or nano/emacs if you prefer)

Add the following to the configuration file at ~/.spamassassin/user_prefs

required_score 5.0
use_razor2 1
use_pyzor 1
score DNS_FROM_AHBL_RHSBL 0
score __RFC_IGNORANT_ENVFROM 0
score DNS_FROM_RFC_DSN 0
score DNS_FROM_RFC_BOGUSMX 0
score __DNS_FROM_RFC_POST 0
score __DNS_FROM_RFC_ABUSE 0
score __DNS_FROM_RFC_WHOIS 0

required_score – The score a message must reach to be marked as spam (default 5.0).  It’s recommended that you do not change this from the default.

use_razor2 and use_pyzor – Use shared spam hash services (if installed in Step 3).

score – These lines disable certain checks that are known to catch a very small percentage of spam, but cost a lot of time through DNS checks.  These lines significantly speed up your spam analysis.  More information at SpamTips.
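If the interaction between required_score and the score lines isn't obvious, this toy Python model may help.  It is not SpamAssassin's code, and the rule names starting with HYPOTHETICAL_ and every score value are made up for illustration.  The idea is only that each rule that hits adds its score, and the message is marked as spam once the total reaches required_score.

```python
def classify(hits, scores, required_score=5.0):
    """Sum the scores of the rules that hit and compare to the threshold."""
    total = sum(scores.get(rule, 0.0) for rule in hits)
    return total, total >= required_score


# Made-up scores; only DNS_FROM_RFC_DSN is a rule name from the config above.
scores = {
    "HYPOTHETICAL_SUBJECT_RULE": 3.2,
    "HYPOTHETICAL_BODY_RULE": 1.5,
    "DNS_FROM_RFC_DSN": 0.7,
}
hits = ["HYPOTHETICAL_SUBJECT_RULE", "HYPOTHETICAL_BODY_RULE", "DNS_FROM_RFC_DSN"]

print(classify(hits, scores))

# Equivalent of "score DNS_FROM_RFC_DSN 0": the rule no longer contributes.
scores["DNS_FROM_RFC_DSN"] = 0.0
print(classify(hits, scores))
```

In the real thing, a rule scored at 0 is not run at all, which is where the time saving comes from.  The model also shows the trade-off: a borderline message can drop below the threshold once a rule is zeroed out.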

Step 6.  Run isbg the first time

/opt/isbg/isbg.py --imaphost mail.example.com --ssl --imapuser username --imapinbox INBOX --spaminbox "Junk E-Mail" --delete --expunge --savepw --verbose


We’re providing several arguments to isbg:

  • imaphost – the hostname of the mailserver
  • ssl – use secure communication (recommended!)
  • imapuser – your username on the mailserver, defaults to your Linux username
  • imapinbox – the folder to scan for spam, defaults to INBOX
  • spaminbox – the folder to copy spam to, defaults to INBOX.spam (notice I use quotation marks here because there is a space in the folder name)
  • delete – delete the messages from the imapinbox after all the copies are complete (recommended unless you want spam in both imapinbox and spaminbox)
  • expunge – delete only marks the messages for deletion; expunge actually removes them from the server
  • savepw – save your password encrypted to your home directory, prevents prompting for the password on subsequent invocations.  Only used the first time isbg is run.
  • verbose – produce tons of output to let us know things are working, or not working
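If you end up launching isbg from a script instead of typing the command, building the arguments as a list sidesteps the quoting problem with folder names like "Junk E-Mail".  Here is a small sketch using only the flags listed above.  The helper name is hypothetical, and shlex.join needs Python 3.8 or newer.

```python
import shlex


def build_isbg_command(host, user, spam_folder, inbox="INBOX"):
    """Return the first-run isbg command as an argument list."""
    return [
        "/opt/isbg/isbg.py",
        "--imaphost", host,
        "--ssl",
        "--imapuser", user,
        "--imapinbox", inbox,
        "--spaminbox", spam_folder,  # spaces are fine: one list item = one argument
        "--delete",
        "--expunge",
        "--savepw",
        "--verbose",
    ]


cmd = build_isbg_command("mail.example.com", "username", "Junk E-Mail")

# shlex.join shows the equivalent shell line, adding quotes where needed.
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd) would hand "Junk E-Mail" to isbg as a single argument with no shell quoting involved.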

Note: INBOX is capitalized, but other IMAP folders are not necessarily all caps.

This command will go through the entire inbox looking for spam.  It will copy any spam to the spaminbox, then delete it.  After it completes, future invocations will only look at new emails.  This command will run for a long time and produce copious amounts of output for each email scanned.  But it shows you the progress as you can see each email UID and compare it to the total number of emails in your INBOX.  If you interrupt the process, you will have to start over.

If you chose to install pdns-recursor, you can verify it’s working by looking for cache-hits:

$ sudo rec_control get-all | grep cache
cache-entries 1090
cache-hits 17
cache-misses 281
negcache-entries 88
packetcache-entries 148
packetcache-hits 103
packetcache-misses 298
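Raw counters are hard to judge at a glance, so here is a quick Python sketch that turns that output into hit ratios.  It assumes the "name value" line format shown above, and the function names are mine.

```python
def parse_stats(text):
    """Parse 'name value' lines, as printed by rec_control, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def hit_ratio(stats, prefix):
    """Fraction of lookups answered from the cache named by prefix."""
    hits = stats[f"{prefix}-hits"]
    misses = stats[f"{prefix}-misses"]
    total = hits + misses
    return hits / total if total else 0.0


OUTPUT = """\
cache-entries 1090
cache-hits 17
cache-misses 281
negcache-entries 88
packetcache-entries 148
packetcache-hits 103
packetcache-misses 298
"""

stats = parse_stats(OUTPUT)
for prefix in ("cache", "packetcache"):
    print(f"{prefix}: {hit_ratio(stats, prefix):.1%} hits")
```

The ratios should climb as isbg works through more mail and the same hostnames keep coming up.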

Step 7. Create a cron job for isbg

You’ve removed all the spam from your inbox, but that doesn’t prevent new spam from coming in.  So set isbg to run every 5 minutes with this cron job:

*/5 * * * * /opt/isbg/isbg.py --imaphost mail.example.com --ssl --imapinbox INBOX --spaminbox "Junk E-Mail" --delete --expunge --noninteractive --noreport --nostats

There are a few new arguments here.  Notice that savepw and verbose are gone, and we’ve added noninteractive, noreport, and nostats.  These will suppress output unless there is an error.

Don’t worry if isbg takes longer than 5 minutes to run.  If it’s still running when the next cron job triggers, it will silently quit and try again in 5 minutes.  As a result of this behavior you could tune this down to 3 minutes or 1 minute, it just depends how risky you want to get with your IT department.
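If you want the same skip-if-still-running behavior in your own cron scripts, one common approach is a non-blocking file lock.  The sketch below shows that general technique in Python on a Unix-like system.  I'm not claiming this is how isbg does it internally, and the function names and any lock path you pick are your own choice.

```python
import fcntl
import os


def try_lock(path):
    """Return a file descriptor holding the lock, or None if it is taken."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None  # a previous run still holds the lock: quit quietly
    return fd


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A job would call try_lock at startup, exit right away on None, and call release when done.  The kernel also drops the lock if the process dies, so a crashed run doesn't block the next one.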

That’s it!  Watch and enjoy as the spam is quickly whisked away from your inbox into the junk mail folder!

Resources and Acknowledgements

Thank you to all the folks who have researched spam and SpamAssassin. Thank you to Roger Binns for creating isbg, and to Thomas Lecavelier for maintaining it.