My email setup (geekery)
Note that this is probably much more fiddly and geeky than most people want to deal with, and it requires an email host which allows you to use IMAP, procmail, and custom filters (or at least specifically bogofilter).
Note that I don't specifically have anything against GMail, it's just that I like to be in control of my hosting setup and spam filtration and so on, and I've noticed that GMail's spam filter is a bit less-than-stellar. (Certainly it's better than pretty much every client-side filter, of course, but I've had much better luck with my setup.)
Getting the mail
IMAP is king. IMAP is based on sync rather than fetch: if you read a message on one system, it gets marked read on the server, and then it shows up as read on other clients as well. You don't have to worry about your hard drive or mobile device filling up, because devices just cache the stuff you're actually looking at. IMAP is great. Yes, POP sucks. IMAP doesn't suck. Some IMAP clients treat it like POP though, and that sucks. But most IMAP clients treat IMAP as IMAP. Use IMAP. Even if you switch to GMail, if you want to keep using a regular email client with GMail, use IMAP, not POP. Seriously.
Spam filtering
Classification
My email provider (Dreamhost) has SpamAssassin installed. SpamAssassin is okay at tagging email with various high-level characteristics (e.g. "this came from a server that's on a blacklist" or "this server said it has a different hostname than it actually does" or whatever), but it only treats each rule as a single-dimensional score, when really it's combinations of factors which should be used as an indicator. So I set SpamAssassin to just tag email but not to filter it. The tags are useful because they are still visible to bogofilter. (However, they're not strictly necessary, so if you don't have SpamAssassin don't worry too much.)
The actual filtering happens with bogofilter. It is a tool which looks at a stream of text and, based on word frequencies compared to a database of prior word frequencies, decides whether an email is spam or not. It learns. It is nice. And, by having it on the mail server, you only have to train it once.
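To make the "word frequencies compared to a database of prior word frequencies" idea a bit more concrete, here is a toy sketch in Python. It is a plain naive-Bayes-style scorer with add-one smoothing, not bogofilter's actual algorithm, and the class name, the tokenizer, and the 0.45 "ham" cutoff are all made up for illustration. Only the 99.5% spam threshold comes from this post.

```python
import math
import re
from collections import Counter


def _sigmoid(x):
    """Numerically safe logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class ToyWordFilter:
    """Toy word-frequency spam scorer. NOT how bogofilter really works."""

    def __init__(self):
        # Number of trained messages containing each word, per label.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of trained messages per label.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Hypothetical tokenizer: lowercase runs of letters, digits, $ ' -
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, text, label):
        """Register one message as 'spam' or 'ham'."""
        self.counts[label].update(set(self.tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Combine per-word evidence into a single score between 0 and 1."""
        log_odds = 0.0
        for word in set(self.tokenize(text)):
            # Add-one smoothing so unseen words count as neutral evidence.
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return _sigmoid(log_odds)

    def classify(self, text, spam_cutoff=0.995, ham_cutoff=0.45):
        """Three-way verdict; ham_cutoff is an assumed value for illustration."""
        p = self.spam_probability(text)
        if p >= spam_cutoff:
            return "Spam"
        if p <= ham_cutoff:
            return "Ham"
        return "Unsure"
```

The point of the three-way verdict is that a message made up of words the filter has never seen lands in the middle and gets flagged for review, rather than being forced into one bucket.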
bogofilter itself doesn't actually decide what to do with the messages, though. Instead, it just sets a header (X-Bogosity) on the message, which can then be seen by procmail. procmail takes a message and, given a bunch of rules, decides which folder to put it into. If your email provider lets you set up mail filters, it'll almost certainly have procmail. Here is my .procmailrc file:
MAILDIR=$HOME/Maildir/
PMDIR=$HOME/.procmail
LOGFILE=$PMDIR/log-`date +%Y-%m`
MAIL=$HOME/Maildir/
BOGOFILTER=$HOME/bin/bogofilter
DEFAULT=$MAIL

# Strip SpamAssassin crap out of the subject
:0fw
* ^Subject:.*\*\*\*SPAM\*\*\*
| sed 's/Subject: \*\*\*SPAM\*\*\*/Subject:/'

:0fw
|$BOGOFILTER -e -p

:0:
* ^X-Bogosity: Spam
.spam/

:0:
* ^X-Bogosity: Unsure
.review/
Obviously this requires there to be two folders, "spam" and "review," on your email account. Well, actually there need to be a few more than that.
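If procmail syntax is unfamiliar, the routing decision those last two rules make can be sketched in Python. This is only an illustration of the logic, not a replacement for procmail. The function name and the "INBOX" default label are made up, and the sample header in the usage note assumes bogofilter writes a verdict word followed by extra details.

```python
from email import message_from_string

# Verdict prefix -> Maildir folder, mirroring the two delivery rules.
FOLDERS = {"Spam": ".spam/", "Unsure": ".review/"}


def choose_folder(raw_message, default="INBOX"):
    """Return the folder a message would be filed into.

    Like the procmail conditions, this only checks how the X-Bogosity
    header starts. Anything else falls through to the default mailbox.
    """
    msg = message_from_string(raw_message)
    bogosity = msg.get("X-Bogosity", "")
    for verdict, folder in FOLDERS.items():
        if bogosity.startswith(verdict):
            return folder
    return default
```

For example, a message carrying `X-Bogosity: Unsure, tests=bogofilter` would come back as `.review/`, and a message with no X-Bogosity header at all would go to the default.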
Training
The other two mailboxes are named "train-spam" and "train-notspam." Their functions should be pretty obvious. I occasionally check the "review" folder, and move messages into the correct folders accordingly. (Also, the occasional spam will show up in my inbox. Just move it into "train-spam" and be done with it.) It's extremely unlikely that a legitimate email will show up in the spam folder, because bogofilter is extremely conservative (which is why it has the "unsure" classification to begin with). It will only put something in the spam folder if it's absolutely sure (well, 99.5% certain by default).
So, okay, when you've put messages into these directories, how do they get back into bogofilter? It's pretty simple, really... I have a script, "bogotrain," which runs every 20 minutes (via a cron job), or if you don't have cron access you can just run it manually when you've accumulated a bunch of messages:
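#!/bin/bash
# Feed hand-sorted messages back into bogofilter, then re-deliver them.
export bogofilter=~plaidfluff/bin/bogofilter
export procmail=/usr/bin/procmail

# train FILE FLAG
# FLAG is -s (register as spam) or -n (register as non-spam).
function train {
    "$bogofilter" -e -vvv < "$1" &&   # show the current verdict (verbose)
    "$bogofilter" "$2" < "$1" &&      # register the message under the corrected label
    "$procmail" < "$1" &&             # re-deliver it so it gets filtered again
    rm -f "$1"                        # delete the original only if every step succeeded
}

find ~/Maildir/.train-notspam/{cur,new,tmp} -type f |
while IFS= read -r fname ; do
    train "$fname" -n
done

find ~/Maildir/.train-spam/{cur,new,tmp} -type f |
while IFS= read -r fname ; do
    train "$fname" -s
done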
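If you'd rather see what the training run is going to do before it does it, the same folder walk can be sketched in Python as a dry run. This is a rough illustration under a few assumptions: the function name is made up, it takes the Maildir path as an argument, and it skips the tmp subfolder (on the assumption that tmp only holds half-delivered messages). It only builds the bogofilter command lines and never runs or deletes anything.

```python
from pathlib import Path

# Training folder -> bogofilter registration flag (-n non-spam, -s spam).
TRAIN_FOLDERS = {".train-notspam": "-n", ".train-spam": "-s"}


def pending_training_jobs(maildir, bogofilter="bogofilter"):
    """Yield (message_path, argv) for every message waiting to be trained.

    Nothing is executed here. The caller decides whether to run each
    argv with the message file on stdin.
    """
    for folder, flag in TRAIN_FOLDERS.items():
        for sub in ("cur", "new"):
            directory = Path(maildir) / folder / sub
            if not directory.is_dir():
                continue
            for path in sorted(directory.iterdir()):
                if path.is_file():
                    yield path, [bogofilter, flag]
```

To actually train, each job could be run with something like `subprocess.run(argv, stdin=open(path, "rb"), check=True)`, followed by the re-delivery and cleanup steps from the shell script.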
Message archival
Finally, I really like keeping a complete archive of my email for a long time, but of course just having it all pile up in my inbox becomes untenable. So, I have two more folders, "Read" and "Sent" - but with a twist, as IMAP allows folders to have subfolders. I have a script which runs once a month (also via cron) called "maildirs":
#!/bin/sh
# Move last month's Read and Sent folders into dated subfolders,
# then recreate empty ones. Note: `date -d` is a GNU date extension.
TARGET=`date -d 'last month' +%Y.%m`
cd "$HOME/Maildir" || exit 1
for i in .Read .Sent ; do
    [ -d "$i" ] && [ ! -d "$i.$TARGET" ] && mv "$i" "$i.$TARGET"
    mkdir -p "$i/cur" "$i/new" "$i/tmp"
done
Search is limited by your choice of email client. OSX's Mail.app does a great job of this. Thunderbird... not so much. Outlook is somewhere in between. If I need to do some heavy-duty searching and I'm not on a Mac I'll just ssh into the mail server and do some complex find/grep fiddling, or I'll ssh to my Mac at home and do mdfind or something. Okay, this is somewhere that GMail definitely wins.
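As an alternative to find/grep fiddling on the server, a header search over one of these folders can be sketched with Python's standard mailbox module. This is a minimal sketch, not something I actually run: the function name and the default header list are made up, and it only looks at headers, not message bodies.

```python
import mailbox


def search_maildir(path, needle, headers=("From", "To", "Subject")):
    """Return (key, subject) for messages whose headers contain needle.

    path points at a single Maildir folder (one with cur/new/tmp inside).
    The match is a case-insensitive substring test.
    """
    needle = needle.lower()
    box = mailbox.Maildir(path, create=False)
    hits = []
    for key, msg in box.iteritems():
        if any(needle in str(msg.get(h, "")).lower() for h in headers):
            hits.append((key, str(msg.get("Subject", ""))))
    return hits
```

Each archived month is its own folder on disk, so searching the whole archive would mean calling this once per folder and combining the results.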
Anyway. GMail is fine (especially with Google Apps for Domains, so it's not like you even have to be stuck with somelongusername@gmail.com or whatever), but you don't have to switch to it to get everything the way you want it.
Comments
I've had the idea for a while of using GMail to collect and send mail, but doing all the spam filtering and whatnot locally by pulling everything via IMAP. Never gotten around to implementing it, though.
(Also, you have a parse error in index.php.)
And argh, I can't figure out where that parse error is coming from. I hate you, php.
import imaplib

GMAIL_USER = 'your_gmail_username'
GMAIL_PASS = 'y0urgmai1pa$sw0rd'

class ImapChecker(object):
    """Wrap an imaplib connection so any non-OK response raises an error."""
    class ServerError(Exception): pass

    def __init__(self, imap):
        self._imap = imap

    def __getattr__(self, attr):
        innerFunc = getattr(self._imap, attr)
        def _wrap(*args, **kwargs):
            # imaplib commands return a (status, data) pair
            [result, retval] = innerFunc(*args, **kwargs)
            if result != "OK":
                raise ImapChecker.ServerError(result, retval)
            return retval
        return _wrap

imap = ImapChecker(imaplib.IMAP4_SSL(host='imap.gmail.com'))
imap.login(GMAIL_USER, GMAIL_PASS)
# select() reports how many messages are in the folder
[count] = imap.select("[Gmail]/Spam")
count = int(count)
if not count:
    print("No new spam! Hurrah!")
else:
    print("You've got %d spam! Copying them to inbox..." % count)
    imap.copy("1:%d" % count, "INBOX")
    print("Done!")