This description was written on 16th December 2003, for local consumption. This means that some of the numbers are very out of date, and that some of the content is directed purely at Aber's setup.
We are using a modified form of greylisting which doesn't use the originating IP address. We found that including the IP address caused problems with ISPs who have multiple servers serving mail from a single queue (retries could come from a different IP address than the original try, so the greylist timeout could become unacceptably long). Using just sender and recipient addresses caused almost all the complaints to go away, and the algorithm still seems to be very effective.
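To illustrate why leaving out the IP address helps, here is a minimal sketch comparing the two kinds of lookup key. The addresses, the `Attempt` type and the function names are made up for the example and are not taken from our actual configuration.

```python
from typing import NamedTuple, Tuple


class Attempt(NamedTuple):
    """One delivery attempt as seen by the receiving mail server."""
    sender: str
    recipient: str
    ip: str


def triplet_key(a: Attempt) -> Tuple[str, str, str]:
    # Classic greylisting key: connecting IP plus sender and recipient.
    return (a.ip, a.sender, a.recipient)


def pair_key(a: Attempt) -> Tuple[str, str]:
    # Modified key described above: sender and recipient only.
    return (a.sender, a.recipient)


# Hypothetical ISP with two outbound servers sharing a single queue.
first = Attempt("someone@example.com", "user@example.org", "192.0.2.1")
retry = Attempt("someone@example.com", "user@example.org", "192.0.2.2")

# With the IP in the key, the retry looks like a brand new attempt.
print(triplet_key(first) == triplet_key(retry))
# Without it, the retry matches the original record.
print(pair_key(first) == pair_key(retry))
```

With the triplet key, every retry from a different server restarts the clock; with the pair key, it counts as the same attempt.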
Greylisting (http://projects.puremagic.com/greylisting/) is based on two things:
What we do with greylisting is to force external mail to go through a process which can sometimes delay it. This delay, whilst small to the average user, is large enough to discourage most spammers and to allow the international blacklists to get a hold on many of the others.
Let's assume that two people are trying to send me a mail (I've replaced my address with a non-existent address since I don't want to get spammed by having my mail address on an anti-spam page!). The first is a spammer, who is trying to send mail to me from "spammer@nasty.org" and is trying to do it from IP address 1.2.3.4. The second is someone I've never heard of who is trying to compliment me on my wonderful web site. Their address is "niceguy@coolpeople.com" and they are trying to connect from IP address 5.6.7.8.
When they first connect to our mail servers, both try to send a mail to auja99t@aber.ac.uk. In both cases, the mail servers fake a temporary failure. Temporary failures are normally used when, for example, server load is too high to accept mail or a user is over quota. They should not result in the mail being bounced back to sender. Instead, the other end should queue the mail and try again later.
So, "spammer@nasty.org" mails auja99t@aber.ac.uk. The mail servers look up the pair "spammer@nasty.org => auja99t@aber.ac.uk" in a database. They fail to find it and say "sorry we have a temporary problem". They also record that pair in the database along with the time. The chances are that the spammer doesn't bother to try again. If so, we've successfully blocked the spam. Great.
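In SMTP terms, a temporary failure is a reply whose code starts with 4, whereas a permanent failure starts with 5. The sketch below shows how a well-behaved sending server would react to each. The reply text in `GREYLIST_REPLY` is an assumption for illustration; this page does not say which code or wording our servers actually send.

```python
# Hypothetical reply; the exact code and wording used here are assumptions.
GREYLIST_REPLY = "451 4.7.1 Temporary problem, please try again later"


def sender_action(reply: str) -> str:
    """Return what a well-behaved sending server does with an SMTP reply."""
    code = int(reply[:3])
    if 200 <= code < 300:
        return "delivered"
    if 400 <= code < 500:
        return "queue and retry later"  # transient failure
    if 500 <= code < 600:
        return "bounce to sender"  # permanent failure
    return "other"


print(sender_action(GREYLIST_REPLY))
print(sender_action("550 No such user"))
```

Greylisting relies entirely on the first of those two behaviours: real mail servers queue and retry, while most spam software does not.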
If the spammer is persistent and tries again, we look at how long ago they last tried. If it's less than an hour, we defer their message again. If it's more than an hour, we accept the message and mark the pair "spammer@nasty.org => auja99t@aber.ac.uk" as OK and, from then on, we will always accept that mail as OK with no deferral. However... this first mail has been delayed for an hour. There's a pretty good chance that the spammer is now in a blacklist somewhere and so our normal spam filtering rules will spot them and, if the recipient has opted, reject the mail.
All well and good. The spammer gets blocked. Now what about Mr. Nice Guy?
Well, the same thing happens. The first mail from niceguy@coolpeople.com is deferred. Since he's not a spammer and is coming via a proper mail server or mail client, his mail will definitely be queued for a later attempt. If that later attempt happens within an hour, it'll be deferred again; if it's more than an hour later it gets let through and "niceguy@coolpeople.com => auja99t@aber.ac.uk" is remembered as OK. All future mail from him will get through without delay.
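The procedure walked through above can be sketched roughly as follows. This is an in-memory illustration, not our real implementation: the `Greylist` class and its method names are invented, a real setup would keep the pairs in a database, and the sketch assumes the hour is measured from the first recorded attempt.

```python
import time
from typing import Dict, Optional, Set, Tuple

GREYLIST_DELAY = 60 * 60  # one hour, in seconds

Pair = Tuple[str, str]  # (sender, recipient)


class Greylist:
    def __init__(self) -> None:
        self.first_seen: Dict[Pair, float] = {}  # pairs still being delayed
        self.accepted: Set[Pair] = set()  # pairs marked as OK

    def check(self, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return "accept" or "defer" for one delivery attempt."""
        if now is None:
            now = time.time()
        pair = (sender, recipient)
        if pair in self.accepted:
            return "accept"  # known pair: no delay at all
        first = self.first_seen.get(pair)
        if first is None:
            self.first_seen[pair] = now  # record the pair and the time
            return "defer"  # fake a temporary failure
        if now - first < GREYLIST_DELAY:
            return "defer"  # retried too soon
        self.accepted.add(pair)  # waited long enough: mark as OK
        return "accept"


g = Greylist()
me = "auja99t@aber.ac.uk"
print(g.check("niceguy@coolpeople.com", me, now=0))      # first attempt
print(g.check("niceguy@coolpeople.com", me, now=1800))   # retry after 30 minutes
print(g.check("niceguy@coolpeople.com", me, now=3700))   # retry after an hour
print(g.check("niceguy@coolpeople.com", me, now=3701))   # all later mail
```

Note that the IP address never appears: the spammer and the nice guy are treated identically, and the only difference is whether the sending end bothers to retry.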
If the spammer is using a "proper" mail server rather than specially written spam software, then their mail will be queued after the initial attempt. It will get through the greylist after an hour. Hopefully the naughty server has been blacklisted by then. If not, it gets through. That's why we're still receiving spam - companies which sell spamming services and insecurely set up computers on ADSL lines.
There are only a few things that can go wrong with this:
N.B. All of these can be bypassed by the intended recipient. If the recipient sends a mail to the person they want to hear from, the whole greylisting system will be bypassed for mail back from their address *to the recipient*.
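A minimal sketch of that bypass, assuming outgoing mail is used to pre-approve the reverse pair. The function names and the in-memory set are illustrative only; how our servers actually record this is not described here.

```python
from typing import Set, Tuple

# Pairs of (sender, recipient) that skip the greylist entirely.
accepted_pairs: Set[Tuple[str, str]] = set()


def record_outbound(local_sender: str, remote_recipient: str) -> None:
    """A local user mailed someone: pre-approve that person's replies."""
    # The stored pair is reversed: replies come FROM the remote address
    # TO the local user who wrote to them.
    accepted_pairs.add((remote_recipient, local_sender))


def is_pre_approved(sender: str, recipient: str) -> bool:
    return (sender, recipient) in accepted_pairs


record_outbound("auja99t@aber.ac.uk", "niceguy@coolpeople.com")
print(is_pre_approved("niceguy@coolpeople.com", "auja99t@aber.ac.uk"))
print(is_pre_approved("niceguy@coolpeople.com", "someone.else@aber.ac.uk"))
```

The second lookup fails because the bypass only applies to mail going back to the person who wrote, not to everyone at the site.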
First off, it's not optional. Greylisting happens at a very low level within the protocol and it's not trivial to opt people out of it. It's not directly integrated with the spam scanning but is a completely separate beast. I could make it more integrated but it would be quite hard to set up and, given that, in theory, it "blocks" nothing, I'd be quite reluctant to do it.
Second, we've been running greylisting in its current form since 16th September. 3 months exactly. Each day around 25,000 externally sourced messages get past the greylist. So, during that time we're talking around 2.25 million mails getting through. During that time I've received under 10 queries about the greylist. If you're receiving queries, PLEASE pass them on to me. Without feedback I can't tell how it's performing. At the moment, I feel that it's performing ludicrously well (query rate seems to be under 0.0004%!).
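For anyone who wants to check the arithmetic, here it is spelled out. The only assumptions are that three months is counted as 90 days and that a full 10 queries is used as the upper bound.

```python
messages_per_day = 25_000
days = 90  # assumption: three months counted as 90 days
queries = 10  # upper bound; fewer than this were actually received

total = messages_per_day * days
rate_percent = queries / total * 100

print(f"{total:,} messages, query rate below {rate_percent:.5f}%")
# prints "2,250,000 messages, query rate below 0.00044%"
```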
Finally, I said that 25,000 mails get past the greylist each day. That's out of around 42,000 attempted mails. I reckon that around 60% of all externally sourced mail doesn't come back for another try. We don't receive 27,000 complaints each day so it's a reasonable assumption that it's all spam! I also reckon that the amount of spam being spotted by the spam filters proper has dropped by around 80% since we put this in. The CPU time previously spent scanning ginormous numbers of messages is now available for better scanning methods on the smallish amount of spam that gets through. We're now able to allow the system to act automatically on spam reported by our users - since the most damage an incorrect or malicious report can do is impose an hour's delay on the next mails from that sender.
The information provided on this and other pages by me, Alun Jones, is under my own personal responsibility and not that of Aberystwyth University. Similarly, any opinions expressed are my own and are in no way to be taken as those of the University.
This page is located at: http://users.aber.ac.uk/auj/spam/greydesc.shtml
It was last modified on Mon Jun 21 09:46:48 2004.