[Mimedefang] MimeDefang, SA, and Graylisting.

Tue Jul 26 20:14:56 EDT 2005

I really like the idea of graylisting, but I'm a bit worried
about 4 hour delays from machines with slow queue runners.

So here's my theory:

Everything scored by SA over a certain threshold (set per 
domain on my box, but most of them are around 6 or 7) gets 
rejected out of hand.
Everything scoring under 0 is probably not spam.

I just want to graylist the middle stuff, very gently.

I see two tables:  One would contain fields for IP, sender
addy, and recipient addy, and the last time it was tried. Call
this the "tempFailed" table.  
The other would contain an IP and a time - this would be the
"retriedSuccessfully" table.
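
A rough sketch of what those two tables might look like. The column
names and types are my own guesses, and Python's sqlite3 module stands
in for mySQL here only so the sketch runs without a server:

```python
import sqlite3

# Hypothetical schema: column names and types are assumptions, not a spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tempFailed (
    ip         TEXT    NOT NULL,
    sender     TEXT    NOT NULL,
    recipient  TEXT    NOT NULL,
    last_tried INTEGER NOT NULL,   -- Unix time of the most recent attempt
    PRIMARY KEY (ip, sender, recipient)
);
CREATE TABLE IF NOT EXISTS retriedSuccessfully (
    ip        TEXT    PRIMARY KEY,
    last_seen INTEGER NOT NULL     -- Unix time of the last accepted mail
);
"""

def open_db(path=":memory:"):
    """Open the database and create both tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keying tempFailed on the full (IP, sender, recipient) triple keeps the
lookup to a single primary-key hit.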

My theoretical filter_end would look like:

If it's a virus, reject it.
Call SA.
If it's going to be rejected, reject it.
If it's less than 0, deliver it.
Otherwise, grab the IP of the machine connecting.
Look it up in the retriedSuccessfully table.
    If it's there, accept it, and update the time.
    If it's not, look it up in the tempFailed table.
        If it's there, accept it, and add it to the retriedSuccessfully
            table.
        If not, add it to the tempFailed table, and send a 451 to the
            server.
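
The steps above could be sketched roughly like this. It is Python
rather than a real MIMEDefang filter, sqlite3 stands in for mySQL, and
decide(), make_db() and the column names are all made up for
illustration. The virus check and SA score are passed in rather than
computed, and "over the threshold" is treated as score >= threshold,
which is an assumption on my part:

```python
import sqlite3
import time

def make_db():
    """In-memory stand-in for the two tables (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tempFailed (ip TEXT, sender TEXT, recipient TEXT,
                                 last_tried INTEGER,
                                 PRIMARY KEY (ip, sender, recipient));
        CREATE TABLE retriedSuccessfully (ip TEXT PRIMARY KEY,
                                          last_seen INTEGER);
    """)
    return conn

def decide(conn, is_virus, score, reject_at, ip, sender, recipient, now=None):
    """Return 'reject', 'accept', or 'tempfail' for one message."""
    now = int(time.time()) if now is None else now
    if is_virus or score >= reject_at:
        return "reject"
    if score < 0:
        return "accept"          # probably not spam, skip the graylist
    # Known-good IP: accept and refresh its timestamp.
    cur = conn.execute(
        "UPDATE retriedSuccessfully SET last_seen = ? WHERE ip = ?", (now, ip))
    if cur.rowcount:
        conn.commit()
        return "accept"
    # Seen this exact triple before: the sender retried, so trust the IP.
    key = (ip, sender, recipient)
    hit = conn.execute(
        "SELECT 1 FROM tempFailed WHERE ip = ? AND sender = ? AND recipient = ?",
        key).fetchone()
    if hit:
        conn.execute("INSERT INTO retriedSuccessfully VALUES (?, ?)", (ip, now))
        conn.commit()
        return "accept"
    # First sighting: remember it; the caller sends the 451.
    conn.execute("INSERT INTO tempFailed VALUES (?, ?, ?, ?)", key + (now,))
    conn.commit()
    return "tempfail"
```

With a threshold of 6, a message scoring 3 from a new IP gets
"tempfail" the first time and "accept" on the retry, after which
anything in the middle range from that IP goes straight through.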

I'd also need a cron job to expire the databases.  I was thinking
anything in tempFailed over a day old could be tossed, and the
retriedSuccessfully table would hang around for a month or so 
after the last good mail.
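
The expiry job might be as small as the sketch below. The table and
column names (last_tried, last_seen) are assumptions, "a month or so"
is taken as 30 days, and sqlite3 again stands in for mySQL:

```python
import sqlite3
import time

DAY = 24 * 60 * 60
TEMPFAILED_MAX_AGE = 1 * DAY    # toss untried entries after a day
RETRIED_MAX_AGE = 30 * DAY      # "a month or so" since the last good mail

def expire(conn, now=None):
    """Delete stale rows; return (tempFailed, retriedSuccessfully) counts."""
    now = int(time.time()) if now is None else now
    failed = conn.execute(
        "DELETE FROM tempFailed WHERE last_tried < ?",
        (now - TEMPFAILED_MAX_AGE,)).rowcount
    retried = conn.execute(
        "DELETE FROM retriedSuccessfully WHERE last_seen < ?",
        (now - RETRIED_MAX_AGE,)).rowcount
    conn.commit()
    return failed, retried
```

Run from cron every hour or so, this keeps both tables small enough
that the lookups stay cheap.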

This seems like a very lightweight implementation.  I'm a little
concerned about corner cases - what about server farms, where
mx1, mx2...mx125 all send out data?  The graylist whitepaper 
mentions this, but doesn't really have a very good solution.  
Could I leave out the IP on the tempFailed table and only look
up the sender/recipient pair to get around this?  
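
For what it's worth, the sender/recipient-only lookup would be
something like the sketch below (seen_before() and the column names
are hypothetical, sqlite3 standing in for mySQL). The IP can still be
stored in tempFailed for logging; it just isn't part of the match:

```python
import sqlite3

def seen_before(conn, sender, recipient):
    """True if this sender/recipient pair was tempfailed earlier, from any IP."""
    row = conn.execute(
        "SELECT 1 FROM tempFailed WHERE sender = ? AND recipient = ? LIMIT 1",
        (sender, recipient)).fetchone()
    return row is not None
```

A retry from mx7 would then match a first attempt made by mx3. The
retriedSuccessfully table is still keyed on IP, so each machine in the
farm would earn its own entry the first time it completes a retry.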

I'm planning to use MySQL, because, well, I have it.  I don't
see all that much email (peak for me is 100 messages an hour,
average load is 30 an hour), so it shouldn't give me a huge
performance hit, since only about 1/4 of that mail would even
hit the first table lookup.  I know a reasonable amount about
designing not-horribly-inefficient queries.

I already have good monitoring in place, and I'd design it
so that it could be enabled/disabled on a per-domain basis.

How expensive is a MySQL connection?  I don't have a way to
keep them around between emails, so I'm going to have to open/close
one for every mail.  I don't stream_by_recipient, but...

What do people think of the idea in general?  If I implement 
it, I'll put it on the wiki.  

Tina Marie
-- 
http://www.tripacerdriver.com               "...One of the main causes 
of the fall of the Roman Empire was that, lacking zero, they had no way
to indicate successful termination of their C programs." (Robert Firth)