[Mimedefang] Pre-Emptive Greylist entries

Thu Jan 12 10:11:23 EST 2006

> 
> If I have the time, I'll give my suggestions regarding the use
> of SPF and RDNS a shot, and report back on the results.  My hunch
> is that they'll offer decent improvements, especially in handling
> first time senders.   Better, perhaps I'll process the message logs
> and give some feedback on how this approach might fare.
> 

I ran some tests using our mail logs as input data.  After reviewing
the results and the process that I used, I think I can get better
and more useful results by changing the analysis program, but that will
be more work and will have to wait for another day.

My program tracked 'grey_new' entries, where we record the first time
we see a (sender_from_address, recipient_address, helo_ip/24) triple.
I also extracted the helo_address (e.g. foo at example.com) from a
related log entry.  Given this triple, I can look it up in the SQL
database to see whether we ultimately accepted mail for it, or
whether the sender never returned after being tempfailed.
At the moment, we don't recycle the triples in our database,
and it is rather large (about 350,000 entries).  I've wanted
to run some programs on the data to come up with a blacklist
of particularly offensive ISPs, so have not recycled old entries.
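
As a rough illustration of the triple described above, here is a minimal
Python sketch.  It is not the code that produced these results; the
function names, the in-memory dict standing in for the SQL table, and
the IPv4-only /24 handling are all assumptions made for the example.

```python
import ipaddress


def greylist_key(sender, recipient, client_ip):
    """Build a (sender, recipient, /24 network) triple for an IPv4 client.

    Hypothetical helper: the real system's key format is not shown in the post.
    """
    # strict=False accepts a host address and yields its enclosing /24.
    net = ipaddress.ip_network(f"{client_ip}/24", strict=False)
    return (sender, recipient, str(net.network_address))


def record_first_sighting(seen, key, now):
    """Return True if the triple is new (a 'grey_new' event), else False.

    `seen` is a plain dict standing in for the SQL table (an assumption).
    """
    if key in seen:
        return False
    seen[key] = now  # remember when the triple was first tempfailed
    return True
```

Two hosts in the same /24 map to the same key, so a retry from a
neighbouring relay still counts as a return of the original triple.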

Given the sender/recipient/helo_address triples above, I noted
whether the helo_address appeared to be forged (based upon
sendmail's determination), and using Mail::SPF::Query I noted
whether this sender_address/helo_address pair registered as
a 'pass', or anything else.  With this data in hand, I looked
at messages that either (1) received a 'pass' from SPF, or
(2) were not forged and the sender's from address matched the
domain part of the helo address.
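
The two criteria above can be sketched as a single predicate.  This is
only an illustration under stated assumptions: the SPF result is taken
as a plain string computed elsewhere, the forged flag is supplied by the
caller, and the exact-or-subdomain matching rule is my guess at what
"matched the domain part" means.  All names are hypothetical.

```python
def domain_of(address):
    """Return the lowercased domain part of an address ('' if none)."""
    return address.rpartition("@")[2].lower()


def helo_matches_sender(sender, helo_name):
    """True if the helo name equals, or is a subdomain of, the sender's domain.

    The suffix rule is an assumption; the post does not spell out the test.
    """
    domain = domain_of(sender)
    if not domain:
        return False  # null sender or malformed address: no match
    helo = helo_name.lower().rstrip(".")
    return helo == domain or helo.endswith("." + domain)


def early_whitelist(spf_result, helo_forged, sender, helo_name):
    """Apply the two criteria: (1) SPF 'pass', or (2) not forged and matching."""
    if spf_result == "pass":
        return True
    return (not helo_forged) and helo_matches_sender(sender, helo_name)
```

For example, early_whitelist("none", False, "gary@example.com",
"mail.example.com") is true, while the same call with the forged flag
set is false.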

Of a total of 38215 greylist new entries (first time, tempfail),
1631 met the SPF/sender address criteria.  Of those 1631 entries,
682 were ultimately accepted for delivery, so there was
no harm in white listing them early using the heuristic.  931 would
have been 'false positives' in that we would have accepted them
early using the heuristic, when in fact they never retried after the
tempfail in the old scheme.

The heuristic would've accepted 4.2% of newly seen sender/recipient
triples, with a roughly 60/40 split of 'false positives' to messages
that would ultimately have been white listed anyway.  Note that of
the 'false positives' not all of them were necessarily spammers.
Some of them might have been legitimate senders using poorly configured
software.  In any event, this technique at worst adds only 2.5% more
entries which are delivered and which must subsequently be processed
using the access list and content filters.
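
The percentages above can be rechecked from the raw counts with a few
lines of Python.  The variable names are made up for the example; the
four counts are the ones quoted in this message.  Note that the two
sub-counts sum to 1613 rather than 1631; the message does not say
whether that is a typo or whether 18 entries were left unclassified,
so the sketch simply reports both figures.

```python
total_new = 38215       # greylist 'new' entries (first time, tempfail)
matched = 1631          # entries meeting the SPF/sender address criteria
accepted_later = 682    # matched entries that were eventually delivered
never_retried = 931     # matched entries that never came back

share_matched = matched / total_new      # fraction let through early
share_extra = never_retried / total_new  # extra deliveries, at worst
classified = accepted_later + never_retried
split_never = never_retried / classified
split_accepted = accepted_later / classified

print(f"matched:         {share_matched:.2%} of new triples")
print(f"extra delivered: {share_extra:.2%} of new triples")
print(f"classified:      {classified} of {matched} matched entries")
print(f"split:           {split_never:.0%} never retried / "
      f"{split_accepted:.0%} accepted later")
```

The shares work out to roughly 4.3% and 2.4%, in line with the rounded
figures quoted above, and the split is roughly 58/42.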

I hand inspected a few entries that the technique above accepted
for early bypass of the greylist mechanism.  The heuristic did a good
job of letting through legitimate first timers, which of course is
the point of going to all the trouble to make these extra checks.

Overall, I'd say this heuristic using SPF and simple analysis of
the sender address and helo address has promise in improving the
system's ability to let legitimate first time senders through
immediately.  The heuristic might be improved further by also
validating the helo address as a valid MX for the sender address,
or by noting that it is in the same /24 as the sender.

   - Gary