[Mimedefang] early experiences with grey listing

Thu Jan 13 11:10:42 EST 2005

> From: Paul Murphy
> Sent: Thursday, January 13, 2005 2:06 AM
[...]
>
> The default in most implementations is one minute - the point of
> greylisting is that most spam mailers have several million addresses
> to send to, so even if they bother to check the return codes, most
> take no action on them, and skip to the next address. [...]

(background: we're discussing an implementation based upon
 http://www.bl.org/~jpk/md-greylist/, which contains elements of
 http://lists.roaringpenguin.com/pipermail/mimedefang/2004-February/020126.html
 and http://whatever.frukt.org/mimedefang-filter.shtml.)

Paul, I understand that one minute is the recommended setting, and given
the risk of losing legitimate mail, there are good reasons for not
waiting an hour.  I chose the longer interval to give the various
SURBLs (see http://www.surbl.org) time to list any offending URLs
that might be contained in the message.  Thus, it amounts to a
two-pronged defense: (1) chasing off the spammers who don't retry,
and (2) increasing the reliability of the SURBL checks when/if the
spam does come in.  This seems to be working well: only two spam
messages made it into my spam folder last night, where there would
usually have been 10 to 30 (we also use IP blacklists, which
eliminate about two-thirds of the incoming spam right off the bat).
That said, it seems that about 90% of the spammers simply didn't
retry, so most of the benefit appears to come from (1) rather than
from the long delay.  So, given the downside, it may be best to
shorten the initial greylist period and see how that works.

I'm now thinking there may be a hybrid strategy, where we shorten the
initial blackout period to 1 minute, and then, if the sender does retry,
place the message in a holding tank for 1 hour.  This would at least
ensure eventual delivery, and give the SURBLs time to populate.  The
cost would be a delay in delivery of some mail.  I don't know, however,
how this might be implemented within MIMEDefang without tying up
a thread for each delayed message. Is there a method for locally
requeuing a message for redelivery 1 hour later?
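
To make the idea concrete, here is a minimal sketch of just the
decision logic, in Python rather than as a MIMEDefang filter. All of
the names in it (HybridGreylist, check, the "hold" result) are
hypothetical, and it only decides what to do with each attempt; how a
"hold" would actually be queued and released is exactly the part I
don't know how to do. It also assumes that once a triplet has passed,
later mail from it is accepted without a further hold, which is my own
simplification.

```python
import time

INITIAL_DELAY = 60      # seconds of initial blackout (the usual 1 minute)
HOLD_PERIOD = 3600      # seconds a retried message would sit in the holding tank


class HybridGreylist:
    """Hypothetical decision table keyed on (ip, sender, recipient)."""

    def __init__(self, initial_delay=INITIAL_DELAY, hold_period=HOLD_PERIOD):
        self.initial_delay = initial_delay
        self.hold_period = hold_period
        self.first_seen = {}    # triplet -> time of first attempt
        self.passed = set()     # triplets that retried after the blackout

    def check(self, ip, sender, recipient, now=None):
        """Return 'tempfail', 'hold' or 'accept' for one delivery attempt."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), recipient.lower())

        if key in self.passed:
            # Known good triplet: no further delay (assumption, see above).
            return "accept"
        if key not in self.first_seen:
            # First time we see this triplet: start the blackout clock.
            self.first_seen[key] = now
            return "tempfail"
        if now - self.first_seen[key] < self.initial_delay:
            # Retried too quickly: still inside the blackout period.
            return "tempfail"
        # The sender retried after the blackout: take the message, but the
        # caller is expected to hold it for self.hold_period before delivery.
        self.passed.add(key)
        return "hold"
```

With a one-minute blackout, a first attempt and a retry after 30
seconds would both be tempfailed, a retry after 90 seconds would be
taken and held, and later mail on the same triplet would go straight
through.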

> [...]
> Not necessarily true - I've had issues with spam from domains which
> appear to be legitimate, and in large quantities.  Even though SA
> scores them between 8 and 25, and we bounce anything over 10, their
> persistence made it a nuisance, and I ended up blocking their IP
> addresses at the firewall level.
>

Yeah, we haven't gone with blocking at the router level, but would
add such spam sources to the access.db reject list.

Net-net, it looks as if a fairly comprehensive greylist implementation
is required to make sure that it is both effective and doesn't
kill off legitimate mail. It seems risky to run with only an 80/20
type of implementation.

>
> It is already available using md_check_against_smtp_server() in your
> filter if that is supported (and note that Exchange 2000 is broken, so
> doesn't work),

Yeah, some of the other implementations try to validate the
incoming sender, using md_check_against_smtp_server(), and
that may be a good idea.

In a related area, what I'm seeing is that the current
implementation doesn't seem to handle these sorts of
callbacks well when they arrive on an incoming connection.  It
seems that mailers on the other side don't appreciate
having their probe delayed, even though they may be using it
to implement greylisting on their side. <g> However, it
is not a good idea to whitelist the null sender (From <>) outright,
because, 99% of the time, it is being used as a spammer's sending
address.

> you can also use LDAP to do real-time queries against AD or any other
> LDAP-compatible directory system.  Alternatively, some people harvest
> valid addresses from their systems into a local DB file daily, and
> check that from MD.

In general, it is difficult to do everything that sendmail might do
to validate a user.  It is configuration dependent, and some greylist
implementations I've looked at actually go to a fair amount of work
to make virtual host and virtual user substitutions, etc. Some harvest
the mail logs, note the user@domain addresses to which mail was
actually delivered, and populate a database with that info, although
I'm not sure how that info is put to use. We have delay_checks on, FYI.
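
One guess at how the harvested addresses could be used: build a set
of recipients that the logs show as delivered, and check incoming
recipients against it. The sketch below is in Python and is only an
illustration. The function names are hypothetical, and it assumes
sendmail-style log lines containing "to=<user@domain>," and
"stat=Sent"; a real log may differ, and lines listing several
recipients would need more careful parsing, since only the first is
picked up here.

```python
import re

# Assumed log shape: "... to=<user@domain>, delay=..., stat=Sent ..."
TO_RE = re.compile(r"\bto=<?([^>,\s]+)>?,")


def harvest_delivered(lines):
    """Collect lowercased recipient addresses from lines marked stat=Sent."""
    delivered = set()
    for line in lines:
        if "stat=Sent" not in line:
            continue  # skip deferred, bounced, or unrelated lines
        match = TO_RE.search(line)
        if match:
            delivered.add(match.group(1).lower())
    return delivered


def is_known_recipient(addr, delivered):
    """True if addr (with or without angle brackets) was seen as delivered."""
    return addr.strip("<>").lower() in delivered
```

A daily job could rebuild the set from the previous day's log and write
it to a local DB file, which is roughly the harvesting approach
mentioned in the quoted text.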