[Mimedefang] greylisting with stateful learning

Tue Dec 23 01:51:20 EST 2003

Long ago I rambled on about stateful blocking of email, where the mail
server would learn from and respond dynamically to remote mail servers:
for example, greylisting where the delivery delay is modified by
email header forgery,
sending a virus,
rapid relaying through multiple mail servers,
account enumeration,
a high spam score that is not high enough to reject.
(The last one requires content analysis and rejection in filter_end, so it
is not strictly greylisting.)

I have improved my previous greylisting code to include stateful greylisting.
It punishes relay IP addresses or senders that exhibit any of the behaviors
listed above. In no case is mail delayed more than a set maximum amount of
time, no matter how bad the relay address or sender-recipient pair is.
(Read my previous posts if you aren't sure what I am talking about.)

So, for example, on the initial attempt mail is delayed for 4 minutes.
If another attempt is made to deliver it, the delay time is doubled.
In no case will the delay, measured from the initial delivery attempt,
exceed 15 minutes.
Normal mail servers should attempt delivery no more than once every 5
minutes, and should back off after failed deliveries.
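A minimal sketch of the doubling schedule, assuming times are counted in minutes and that "attempts" counts every delivery attempt seen for the same relay/sender/recipient tuple, including the current one. The 4-minute start and 15-minute cap are from this post; the helper names are hypothetical.

```python
INITIAL_DELAY_MIN = 4   # delay imposed on the first attempt (from the post)
MAX_DELAY_MIN = 15      # hard cap, measured from the first attempt (from the post)


def required_delay(attempts: int) -> int:
    """Minutes that must have passed since the first attempt before we accept."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    # Double the delay for every additional attempt, but never exceed the cap.
    return min(INITIAL_DELAY_MIN * 2 ** (attempts - 1), MAX_DELAY_MIN)


def first_acceptance(retry_interval_min: int) -> int:
    """Simulate a sender retrying at a fixed interval; return when mail is accepted."""
    if retry_interval_min <= 0:
        raise ValueError("retry interval must be positive")
    first_seen = 0
    now = 0
    attempts = 0
    while True:
        attempts += 1
        if now - first_seen >= required_delay(attempts):
            return now                 # accepted
        now += retry_interval_min      # temporary failure; sender retries later
```

With retries every 5 minutes the message gets in on the fourth attempt, 15 minutes after the first. A server that retries every minute also waits exactly 15 minutes, so the cap holds no matter how the relay behaves.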

I am still determining the general heuristics I use to generate the delay time.

You can also use this method to protect against denial of service.
If a single mail server is sending you large amounts of spam or viruses,
at some point it should trigger a throttle on the relaying server.
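One way such a throttle could work is a sliding-window counter of bad (spam or virus) messages per relay IP. This is only a sketch: the one-hour window and the limit of 20 are hypothetical numbers, not values from this post, and `RelayThrottle` is an illustrative name.

```python
from collections import defaultdict, deque

WINDOW_SEC = 3600        # hypothetical: look at the last hour
MAX_BAD_PER_WINDOW = 20  # hypothetical: bad messages tolerated per relay per window


class RelayThrottle:
    """Counts bad (spam or virus) messages per relay IP in a sliding window."""

    def __init__(self, window_sec: int = WINDOW_SEC, limit: int = MAX_BAD_PER_WINDOW):
        self.window_sec = window_sec
        self.limit = limit
        self.events = defaultdict(deque)  # ip -> timestamps of bad messages

    def _expire(self, ip: str, now: float) -> None:
        # Drop events that have slid out of the window.
        q = self.events[ip]
        while q and now - q[0] >= self.window_sec:
            q.popleft()

    def record_bad(self, ip: str, now: float) -> None:
        self._expire(ip, now)
        self.events[ip].append(now)

    def throttled(self, ip: str, now: float) -> bool:
        """True if new mail from this IP should get a temporary failure."""
        self._expire(ip, now)
        return len(self.events[ip]) >= self.limit
```

Because old events expire, a throttled relay recovers on its own once it stops sending junk, which keeps the throttle from permanently blocking a server that also carries valid mail.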

I'm still thinking these items through.
We want an automatic method that will not reject valid mail.

As always, you never want to delay mail long enough that the end user
receives a temporary delivery error.

The difference with stateful greylisting is that you can extract much more
information than you can by examining a single email message in isolation.

There are a number of other items you could examine to implement stateful
filtering.

Does the remote relay's IP address resolve in DNS?
Does it have a really long hostname?
Does its hostname contain its IP address in ip-ip-ip-ip format?
Is it sending the mail at night?
Does its hostname end in .biz or another business-type domain?
(These are worth 0.5 to 1.0 points.)
How many mail messages has the remote machine sent you?
How often does it send them?
Is it sending mail with the same sender/recipient pair through multiple
relays that are not in the same domain?
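The first five checks could be scored roughly like this. Everything specific here is an assumption: the individual weights (chosen inside the 0.5 to 1.0 range above), the 40-character "long name" cut-off, the midnight-to-6am night window, and the TLD list. The reverse-DNS lookup itself is left to the caller, which passes `None` when it fails.

```python
import re
from typing import Optional

# Hypothetical weights, each within the 0.5-1.0 range given above.
WEIGHTS = {
    "no_rdns": 1.0,     # relay IP does not resolve to a name
    "long_name": 0.5,   # unusually long hostname
    "ip_in_name": 1.0,  # hostname embeds the IP, e.g. 192-0-2-10.dsl.example.net
    "night": 0.5,       # message arrives at night (local time)
    "biz_tld": 0.5,     # hostname ends in a "businessy" TLD
}
LONG_NAME_CHARS = 40        # hypothetical length threshold
NIGHT_HOURS = range(0, 6)   # hypothetical: 00:00-05:59
BUSINESSY_TLDS = (".biz",)  # hypothetical; extend as needed


def ip_in_hostname(ip: str, name: str) -> bool:
    """Does the hostname contain the IP's octets, in forward or reverse order?"""
    octets = [re.escape(o) for o in ip.split(".")]
    for seq in (octets, octets[::-1]):
        # Octets separated by '-' or '.', not embedded in longer digit runs.
        pattern = r"(?<!\d)" + r"[-.]".join(seq) + r"(?!\d)"
        if re.search(pattern, name):
            return True
    return False


def relay_score(ip: str, rdns: Optional[str], hour: int) -> float:
    """Sum of heuristic points for one relay; rdns is None if lookup failed."""
    score = 0.0
    if hour in NIGHT_HOURS:
        score += WEIGHTS["night"]
    if not rdns:
        return score + WEIGHTS["no_rdns"]
    name = rdns.lower().rstrip(".")
    if len(name) > LONG_NAME_CHARS:
        score += WEIGHTS["long_name"]
    if ip_in_hostname(ip, name):
        score += WEIGHTS["ip_in_name"]
    if name.endswith(BUSINESSY_TLDS):
        score += WEIGHTS["biz_tld"]
    return score
```

The resulting score could be fed into the delay calculation or added as SpamAssassin points.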

Does it re-attempt delivery immediately after getting a temporary
delivery error?
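A sketch of detecting early retries, keyed on the relay IP, sender, and recipient. It reuses the 5-minute figure from earlier in this post as the line between a normal retry and an "immediate" one. That choice of threshold, and the `RetryWatcher` name, are assumptions.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (relay ip, envelope sender, recipient)

# From the post: normal servers retry no more than once every 5 minutes.
MIN_NORMAL_RETRY_SEC = 300


class RetryWatcher:
    """Flags tuples that come back too soon after a temporary failure."""

    def __init__(self, min_retry_sec: int = MIN_NORMAL_RETRY_SEC):
        self.min_retry_sec = min_retry_sec
        self.last_tempfail: Dict[Key, float] = {}
        self.too_soon: Dict[Key, int] = {}  # count of early retries per tuple

    def on_tempfail(self, key: Key, now: float) -> None:
        self.last_tempfail[key] = now

    def on_attempt(self, key: Key, now: float) -> bool:
        """Return True (and count it) if this attempt is an early retry."""
        last = self.last_tempfail.get(key)
        if last is not None and now - last < self.min_retry_sec:
            self.too_soon[key] = self.too_soon.get(key, 0) + 1
            return True
        return False
```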

I will continue to think about these items, and post my code after I have
been using it on my production server for a week or so.

I have also been looking at Jona's implementation and merging the parts of
it that improve on mine into my current implementation.

I am still working out the actual rules to use in determining the delay.

Some general ideas for stateful greylisting of a machine sending viruses:

Is the machine a hypernode? (Does it deliver a lot of mail, i.e. does it
appear to be a mail hub like mx.aol.com or similar?)
No.
Has it attempted delivery of a virus message?
Yes.
Has it made multiple delivery attempts with the same sender/recipient
pair within a specific time period?
Yes.
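The three questions above amount to a simple predicate. In this sketch the cut-off for a "hypernode" (1000 messages a day) and the meaning of "multiple" (two or more attempts) are hypothetical placeholders, as are the `RelayHistory` fields.

```python
from dataclasses import dataclass

HUB_MESSAGES_PER_DAY = 1000  # hypothetical cut-off for a hub like a large ISP
REPEAT_THRESHOLD = 2         # hypothetical: "multiple" means 2 or more attempts


@dataclass
class RelayHistory:
    messages_per_day: int       # how much mail this relay normally sends us
    virus_attempts: int         # virus messages seen from this relay
    max_repeats_same_pair: int  # most attempts for one sender/recipient pair in the window


def is_virus_machine(h: RelayHistory) -> bool:
    """Not a hub, has sent a virus, and repeats the same sender/recipient pair."""
    is_hub = h.messages_per_day >= HUB_MESSAGES_PER_DAY
    return (
        not is_hub
        and h.virus_attempts > 0
        and h.max_repeats_same_pair >= REPEAT_THRESHOLD
    )
```

Excluding hubs first keeps a large provider that forwards the occasional virus from being treated like an infected desktop.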

How long to delay?
One idea: on a virus delivery, add a 2-minute delay to the initial delay,
during which you reject mail from the machine.
Where n is the number of delivery attempts, add n^2 minutes of delay for
each virus delivery:
1st delivery attempt: 2 minutes
2nd delivery attempt: 4 minutes
3rd delivery attempt: 9 minutes
This applies to the relay IP address only.
My maximum delay is currently set to 15 minutes.
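The schedule above can be computed as follows. The text gives both a 2-minute base and an n^2 rule. This sketch assumes they combine as max(2, n^2), which reproduces the listed 2 / 4 / 9 minutes. The result is added to the normal greylisting delay and capped at 15 minutes.

```python
VIRUS_BASE_DELAY_MIN = 2  # from the post
MAX_DELAY_MIN = 15        # from the post


def virus_ip_delay(n: int, initial_delay_min: int = 0) -> int:
    """Delay in minutes for a relay IP on its n-th virus delivery attempt."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Assumption: max(2, n^2) reconciles the 2-minute base with the n^2 rule.
    extra = max(VIRUS_BASE_DELAY_MIN, n * n)
    # The virus penalty is added to the normal greylisting delay, then capped.
    return min(initial_delay_min + extra, MAX_DELAY_MIN)
```

From the fourth attempt on, the cap takes over, so the per-IP penalty never exceeds the same 15-minute maximum used for ordinary greylisting.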

So on a match on the same sender/IP pair, reject mail from that sender at
that IP, but keep accepting other mail from that IP address, so we do not
reject valid mail.

On virus deliveries we also match on sender/IP/recipient triples.
On each virus delivery, block the triple for:
1st: 15 minutes
2nd: 30 minutes
3rd: 45 minutes

This matches on the IP/sender/recipient triple.
The virus IP/sender/recipient block should expire completely within 12-24
hours, since it exists only to protect against denial of service.
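A sketch of the triple block: the n-th virus delivery within the retention period blocks the triple for 15 * n minutes, and all state for a triple is forgotten after 24 hours. That retention value is an assumption taken from the top of the 12-24 hour range. Times are in minutes, and `VirusTripleBlock` is an illustrative name.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (relay ip, sender, recipient)

BLOCK_STEP_MIN = 15         # 15, 30, 45 ... minutes (from the post)
STATE_EXPIRY_MIN = 24 * 60  # assumption: upper end of the 12-24 hour range


class VirusTripleBlock:
    def __init__(self) -> None:
        self.hits: Dict[Triple, List[float]] = {}  # virus delivery times

    def _prune(self, key: Triple, now: float) -> List[float]:
        # Forget virus deliveries older than the retention period.
        times = [t for t in self.hits.get(key, []) if now - t < STATE_EXPIRY_MIN]
        if times:
            self.hits[key] = times
        else:
            self.hits.pop(key, None)
        return times

    def record_virus(self, key: Triple, now: float) -> None:
        times = self._prune(key, now)
        times.append(now)
        self.hits[key] = times

    def blocked_until(self, key: Triple, now: float) -> float:
        """Time at which the block lifts; a value <= now means not blocked."""
        times = self._prune(key, now)
        if not times:
            return now
        return times[-1] + BLOCK_STEP_MIN * len(times)

    def is_blocked(self, key: Triple, now: float) -> bool:
        return self.blocked_until(key, now) > now
```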

I determined this by examining the behavior of remote relays: 90% of them
only attempt multiple deliveries over a 1-hour period.
In most cases they only attempt deliveries over a 15-minute period.
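Figures like these can be computed from the logs once delivery attempts have been parsed into (ip, sender, recipient, time) records. The parsing step is not shown, and the record layout is an assumption. Only tuples that were actually retried are counted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Attempt = Tuple[str, str, str, float]  # (ip, sender, recipient, time in minutes)


def retry_spans(attempts: Iterable[Attempt]) -> List[float]:
    """Minutes between first and last attempt for each retried tuple."""
    seen: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for ip, sender, rcpt, t in attempts:
        seen[(ip, sender, rcpt)].append(t)
    return [max(ts) - min(ts) for ts in seen.values() if len(ts) > 1]


def fraction_within(spans: List[float], limit_min: float) -> float:
    """Share of retried tuples whose retries all fall within limit_min."""
    if not spans:
        return 0.0
    return sum(1 for s in spans if s <= limit_min) / len(spans)
```

`fraction_within(spans, 60)` gives the "within an hour" share, and `fraction_within(spans, 15)` the "within 15 minutes" share.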

I need to determine some statistical behavior patterns for spammer/virus
deliveries to find the throttling threshold that catches the most junk
with the least collateral damage.
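One possible starting point, offered only as an assumption: measure per-relay hourly message counts for known-good mail, take a high percentile, and add headroom. The nearest-rank percentile is standard. The 99th-percentile default and the 1.5x margin are hypothetical tuning knobs, not values from this post.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def throttle_threshold(normal_hourly_counts: List[float],
                       pct: float = 99.0, margin: float = 1.5) -> int:
    """Hypothetical rule: allow some headroom above what normal relays do."""
    return math.ceil(percentile(normal_hourly_counts, pct) * margin)
```

A higher percentile or larger margin means less collateral damage but fewer throttled spam sources, so both would need checking against real logs.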

As always, the more statistical information you can extract from normal
mail versus spam, the more points you can add in SpamAssassin (SA), or the
more you can penalize spammy mail with greylisting.
Many of my rejection periods are arbitrary and could be improved by
analyzing normal mail load from the mail logs.
--Luke