[Mimedefang] DNS Lookups in MD - Was RBL and DNS lookups

Tue May 15 09:45:21 EDT 2007

On 14 May 2007 at 15:50, Kevin A. McGrail wrote:

> >> The first is to check for valid MX records on the sender.  I use this to
> >> reject email and it works VERY well.
> >
> > What's your hit rate for this?  In particular, what's your hit rate for
> > messages that *only* this catches?
> 
> On my main server, I blocked 1904 emails out of 16584 using Invalid MX Checks

OK, that doesn't answer my question.

How many of these would also have been blocked by something else
pre-SA?

> My policy on this is as follows:
> 
> This test is based on AOL's reverse pointer rules.  AOL uses this test to 
> outright block email.  We use it ONLY to score email.

If everybody else was jumping over a cliff, etc.  Just because AOL is 
doing something really dumb doesn't mean everybody else has to be.

> > Again, what are your stats on what is stopped *solely* by this check.
> > In other words, how many extra bad e-mails (as a percentage) would you
> > deliver if you did not have this check?
> 
> None. I use this rule only to score emails not to block them.  However, on 
> one server, I marked 1636 emails as missing a ptr and 3934 as suspect out of 
> 14960.  So the check affected over 37% of our traffic.  Even if you can use 
> it only to add/subtract even 1/2 of a point in the SA scoring algorithm, I 
> believe it is worth it.

Is it worth it for 0.01 point in SA?  What about 0.1?  In other words, 
how many messages have you rejected because of SA scoring that hit this
test *and* have a score between "reject" and "reject + 
score_for_missing_pointer"?  I run this analysis for every expensive 
test, and so far none filter more than 1% of bad e-mail that would not 
otherwise have already been filtered.
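
A minimal Python sketch of that kind of analysis, assuming you can 
export each message's final SA score along with whether the test fired. 
The function name, the sample data, and the 0.5-point rule score are 
made up for illustration:

```python
def tipped_by_rule(messages, threshold, rule_score):
    """Count messages that reached `threshold` only because the rule fired.

    `messages` is an iterable of (final_score, rule_hit) pairs, where
    final_score already includes the rule's contribution.
    """
    return sum(
        1
        for final_score, rule_hit in messages
        if rule_hit and threshold <= final_score < threshold + rule_score
    )


# Hypothetical sample: (final SA score, did the rule fire?)
sample = [(10.3, True), (10.3, False), (14.0, True), (9.8, True), (10.0, True)]

# With reject at 10 and a 0.5-point rule, only messages scoring in
# [10, 10.5) that hit the rule would have been delivered without it.
print(tipped_by_rule(sample, threshold=10.0, rule_score=0.5))
```

Dividing that count by the total number of bad messages gives the 
percentage that only this test catches.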

By my studies, unless you start scoring some of these things at 3-5 
points in SA, the number they push "over the edge" is so tiny that the 
cycles spent and the work of maintaining the code cost far more than 
the check is worth.

Here's my SA scoring breakdown:

Count   Score
 4616   < 0
 1290   0-2.9999
   71   3-3.9999
   38   4-4.9999
   39   5-5.9999
   28   6-7.9999
   19   8-8.9999
   13   9-9.9999
   16   10-10.9999
   14   11-11.9999
   43   12-14.9999
  214   15-24.9999
  169   25+

I mark as spam at 5 and reject at 10.  You'll notice that very few fall 
into the ranges where a small scoring rule will tip them one way or the 
other for either of these.  By my count, out of 6570 e-mails that I 
scored, only 38+39+13+16=106 are within one point of either threshold.  
That's 1.6% of what gets past the various other filters (HELO syntax, 
greylist, etc.).  Those other filters stopped 11,738 messages without a 
single extra DNS lookup, so changing state on a mere 0.6% of all 
messages (106 out of 18,308) isn't worth the time.
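
The arithmetic can be rechecked with a short Python sketch.  The counts 
are copied from the breakdown above; the variable names are 
illustrative, and "borderline" here is assumed to mean the four ranges 
within one point of the spam (5) and reject (10) thresholds:

```python
# Score-range counts copied from the breakdown above.
counts = {
    "< 0": 4616, "0-2.9999": 1290, "3-3.9999": 71, "4-4.9999": 38,
    "5-5.9999": 39, "6-7.9999": 28, "8-8.9999": 19, "9-9.9999": 13,
    "10-10.9999": 16, "11-11.9999": 14, "12-14.9999": 43,
    "15-24.9999": 214, "25+": 169,
}

# Ranges within one point of the spam (5) and reject (10) thresholds.
borderline_ranges = ("4-4.9999", "5-5.9999", "9-9.9999", "10-10.9999")

scored = sum(counts.values())                       # messages that reached SA
borderline = sum(counts[r] for r in borderline_ranges)
prefiltered = 11738                                 # stopped before SA
total = scored + prefiltered                        # everything received

print(f"scored={scored} borderline={borderline} total={total}")
print(f"share of scored mail: {borderline / scored:.1%}")
print(f"share of all mail:    {borderline / total:.1%}")
```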

> > For me, none of the reverse DNS checks stop enough extra to be worth
> > wasting DNS bandwidth (even with a caching DNS server).
> 
> I don't look at individual rules.  I look at the overall ecosystem.

That's one of the big fallacies of a lot of anti-spam thinking, at 
least as far as "expensive" tests go.  For something cheap (like small 
SA rules when you are already running SA), it's not a big deal to have 
a few things that aren't particularly effective, since it probably only 
costs you a few milliseconds per e-mail.

But, for expensive tests (and reverse DNS is very expensive in this 
case, since you tend to have to do uncached lookups for every new 
zombie machine), unless they are *very* accurate (i.e., no false 
positives or negatives) and *very* indicative (i.e., can be assigned a 
high SA score or used to reject outright), they tend to be something 
that just won't scale well to large volumes of e-mail.

--
Jeff Rife |  
          | http://www.nabs.net/Cartoons/OverTheHedge/HighTech.gif