[Mimedefang] Fw: [Sare-users] Spam with numbers in subj and body

Jan-Pieter Cornet johnpc at xs4all.nl
Wed Jun 7 09:57:30 EDT 2006


On Tue, Jun 06, 2006 at 11:04:00AM -0400, Kevin A. McGrail wrote:
> Hey now, that's a little harsh!  I'm proud to say I do write rules for a 
> "living" and hope to contribute more. 
> http://svn.apache.org/viewvc/spamassassin/trunk/CREDITS?root=Apache-SVN&view=markup

Oops. Well then let me help the community by giving you some
constructive criticism of your posted rules, thus allowing you to learn
from the experience, so to speak :)

> Don't make me get my Jan-Pieter voodoo doll out!
> 
> Anyway, the current rule I believe is actually worth using is this and more 
> revisions might be made at 
> http://www.peregrinehw.com/downloads/SpamAssassin/contrib/KAM.cf
> 
> #KAM NUMBER EMAILS
> header          __KAM_NUMBER1   Subject =~ /^\d+$/i

There are no case-sensitive digits, not even in UTF-8. So for none of
the 270 possible characters that you are matching with that \d does
the /i add anything. But that's not really important. At least you
fixed the old /\d*/ match, which matched on _every_ string
(because it matches the null string).
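
A minimal sketch of the point above, using Python's re module as a
stand-in for Perl (for these patterns the two behave the same, as far
as I know). The sample subjects are made up for illustration.

```python
import re

# Hypothetical Subject values, invented for this example.
subjects = ["12345", "Meeting at noon", ""]

# Old pattern: \d* may match zero digits, so it succeeds on every subject.
old_hits = [bool(re.search(r"\d*", s)) for s in subjects]

# Fixed pattern: anchored at both ends, at least one digit, nothing else.
new_hits = [bool(re.search(r"^\d+$", s)) for s in subjects]

# Adding the case-insensitive flag changes nothing: digits have no case.
with_i = [bool(re.search(r"^\d+$", s, re.IGNORECASE)) for s in subjects]

print(old_hits, new_hits, with_i)
```

Only the first subject survives the anchored pattern, with or without
the case-insensitive flag.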

There's a regex rule of thumb: at the unanchored end of a pattern, a
wildcard (that is, a quantifier such as * or +) with a minimum of 0
matches is _never_ useful, and the quantified item can be left out
completely; a wildcard with a minimum of 1 match isn't useful either:
just leave out the wildcard itself (unless you are somehow using the
complete matched string, but in Perl you'd normally enclose the
thing in capturing parentheses in that case).
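
The rule can be checked with a quick sketch. Again Python's re stands
in for Perl, and the sample strings and the hits() helper are
hypothetical, made up for this example.

```python
import re

# Invented sample strings; any set would do.
samples = ["abc", "abc123", "xabc9y", "ab", "123", ""]

def hits(pattern):
    """Return, per sample, whether the unanchored pattern matches anywhere."""
    return [bool(re.search(pattern, s)) for s in samples]

# Minimum of 0: the trailing \d* can always match nothing, so drop it.
zero_min_same = hits(r"abc\d*") == hits(r"abc")

# Minimum of 1: the first digit alone decides the match, so drop the "+".
one_min_same = hits(r"abc\d+") == hits(r"abc\d")

# The quantifier only matters once the matched text itself is used.
captured = re.search(r"abc(\d+)", "abc123").group(1)

print(zero_min_same, one_min_same, captured)
```

Both comparisons come out equal; only the capture shows a difference,
because there the greedy match decides what ends up in the group.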

> body            __KAM_NUMBER2   /\d{1,6}/

Same here: a wildcard at the end isn't useful, your match isn't
anchored, and this matches everywhere /\d/ does. Maybe /^\d{1,6}$/m
could be useful. I suppose even better would be something like:

rawbody 	__RULENAME 	/^<html><body>\s+^\d+$/m

And if spamassassin didn't suck and split up the email the way it did,
you could even put about the entire body in the regex, for even better
accuracy, but unfortunately that won't work.
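
To illustrate the anchoring point, here is a rough sketch with Python's
re in place of Perl. The two message bodies are invented, and the last
pattern assumes the whole body arrives as one string, which (as said
above) is not how spamassassin hands it to a rule.

```python
import re

# Hypothetical message bodies, made up for this example.
ham = "Your order 1234567 ships in 3 days.\nThanks!"
spam = "<html><body>\n48213\n</body></html>"
bodies = (ham, spam)

unanchored = re.compile(r"\d{1,6}")
plain_digit = re.compile(r"\d")
anchored = re.compile(r"^\d{1,6}$", re.MULTILINE)
html_then_number = re.compile(r"^<html><body>\s+^\d+$", re.MULTILINE)

# Unanchored: hits any text containing a digit, exactly like /\d/.
unanchored_hits = [bool(unanchored.search(b)) for b in bodies]
plain_hits = [bool(plain_digit.search(b)) for b in bodies]

# Anchored with /m: only a line made up solely of 1 to 6 digits counts.
anchored_hits = [bool(anchored.search(b)) for b in bodies]

# The stricter pattern, assuming the body is available as a single string.
strict_hits = [bool(html_then_number.search(b)) for b in bodies]

print(unanchored_hits, anchored_hits, strict_hits)
```

The unanchored pattern fires on the ordinary message as well, while the
anchored ones only fire on the number-only body.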

> meta            KAM_NUMBER      ((__KAM_NUMBER1 + __KAM_NUMBER2 + 
> MIME_HTML_ONLY + HTML_SHORT_LENGTH) >= 4)

Why don't you use __KAM_NUMBER1 && __KAM_NUMBER2 && MIME_HTML_ONLY
    && HTML_SHORT_LENGTH

? That seems to better explain what you mean here. Your version
does the same, but it is trickier to maintain. (If you add or
remove a rule you have to update the count. Also, the outer
brackets aren't necessary.)
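
A small sketch of why the two forms agree, written in Python rather
than spamassassin's meta syntax. It assumes every sub-rule contributes
either 0 or 1; the function names are made up for this example.

```python
from itertools import product

def sum_form(n1, n2, html_only, short_len):
    # Mirrors (A + B + C + D) >= 4; the 4 must track the number of terms.
    return (n1 + n2 + html_only + short_len) >= 4

def and_form(n1, n2, html_only, short_len):
    # Mirrors A && B && C && D; there is no count to keep in sync.
    return bool(n1 and n2 and html_only and short_len)

# Compare both forms over all 16 combinations of 0/1 inputs.
combos = list(product((0, 1), repeat=4))
equivalent = all(sum_form(*c) == and_form(*c) for c in combos)

print(equivalent)
```

With the sum form, adding a fifth sub-rule while leaving the threshold
at 4 would quietly turn "all of them" into "any four of them".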

Hope this helps!

-- 
Jan-Pieter Cornet <johnpc at xs4all.nl>
!! Disc lamer: The addressee of this email is not the intended recipient. !!
!! This is only a test of the echelon and data retention systems. Please  !!
!! archive this message indefinitely to allow verification of the logs.   !!


