[Mimedefang] Fw: [Sare-users] Spam with numbers in subj and body

Jan-Pieter Cornet johnpc at xs4all.nl
Thu Jun 8 03:51:17 EDT 2006


On Wed, Jun 07, 2006 at 04:26:25PM -0700, Matthew.van.Eerde at hbinc.com wrote:
> or more generically:
> body _BODY_IS_JUST_DIGITS_AND_WHITESPACE /^[\s\d]+$/
> 
> or allowing newlines too:
> body _BODY_IS_JUST_DIGITS_AND_WHITESPACE /^[\s\d\r\n]+$/

Erm... since I'm playing regex police anyway: nope. First,
\s includes \r and \n, so the first rule already included newlines.
If you want to allow only spaces and tabs, you'd have to specify
[ \t] or \p{IsSpace} or [[:blank:]], except that the latter two
match 24 and 16 other characters, respectively, including Unicode
characters like U+180E, the Mongolian vowel separator, and
U+202F, the narrow no-break space.
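
A quick way to check the first point, sketched in Python rather than
Perl (an assumption of convenience: Python's re module treats \s the
same way for the ASCII characters shown here). The sample string and
the variable names are made up for illustration:

```python
import re

# The first quoted pattern: \s already covers \r and \n.
first_rule = re.compile(r'^[\s\d]+$')

# Same idea, but with whitespace restricted to space and tab.
blank_only = re.compile(r'^[ \t\d]+$')

# Hypothetical sample: two lines of digits with CRLF and LF line ends.
sample = "123\r\n456\n"

# The first rule spans the line breaks without any explicit \r\n.
includes_newlines = first_rule.search(sample) is not None

# The space-and-tab version stops at the first line break and fails.
blank_rejects_newlines = blank_only.search(sample) is None

print(includes_newlines, blank_rejects_newlines)
```

Both values come out true, which is why adding \r\n to the character
class changes nothing.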

Second... you effectively built "BODY_IS_JUST_DIGITS_OR_WHITESPACE"
                                                     ~~
Since [\s\d] will match either whitespace or digit, your rule will
also match lines containing only whitespace, and since whitespace
includes newline, it will match a "line" consisting only of "\n",
which appears in every email...

If you want to match a line containing at least one digit and
optionally whitespace, use:

    /^ \s* \d [\s\d]* $/x
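
To see the difference between the two patterns, here is a small
comparison, again sketched in Python (re.VERBOSE plays the role of
Perl's /x flag). The sample strings are invented for the example:

```python
import re

# "Digits OR whitespace": also matches whitespace-only input.
either = re.compile(r'^[\s\d]+$')

# Requires at least one digit; whitespace is optional around it.
digits_required = re.compile(r'^ \s* \d [\s\d]* $', re.VERBOSE)

# Hypothetical inputs: a bare newline, blanks, digit lines, and a non-match.
samples = ["\n", "   ", "12 34", " 42 ", "12a"]

# Map each sample to (matched by either, matched by digits_required).
results = {
    s: (either.search(s) is not None, digits_required.search(s) is not None)
    for s in samples
}

for s, (loose, strict) in results.items():
    print(repr(s), loose, strict)
```

The bare "\n" and the blank-only string match the first pattern but
not the second; the digit strings match both.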

And last but not least: matching newlines in body rules is useless,
since SpamAssassin removes line breaks before matching body rules.
It might leave the last newline of each paragraph it matches in
place; I haven't tested that. But the trailing "$" in a regex
matches at the end of the string, or just before a final \n
at the end of the string, so there's no need to consider newlines
for body rules at all.
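
The behaviour of "$" can be checked in isolation. This is a sketch in
Python, whose "$" (without the multiline flag) behaves like Perl's in
this respect; it says nothing about what SpamAssassin actually passes
to body rules, and the test strings are made up:

```python
import re

# Digits only, with no whitespace allowed in the pattern itself.
digits_only = re.compile(r'^\d+$')

# "$" matches at the very end of the string...
no_newline = digits_only.search("12345") is not None

# ...and also just before a single trailing newline.
one_newline = digits_only.search("12345\n") is not None

# It does not skip over more than one trailing newline.
two_newlines = digits_only.search("12345\n\n") is not None

print(no_newline, one_newline, two_newlines)
```

So a single trailing newline never needs to be spelled out in the
pattern.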

-- 
Jan-Pieter Cornet <johnpc at xs4all.nl>
!! Disc lamer: The addressee of this email is not the intended recipient. !!
!! This is only a test of the echelon and data retention systems. Please  !!
!! archive this message indefinitely to allow verification of the logs.   !!


