[Mimedefang] SpamAssassin regexp question

Jan Pieter Cornet johnpc at xs4all.nl
Thu Aug 7 11:03:04 EDT 2008


Sorry for the slow reply... but I had to react (omg! somebody
on the intarweb is r0ng!)

On Wed, Jul 30, 2008 at 11:59:57AM -0400, Joseph Brennan wrote:
>> # This mail was sent by...
>> rawbody UAH_SENTBY1 /^This email was sent by:$/
>> score UAH_SENTBY1 1.0
>> rawbody UAH_SENTBY2 /^Unsubscribe: http:.+\/accounts$/
>> score UAH_SENTBY2 1.0
>> 
>> When I ran command line spamassassin against a copy of the message, the
>> tests did not hit.  When I changed it to this:
>> 
>> # This mail was sent by...
>> rawbody UAH_SENTBY1 /\nThis email was sent by:\n/
>> score UAH_SENTBY1 1.0
>> rawbody UAH_SENTBY2 /\nUnsubscribe: http:.+\/accounts\n/
>> score UAH_SENTBY2 1.0
>> 
>> they did.  What's the difference?  Thanks...
> 
> The symbols ^ and $ do not always mean start and end of line.  They
> mean start and end of the chunk perl is working with, which can be
> changed by redefining $/.  SpamAssassin seems to read the whole
> message in one chunk, so that it can match across lines.

This is... an oversimplification.

The ^ and $ symbols match at the beginning and end (respectively)
of the string. When you are reading from a file handle, strings are
delimited based on the value of $/ ($INPUT_RECORD_SEPARATOR).
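
To make that concrete, here is a small sketch in Python rather than Perl.
For this purpose Python's re module follows the same anchor rules: without
a multi-line flag, ^ matches only at the start of the string, and $ only
at the end (or just before a final newline). The message text is made up
for illustration. Reading the whole message as one string is comparable
to Perl with $/ undefined, and reading it line by line is comparable to
the default $/ of "\n".

```python
import io
import re

# Hypothetical message text, for illustration only.
message = "Hello,\nThis email was sent by:\nExample Corp\n"
pattern = re.compile(r"^This email was sent by:$")

# Whole message as one string: ^ and $ anchor only at the very start and
# end of the string, so the line in the middle is not found.
whole_hit = pattern.search(message) is not None

# One string per line: each line is its own string, so the anchors line up.
# ($ still matches just before the trailing newline of each line.)
line_hits = [line for line in io.StringIO(message) if pattern.search(line)]

print(whole_hit)
print(line_hits)
```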

So far, you are correct. However, SpamAssassin presents the message to
"body" rules _per_paragraph_ (see Mail::SpamAssassin::Conf and search
for "body SYMBOLIC_TEST_NAME"). Line breaks and HTML tags are removed.
"rawbody" tests are presented per line.

That means, in the case of "body" rules, "^" means beginning of PARAGRAPH
and "$" means end of paragraph.
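
A rough model of that, again in Python. The to_paragraphs helper is
hypothetical and only imitates the behaviour described above (split on
blank lines, fold each paragraph onto one line). It is not SpamAssassin's
actual code, and the body text is invented.

```python
import re

def to_paragraphs(text):
    """Hypothetical stand-in: split on blank lines, fold each paragraph
    onto a single line by collapsing all whitespace runs to one space."""
    chunks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(chunk.split()) for chunk in chunks]

# Invented body text with two paragraphs.
body = "Dear customer,\nplease read on.\n\nThis email was sent by:\nExample Corp\n"
paragraphs = to_paragraphs(body)

# ^ anchors at the start of a paragraph, so this finds the second paragraph.
starts = [p for p in paragraphs if re.search(r"^This email was sent by:", p)]

# $ anchors at the end of a paragraph. "sent by:" ended a line in the
# original text, but not the paragraph, so nothing is found.
ends = [p for p in paragraphs if re.search(r"sent by:$", p)]

print(starts)
print(ends)
```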

You can in theory add the //m switch to the regex, to make the ^
and $ symbols _ALSO_ match right after and before an end-of-line
character. However, this is useless for the "body" and "rawbody"
rules, since there will never be line breaks in the middle of the
thing you're looking at. The only rules where it could be useful
are "full" rules.
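
A sketch of what the //m switch changes, using Python's re.MULTILINE,
which is the counterpart of Perl's /m. The sample text is invented. The
multi-line string plays the role of what a "full" rule would see, and
the single line plays the role of a chunk with no embedded line breaks.

```python
import re

# Invented multi-line text, standing in for what a "full" rule would see.
full_text = "Hello,\nThis email was sent by:\nExample Corp\n"
# A chunk with no embedded line breaks.
single_line = "This email was sent by:"

plain = re.compile(r"^This email was sent by:$")
multiline = re.compile(r"^This email was sent by:$", re.MULTILINE)

# On multi-line text only the multi-line variant finds the inner line.
plain_on_full = plain.search(full_text) is not None
multi_on_full = multiline.search(full_text) is not None

# Without embedded line breaks the flag makes no difference.
plain_on_line = plain.search(single_line) is not None
multi_on_line = multiline.search(single_line) is not None

print(plain_on_full, multi_on_full, plain_on_line, multi_on_line)
```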

-- 
Jan-Pieter Cornet <johnpc at xs4all.nl>
!! Disclamer: The addressee of this email is not the intended recipient. !!
!! This is only a test of the echelon and data retention systems. Please !!
!! archive this message indefinitely to allow verification of the logs.  !!


