[Mimedefang] Using a db for subject lines to block
Cormack, Ken
Ken.Cormack at roadway.com
Mon Jun 20 17:05:52 EDT 2005
Thanks, David - again, good info.
Ken
-----Original Message-----
From: David F. Skoll [mailto:dfs at roaringpenguin.com]
Sent: Monday, June 20, 2005 4:22 PM
To: mimedefang at lists.roaringpenguin.com
Subject: Re: [Mimedefang] Using a db for subject lines to block
Cormack, Ken wrote:
> Can anyone see any problems with the code below? Just logging, it appears
> to be working pretty well.
You may want to make your subject canonicalization a little smarter,
like:
$lc_subject =~ s/^\s+//; # Trim leading whitespace
$lc_subject =~ s/\s+$//; # Trim trailing whitespace
$lc_subject =~ s/\s+/./g; # Collapse whitespace into periods
The third regexp collapses each run of whitespace into a single period, so:
really    cheap     mortgages
gets collapsed into
really.cheap.mortgages
You might (or might not?) want to delete other non-letter characters.
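For illustration, the canonicalization steps above could be wrapped up roughly like this. This is only a sketch: the helper name canonicalize_subject and its optional $strip flag are made up for the example (they are not part of MIMEDefang or of the original filter), and it assumes the subject is lowercased first, as the variable name $lc_subject suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original filter): canonicalize a subject
# so that differences in case and spacing map to the same lookup key.
sub canonicalize_subject {
    my ($subject, $strip) = @_;
    my $lc_subject = lc $subject;   # assumption: keys are stored lowercased
    $lc_subject =~ s/^\s+//;        # Trim leading whitespace
    $lc_subject =~ s/\s+$//;        # Trim trailing whitespace
    $lc_subject =~ s/\s+/./g;       # Collapse each whitespace run into one period
    # Optional step (an assumption, may be too aggressive for some setups):
    # drop everything that is not a lowercase ASCII letter or a period.
    $lc_subject =~ s/[^a-z.]//g if $strip;
    return $lc_subject;
}

print canonicalize_subject("  Really   cheap    MORTGAGES  "), "\n";
```

That prints really.cheap.mortgages. Whether to pass the strip flag depends on how the entries in the database were canonicalized in the first place; both sides need to use the same rules.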
> # scan database for each word in the subject
I wonder if you want to remember repeated words? Otherwise something
like "a a a a a a a a a a a a a a a" can make you do an awful lot of
DB lookups. Probably not a big deal in practice.
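One way to remember repeated words is a %seen hash, so each distinct word is looked up at most once. A minimal sketch, assuming the subject has already been canonicalized with periods between words; unique_words is a hypothetical helper name, and the actual database lookup is left out:

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct words of a canonicalized subject,
# in first-seen order, so each word costs at most one DB lookup.
sub unique_words {
    my ($lc_subject) = @_;
    my %seen;
    # Skip empty fields, and keep a word only the first time it appears.
    return grep { length($_) && !$seen{$_}++ } split /\./, $lc_subject;
}

my @words = unique_words("a.a.a.buy.a.cheap.buy");
print scalar(@words), " lookups: @words\n";
```

This prints "3 lookups: a buy cheap" instead of doing seven lookups. The loop that scans the database would then iterate over @words rather than over the raw split of the subject.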
Regards,
David.
_______________________________________________
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list
MIMEDefang at lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang