[Mimedefang] Using a db for subject lines to block
David F. Skoll
dfs at roaringpenguin.com
Mon Jun 20 16:22:15 EDT 2005
Cormack, Ken wrote:
> Can anyone see any problems with the code below? Just logging, it appears
> to be working pretty well.
You may want to make your subject canonicalization a little smarter,
like:
$lc_subject =~ s/^\s+//; # Trim leading whitespace
$lc_subject =~ s/\s+$//; # Trim trailing whitespace
$lc_subject =~ s/\s+/./g; # Collapse each run of whitespace into one period
The third regexp collapses each run of whitespace into a single period, so:
really   cheap     mortgages
gets collapsed into
really.cheap.mortgages
You might (or might not?) want to delete other non-letter characters.
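Putting the steps above together, a minimal sketch might look like the following. The helper name canonicalize_subject and the $letters_only flag are hypothetical (they are not from the original filter), and the code assumes plain ASCII subjects that have already been decoded:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original filter.
# Returns a lowercased subject with whitespace runs collapsed into periods.
sub canonicalize_subject {
    my ($subject, $letters_only) = @_;
    my $lc_subject = lc $subject;

    # Optional step (assumption): drop everything except a-z and whitespace.
    $lc_subject =~ s/[^a-z\s]//g if $letters_only;

    $lc_subject =~ s/^\s+//;    # Trim leading whitespace
    $lc_subject =~ s/\s+$//;    # Trim trailing whitespace
    $lc_subject =~ s/\s+/./g;   # Collapse each run of whitespace into one period
    return $lc_subject;
}
```

Stripping non-letters before trimming means that punctuation at either end cannot leave stray whitespace behind.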
> # scan database for each word in the subject
I wonder whether you want to remember words you have already looked up?
Otherwise a subject like "a a a a a a a a a a a a a a a" can make you do
an awful lot of redundant DB lookups. Probably not a big deal in practice.
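One way to avoid the repeated lookups is to track words in a hash and query each distinct word only once. This is only a sketch: unique_words is a hypothetical helper, and it assumes the subject has already been canonicalized so that words are separated by periods. The actual DB lookup is left out because it depends on your filter.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original filter.
# Returns the distinct words of a period-separated subject, in first-seen order.
sub unique_words {
    my ($lc_subject) = @_;
    my %seen;
    # Skip empty fields and any word that has already been seen.
    return grep { length($_) && !$seen{$_}++ } split /\./, $lc_subject;
}

# Fifteen copies of "a" plus one other word yield only two words to look up.
my @words = unique_words(join('.', ('a') x 15, 'mortgages'));
```

Each element of @words would then be passed to the database lookup exactly once.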
Regards,
David.