[Mimedefang] Using a db for subject lines to block

David F. Skoll dfs at roaringpenguin.com
Mon Jun 20 16:22:15 EDT 2005


Cormack, Ken wrote:

> Can anyone see any problems with the code below?  Just logging, it appears
> to be working pretty well.

You may want to make your subject canonicalization a little smarter,
like:

	$lc_subject =~ s/^\s+//;  # Trim leading whitespace
	$lc_subject =~ s/\s+$//;  # Trim trailing whitespace
	$lc_subject =~ s/\s+/./g; # Collapse each whitespace run into one period

The third regexp collapses each run of whitespace into a single period, so:

          really               cheap         mortgages

gets collapsed into

	  really.cheap.mortgages

You might (or might not?) want to delete other non-letter characters.
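
Putting those steps together, a minimal sketch might look like the
following.  The function name canonicalize_subject and the optional
$strip_nonletters flag are made up for illustration and are not part of
the original filter; the non-letter stripping assumes ASCII-only subjects.

```perl
use strict;
use warnings;

# Hypothetical helper (name and second argument are assumptions).
# Returns a lowercased subject with whitespace runs turned into periods.
sub canonicalize_subject {
    my ($subject, $strip_nonletters) = @_;
    my $lc_subject = lc($subject);

    # Optional: drop everything that is not an ASCII letter or whitespace.
    # Done first, so that whitespace exposed by the deletion still gets trimmed.
    $lc_subject =~ s/[^a-z\s]+//g if $strip_nonletters;

    $lc_subject =~ s/^\s+//;   # Trim leading whitespace
    $lc_subject =~ s/\s+$//;   # Trim trailing whitespace
    $lc_subject =~ s/\s+/./g;  # Collapse each whitespace run into one period
    return $lc_subject;
}
```

Note the =~ binding operator: with a plain = the substitution runs
against $_ and its return value (the number of substitutions, or an
empty string when nothing matched) is assigned to $lc_subject.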

>             # scan database for each word in the subject

I wonder if you want to remember words you have already looked up?
Otherwise a subject like "a a a a a a a a a a a a a a a" can make you do
an awful lot of redundant DB lookups.  Probably not a big deal in practice.
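
If it does turn out to matter, one way to skip the repeats is a %seen
hash, roughly as below.  This is only a sketch: unique_words is a
hypothetical name, and it assumes the subject has already been put into
the period-separated canonical form described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): returns each distinct
# word of a period-separated subject once, in first-seen order.
sub unique_words {
    my ($canonical_subject) = @_;
    my %seen;
    # Skip empty fields (e.g. from a leading period) and words already seen.
    return grep { length($_) && !$seen{$_}++ } split /\./, $canonical_subject;
}
```

The per-word database scan would then loop over
unique_words($lc_subject) rather than over every word, so the
fifteen-"a" subject costs one lookup instead of fifteen.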

Regards,

David.


