[Mimedefang] Using a db for subject lines to block

Cormack, Ken Ken.Cormack at roadway.com
Mon Jun 20 17:05:52 EDT 2005


Thanks, David - again, good info.

Ken

-----Original Message-----
From: David F. Skoll [mailto:dfs at roaringpenguin.com] 
Sent: Monday, June 20, 2005 4:22 PM
To: mimedefang at lists.roaringpenguin.com
Subject: Re: [Mimedefang] Using a db for subject lines to block


Cormack, Ken wrote:

> Can anyone see any problems with the code below?  Just logging, it appears
> to be working pretty well.

You may want to make your subject canonicalization a little smarter,
like:

	$lc_subject =~ s/^\s+//;  # Trim leading whitespace
	$lc_subject =~ s/\s+$//;  # Trim trailing whitespace
	$lc_subject =~ s/\s+/./g; # Collapse each whitespace run into a period

The third regexp collapses each run of whitespace into a single period, so:

          really               cheap         mortgages

gets collapsed into

	  really.cheap.mortgages
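Put together, the canonicalization might look something like the sketch below. The helper name canonicalize_subject is hypothetical, and the lc() call is an assumption (the variable name $lc_subject suggests the subject has already been lowercased); note that each substitution must be bound to the variable with =~, not assigned with =.

```perl
use strict;
use warnings;

# Hypothetical helper: canonicalize a Subject: header before DB lookups.
sub canonicalize_subject {
    my ($subject) = @_;
    my $lc_subject = lc $subject;   # assumed: lowercase first
    $lc_subject =~ s/^\s+//;        # trim leading whitespace
    $lc_subject =~ s/\s+$//;        # trim trailing whitespace
    $lc_subject =~ s/\s+/./g;       # collapse each whitespace run into one period
    return $lc_subject;
}

# prints "really.cheap.mortgages"
print canonicalize_subject("  really               cheap         Mortgages  "), "\n";
```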

You might (or might not?) want to delete other non-letter characters.
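If you do decide to delete non-letter characters, one possible variant is sketched below. The name canonicalize_letters_only is hypothetical, and it assumes plain ASCII subjects: the [^a-z\s] class also throws away digits and accented letters, so an obfuscated word like "m0rtgages" becomes "mrtgages", which is the "or might not" part of the trade-off.

```perl
use strict;
use warnings;

# Hypothetical variant: keep only ASCII letters and whitespace, then
# trim and collapse whitespace into periods as before.
sub canonicalize_letters_only {
    my ($subject) = @_;
    my $s = lc $subject;
    $s =~ s/[^a-z\s]+//g;   # delete anything that is not a letter or whitespace
    $s =~ s/^\s+//;         # trim leading whitespace
    $s =~ s/\s+$//;         # trim trailing whitespace
    $s =~ s/\s+/./g;        # collapse each whitespace run into one period
    return $s;
}

# prints "re.cheap.mortgages.off"
print canonicalize_letters_only("Re: CHEAP!!! mortgages -- 50% off"), "\n";
```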

>             # scan database for each word in the subject

I wonder if you want to remember which words you've already looked up?
Otherwise a subject like "a a a a a a a a a a a a a a a" makes you do
one DB lookup per repetition.  Probably not a big deal in practice.
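A common Perl idiom for this is a %seen hash, sketched below. Several things here are assumptions, not Ken's actual code: the in-memory %blocked hash stands in for the real subject-word database, word_is_blocked and subject_hits are hypothetical names, and words are taken by splitting the canonicalized subject on periods.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the subject-word database; a real filter
# would query its DB here instead of an in-memory hash.
my %blocked = map { $_ => 1 } qw(mortgages viagra);

my $lookups = 0;
sub word_is_blocked {
    my ($word) = @_;
    $lookups++;                          # count simulated DB lookups
    return exists $blocked{$word};
}

# Look up each distinct word of a canonicalized subject only once.
sub subject_hits {
    my ($lc_subject) = @_;               # e.g. "really.cheap.mortgages"
    my (%seen, @hits);
    for my $word (split /\./, $lc_subject) {
        next if $word eq '' || $seen{$word}++;   # skip empties and repeats
        push @hits, $word if word_is_blocked($word);
    }
    return @hits;
}

my @hits = subject_hits('a.a.a.a.a.a.a.a.a.a.a.a.a.a.a');
print scalar(@hits), " hits, $lookups lookup(s)\n";   # prints "0 hits, 1 lookup(s)"
```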

Regards,

David.
_______________________________________________
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list
MIMEDefang at lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang


