[Mimedefang] Final - Using a db for subject lines to block

Cormack, Ken Ken.Cormack at roadway.com
Mon Jun 27 14:53:52 EDT 2005


Group,

I tried a few additions to the rule which I posted last week.  I was pleased
to see that the performance hit was nearly undetectable compared to the days
just before the rule went in.  The comment block near the bottom of the code
explains what was added since last week's version of the function.

In addition, I've added another space-compression line, and re-ordered some
of the steps performed on $lc_subject, at the top of the function, to better
optimize $lc_subject for comparison to the database records.  This is
surprisingly fast for my database of 2200 records: MD reports a total
"Filter time is" in the area of 1 to 4 seconds per email, depending upon the
email and what needs to be done to it (including the launching of an
external virus scanner).  Compared to last week's times prior to the rule, I
can't really see a clear time penalty.
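
For anyone who wants to try the clean-up steps on sample subjects outside of MIMEDefang, here is a minimal standalone sketch.  The name normalize_subject is made up for illustration and is not part of the filter.  It also assumes that the whitespace following a leading "re:", "fw:" or "fwd:" should be dropped along with the prefix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the filter): turn a raw subject into
# the dotted, lower-case form used for the database keys.
sub normalize_subject {
    my ($subject) = @_;
    my $s = lc $subject;

    # Squeeze each run of whitespace down to its first character
    $s =~ s/(\s)\s+/$1/g;

    # Join runs of three or more single characters: "s t u f f" -> "stuff"
    $s =~ s{((?:^|\s)\S\s(?:\S(?:\s|\z)){2,})}{
        my $run = $1;
        $run =~ s/\s//g;
        " $run ";
    }ge;

    $s =~ s/^\s+//;                     # trim leading whitespace
    $s =~ s/\s+$//;                     # trim trailing whitespace
    $s =~ s/^(?:(?:re|fwd?):\s*)+//;    # assumption: drop prefix and its space
    $s =~ s/\s+/./g;                    # whitespace -> periods
    return $s;
}

print normalize_subject("Re: FREE  s t u f f   here"), "\n";  # free.stuff.here
print normalize_subject("  Home   Loans "), "\n";             # home.loans
```

Running subjects from the mail log through something like this is a quick way to see what key a new database record would need in order to match.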

The database contains records that look like this...

	free.stuff	REJECT
	home.loans	REJECT
	best.business.you.can.find	REJECT

And so on.

The function is called from filter_begin like this...

    if (lookup_subject()) {
        action_bounce("Access denied. Subject \"$Subject\" suggests MSG may " .
                      "contain SPAM/WORM/VIRUS/HOAX.", "553", "5.7.1");
        return action_discard();
    }

And here is the actual completed function in its entirety.  See the
comments within.  It's working quite well on my two servers.

#############################
# Search the subject-line database for subject lines/keywords to block
#############################
use DB_File;    # tie() class for the Berkeley DB file
use Fcntl;      # O_RDONLY
$DBFilenameSUBS = "/etc/mail/subjects.db";
sub lookup_subject() {
    # convert incoming subject to lower-case
    my $lc_subject = lc($Subject);
    my $subject_result = 0;

    my %GDB;
    if (tie(%GDB,'DB_File', $DBFilenameSUBS, O_RDONLY)) {
        # remove white space from the middle so that
        # "free s t    u f f here" becomes "free s t u f f here"
        $lc_subject =~ s/(\s)\s+/$1/g;
        # next 2 lines collapse "free s t u f f here" into "free stuff here"
        $lc_subject =~ s!((^|\s)\S\s(\S(\s|$)){2,})!
            my $x = $1; $x =~ s/\s//g; " $x ";!ego;
        $lc_subject =~ s/^\s+//;  # Trim leading whitespace
        $lc_subject =~ s/\s+$//;  # Trim trailing whitespace
        # Trim any leading "re:", "fw:" or "fwd:" prefixes, along with the
        # whitespace after them (otherwise the key ends up starting with ".")
        $lc_subject =~ s/^(?:(?:re|fwd?):\s*)+//;
        $lc_subject =~ s/\s+/./g; # Collapse whitespace into periods

        # Scan database for a complete match (only)
        if ($GDB{$lc_subject}) {
            $subject_result = 1;
            md_graphdefang_log("Subject_Line",
                "Subject-line found in subjects.db");
        } else {
            # See if any one word in the subject appears as a record
            my @subject_array = split (/\./, $lc_subject);
            foreach my $subject_word (@subject_array)
            {
                if ($GDB{$subject_word}) {
                    $subject_result = 1;
                    md_graphdefang_log("Subject_Word",
                        "Subject-word \"$subject_word\" found in subjects.db");
                    last;
                }
            }
        }
        if (!$subject_result)
        {
            # here we reverse the logic... see if any record in the database
            # is found as a substring in the subject.  if a record contains
            # "free.stuff" and the subject says "get your free stuff here",
            # then flag it as a hit.
            my $subject_record;
            foreach $subject_record (keys %GDB)
            {
                if ($lc_subject =~ m/(^|\.)\Q$subject_record\E($|\.)/)
                {
                    $subject_result = 1;
                    md_graphdefang_log("Subject_Substring",
                        "Subject-substring \"$subject_record\" found in " .
                        "subject line");
                    last;
                }
            }
        }
        untie %GDB;
    } else {
        md_syslog('warning', "subject: Cannot open file $DBFilenameSUBS");
    }
    return $subject_result;
}
#############################
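
To check which of the three tests a given subject would trip without touching the live database, the lookup logic can be exercised against a plain hash.  This is only a sketch: match_subject and %records are made-up names standing in for the function above and the tied subjects.db, the 'lottery' record is invented for the demo, and the input is assumed to already be in dotted lower-case form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the tied subjects.db
my %records = (
    'free.stuff'                 => 'REJECT',
    'home.loans'                 => 'REJECT',
    'best.business.you.can.find' => 'REJECT',
    'lottery'                    => 'REJECT',  # made-up single-word record
);

# Hypothetical helper: returns which test matched, or '' for no match.
# $key must already be lower-case with periods in place of whitespace.
sub match_subject {
    my ($key, $db) = @_;

    # 1. the whole subject is a record
    return 'line' if $db->{$key};

    # 2. a single word of the subject is a record
    foreach my $word (split /\./, $key) {
        return 'word' if $db->{$word};
    }

    # 3. a record appears inside the subject on word boundaries
    foreach my $record (keys %$db) {
        return 'substring' if $key =~ m/(?:^|\.)\Q$record\E(?:\.|\z)/;
    }
    return '';
}

foreach my $key ('home.loans', 'you.won.the.lottery',
                 'get.your.free.stuff.here', 'freestuff') {
    printf "%-26s => %s\n", $key, match_subject($key, \%records) || 'no match';
}
```

Note that 'freestuff' is not caught, since the substring test only matches a record on word boundaries.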


