[Mimedefang] Comments on mimedefang.plL

Clayton, Nik [IT] nik.clayton at citigroup.com
Mon May 27 10:07:20 EDT 2002


Hi all,

I'm currently testing the MimeDefang + SpamAssassin combination in a
somewhat,
er, challenging environment (230,000+ users, 1.5 million messages a day, 
somewhere in the region of 150GB of mail sent back and forth, . . .).  As
you
can imagine, performance and security issues are right at the top of my 
agenda :-)

First off, MIMEDefang + SpamAssassin -- very sweet indeed.  Much kudos to
the
authors.  Having said that. . .

. . . I've been chewing through the mimedefang.pl source code, and I've got
a
few comments which I thought might be of general interest.  Assuming I can 
get $employer to agree, patches might be forthcoming.

First, a couple of bugs.

In sendmail() and resend_message(), the exec() call can fail.  If this 
happens the child doesn't explicitly exit, and bad things will happen.

>From an efficiency point of view, the code in serverloop() that parses the
SENDER, HOSTIP, HOSTNAME, RECIPIENTS, and other files isn't always
necessary.
It's there to ensure that some global variables are populated properly.
However, if the end user's filter*() procedures don't use these variables,
this work is for nothing.  IMHO, these variables ($Sender, et al) could 
either be wrapped inside subs that the filter has to call to get the values,
or (neater) turned in to tie()'d variables.  Either way, the performance
hit is only taken if you need the value.

A few lines down from there is some code that looks like this:

    if(open(IN, "<SUSPICIOUS-CHARS-IN-HEADERS")) {
      close(IN);
      $SuspiciousCharsInHeaders = 1;
      syslog('info', ...);
    }

and a similar block for SUSPICIOUS-CHARS-IN-BODY.  TIMTOWDI, and all that,
but how about

    -f 'SUSPICIOUS-CHARS-IN-HEADERS' && $SuspiciousCharsInHeaders = 1;
    -f 'SUSPICIOUS-CHARS-IN-BODY' && $SuspiciousCharsInBody = 1;

instead?

Is there any interest in allowing filter_begin() to abort processing the
rest of the message?  If I've used SpamAssassin to determine that the 
message is spam in filter_begin(), I don't want filter() to be called for
the
rest of the parts in the message.

Now for a more general muse -- how do people feel about the 'interface' that
the filter*() procedures have to the rest of the code?

To be honest, it scares me.  A badly written filter*() sub can stomp over
other global variables with impunity, and there's no 'sanctioned' way to 
save state between calls to filter_begin(), multiple calls to filter(), and
calls to filter_end(), short of creating a new global variable, and hoping
that future releases of mimedefang.pl don't decide to use the same variable
name for something else.

How about putting the user supplied filter subs in their own package, so 
they've got their own namespace to play with?

Of course, once you've gone down that route, the object-oriented path 
beckons seductively (sorry, I'm a bit of an OO enthusiast).

I could envisage writing a filter that looks something like this (note that
this is just an idea -- I haven't written any of the code that would 
actually implement any of this functionality (yet!), and the API that 
I outline below is all up for change based on feedback on this list).

[ This is the code that the mail admin writes, instead of the current
  filter file ]

    package MimeDefang::Filter::Local;

    @ISA = qw(MimeDefang::Filter::_base);         # Inherit from base class

    sub init {
        # Empty, but some filters might want to do something clever here
    }

    sub begin {                      # begin?  Too close to BEGIN?
        my $self = shift;

        # Some things I might do with myself (as it were), instead of
        # using the current functions.  Ordinarily, of course, you
        # wouldn't call all these functions from one filter sub.  I've
        # just included them here to give a flavour of the API I'm
        # veering towards.

        # This assumes that $self->msg() returns a reference to a 
        # message object, which has functions for obtaining objects for
        # the envelope, header, and body

        # Note that in real life we'd probably store the results from
        # calls like $self->msg() and $self->msg()->envelope() in
        # variables to avoid the multiple subroutine calls.

        # Returns a ref to an array of recipients
        $self->msg()->envelope()->recipients();

        # Adds a new recipient
        $self->msg()->envelope()->add_recipient('foo at example.com');

        # Removes a recipient
        $self->msg()->envelope()->remove_recipient('bar at example.com');

        # Returns a ref to an hash of headers
        my $headers = $self->msg()->headers();

        # Adds a new header (might replace an existing header with the
        # same name
        $self->msg()->header('X-Foo', 'This is a foo');

        # Deletes an existing header (assuming that 'undef' isn't a valid
        # value for a header
        $self->msg()->header('X-Bar', undef);

        # XXX I'm not happy with the API in the previous two examples.
        # It doesn't work when you have multiple header lines with the 
        # same name (i.e., 'Received:'). . .

        # Gets an array of references to the MIME::Entity objects that make
        # up the message
        my $entities = $self->msg()->entities();

        # Iterates over the entities one by one, doing something interesting
        # This would be an alternative to having MimeDefang make multiple
        # calls to the filter, one per entity.  In fact, it might be better
        # to only have one entry point in to the filter -- if the end user
        # wants to iterate over the message's entities they can do it by 
        # hand using code like this.  Just a thought.
        foreach (@{$entities}) {
            $self->do_something_interesting($_);
        }

        # Create a new MIME::Entity, and add it to the end of the message
        my $e = new MIME::Entity;
        ...                             # Handwave the initialisation
        $self->msg()->entitities()->add_entity($e);

        # Remove the fourth MIME::Entity from the message
        $self->msg()->entitites()->remove_entity(3);

        # Replace the third entity with the one we created earlier
        $self->msg()->entities()->replace_entity(2, $e);

        # Of course, the previous few examples could have been done in 
        # 'sub filter' instead (with a few changes to keep track of the
current
        # entity number

        # Now decide what to do with this message.  Instead of using
strings,
        # use functions that return constants, so that the underlying
constants
        # (whether they're strings or numbers is immaterial) are hidden from
        # the implementation

        $self->action($self->accept());
        $self->action($self->reject());       # or perhaps this instead?
        ...                                   # insert the other actions
here
    }

    sub do_something_interesting {
        my $self = shift;
        my $entity = shift;

        # Do something interesting here
    }

The code in the '_base' filter (i.e., the bit that's shipped with
MimeDefang)
might look something like:

    package MimeDefang::Filter::_base;

    sub new {
        ... # Create the object in the usual way     

        bless $self, $package;

        $self->init();        # Initialise the object

        return $self;
    }

    sub init {
        # Does nothing, but anything that inherits from this class might
        # want to do their own initialisation here
    }

    sub accept { return 'accept'; }      # Acceptance codes
    sub reject { return 'reject'; }
    ...  # Duplicate as necessary

    sub action {
        my $self = shift;
        my $val  = shift;
        $self->{'action'} = $val if defined $val;
        return $self->{'action'};
    }

    # and so on and so forth.

Then, in mimedefang.pl, you have code that looks something like

    require '/path/to/users/filter.pm';
    my $filter = MimeDefang::Filter::Local->new(arg1, arg2, arg3);

    $filter->begin();              # Start processing

    # Check to see if we need to bail out early because we're going
    # to reject the message.
    if($filter->action() eq $filter->reject()) {
        # Bail out.  Notice that we're not doing a compare like
        # "$action eq 'reject'", so if in the future the string
        # changes to something else, we only need to change it in
        # MimeDefang::Filter::_base, and nowhere else
    }

Hopefully that makes some kind of sense.

Of course, I could just go off and implement a private version of 
mimedefang.pl.  But then no one benefits from the open source development
model.

Hope that's food for thought.

N
-- 
1234567890123456789012345678901234567890123456789012345678901234567890123456
7890
  -- The 80 column-ometer

Global Messaging      Victoria Plaza, 2nd Floor     x21206



More information about the MIMEDefang mailing list