[Mimedefang] getting it right about how the filter works...

David F. Skoll dfs at roaringpenguin.com
Wed Nov 28 09:39:40 EST 2001


On Wed, 28 Nov 2001, Tony Nugent wrote:

> First question: at what point is it "safe" (or unsafe) to access the
> variables declared at the top of mimedefang.pl?

The mimedefang-filter(5) man page has a section starting with the
sentence:

"In addition, the following global variables are available:"

The variables listed in that section are valid in filter_begin, filter,
and filter_end.

Any other global variables may or may not be; I make no guarantees.  Read
the source to know for sure. :-)

>    I'd like to be able to easily sort them chronologically by name.
>    Is there any reason why a sequential timestamp mechanism could not
>    be used to generate these names, instead of random numbers?

That could be done, I suppose.  I could base it on the time.  Please
vote for this change off-list, and if the consensus is that it's a good
idea, I'll make the change.

>    I'd also like to be able to use the name of the quarantine
>    directory as a reference string for the "incident", making them
>    easy to find if they need to be retrieved.  I haven't spotted the
>    global variable that holds this generated name - and
>    I assume that it is only available in filter_end() or in filter() in
>    passes _after_ an action_quarantine() has been called.

Right; the name gets generated the first time action_quarantine is called.
I can change this too.

> I am under the impression that filter_begin() is called once for each
> message filtered by mimedefang.pl.  This is where things that remain
> "consistent" for a message can be dealt with.

Right.

>   So this is where the general "global" properties of the email
>   being processed (body size, recipient, sender, relay host etc) can
>   be examined and acted upon right away.  Here you could perhaps
>   call action_discard() if message_contains_virus() returned true -
>   and I assume that the message is discarded right away without
>   further filter() processing.

Actually, even if you call action_discard() in filter_begin(), filter() is
still called for each part.  You might (for example) want to discard a
message, but still quarantine some parts.

>   Also, filter_begin() appears to be a good place to set global
>   variables based on tests done only once on the overall message
>   properties.  These variables can then be used as flags for
>   determining actions taken in filter() or filter_end().

Be careful... see below...

> I expected so, or so my theory went.  But there was some unexpected
> weirdness that happened when I took this approach...

>    I was experimenting with gathering all the Received: headers into
>    a format that I wanted to re-parse and do things with later.
>    However, I ended up with much more than I expected - other
>    headers from other messages that happened to be going through the
>    mail server at about the same time.

In multiplexor mode, the Perl process runs in a loop.  You MUST explicitly
reset ALL of your global variables in filter_begin.  Otherwise, results
for messages will accumulate, as you observed.
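
The accumulation can be reproduced outside MIMEDefang with a small
stand-alone Perl sketch.  Everything in it is made up for illustration:
@RECEIVED, handle_message() and the sample header strings are
hypothetical, and the two calls in a row merely stand in for one
long-running multiplexor worker handling two messages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical user global, as one might declare in a filter file.
our @RECEIVED;

# Stand-in for one pass of the filter over one message.
# $reset says whether the global is cleared first, as filter_begin should do.
sub handle_message {
    my ($reset, @headers) = @_;
    @RECEIVED = () if $reset;
    push @RECEIVED, @headers;
    return scalar @RECEIVED;    # how many headers this pass "sees"
}

# Two messages handled one after the other by the same process.
my @msg1 = ('from a.example', 'from b.example');
my @msg2 = ('from c.example');

# Without a reset, the second message also sees the first one's headers.
@RECEIVED = ();
my @without_reset;
push @without_reset, handle_message(0, @$_) for (\@msg1, \@msg2);

# With a reset, each message sees only its own headers.
@RECEIVED = ();
my @with_reset;
push @with_reset, handle_message(1, @$_) for (\@msg1, \@msg2);

print "without reset: @without_reset\n";
print "with reset:    @with_reset\n";
```

The counts for the second message differ between the two runs only
because of the reset at the top of handle_message().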

>    Until, that is, I started setting a global $DISCARDME in
>    filter_begin() that was acted on later.  That global was set for
>    one test message I put through it, then for a couple of other
>    messages that went through the server around the same time... all
>    were discarded (luckily after being quarantined).  Oops, not the
>    way to do it :)

You MUST reset $DISCARDME to zero every time.  Something like this:

sub filter_begin {
	# Init global variables
	$DISCARDME = 0;

	#...

	if (some_test_passes()) {
		$DISCARDME = 1;
	}
	# ...
}

> Has this quirk got anything to do with how the multiplexor works,
> keeping processes continuously running, perhaps multiple instances
> sharing the same environment?  If so, how is it possible to cope
> with this?

Just reset ALL your variables to safe or empty values first thing
in filter_begin().

>   What is really going on here?  Are multiple instances of the
>   filter really sharing the same local environment?

Here's the pseudo-code for multiplexor-mode:

foreach incoming_mail_message {
	internal_reset_global_vars();
	filter_begin();
	foreach part of message {
		filter(part);
	}
	filter_end();
	take_action();
}

Here, "internal_reset_global_vars" resets all the internal MIMEDefang
variables.  filter_begin is responsible for initializing any global
variables you want to use.  filter_end can do cleanup if necessary.
take_action() examines global variables to see what action_* methods
were called, and communicates back to Sendmail.
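
As a rough illustration, the loop above can be simulated in plain Perl.
This is only a sketch under stated assumptions, not the real
mimedefang.pl: %Actions, @Log, the message hashes and the "spammy" flag
are invented, action_discard() is a stub that just records the call,
and the filter functions take simplified arguments rather than their
real ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for MIMEDefang internals (hypothetical, for illustration only).
our %Actions;    # which action_* calls were recorded for the current message
our @Log;        # trace of callback invocations

sub internal_reset_global_vars { %Actions = () }
sub action_discard             { $Actions{discard} = 1 }    # stub, not the real one

sub filter_begin {
    my ($msg) = @_;
    action_discard() if $msg->{spammy};    # decision made before any part is seen
    push @Log, "begin $msg->{id}";
}

sub filter {
    my ($msg, $part) = @_;
    push @Log, "filter $msg->{id}/$part";  # still runs for every part
}

sub filter_end {
    my ($msg) = @_;
    push @Log, "end $msg->{id}";
}

# Looks at what was recorded and decides the final disposition.
sub take_action {
    return $Actions{discard} ? 'discard' : 'accept';
}

my @messages = (
    { id => 'm1', spammy => 1, parts => ['text', 'exe'] },
    { id => 'm2', spammy => 0, parts => ['text'] },
);

my %result;
for my $msg (@messages) {
    internal_reset_global_vars();
    filter_begin($msg);
    filter($msg, $_) for @{ $msg->{parts} };
    filter_end($msg);
    $result{ $msg->{id} } = take_action();
}

print "$_\n" for @Log;
print "$_ => $result{$_}\n" for sort keys %result;
```

In the trace, filter() is logged for both parts of m1 even though the
discard was recorded in filter_begin(), and m2 is unaffected because
the recorded actions are cleared at the top of each iteration.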

>   Would this be a better place to, for example, do things like
>   action_discard() - especially if you want to do all the filter()
>   checks, set things like a global $discard variable or whatever,
>   then finally check for it and act accordingly at the end.

That's one possibility.
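
A minimal sketch of that approach, assuming a hypothetical flag
$DISCARD and a made-up rule (any part whose name ends in .exe).
action_discard() is stubbed here so the sketch runs on its own, the
filter functions take simplified arguments, and run_message() is only
a stand-in for the loop that calls them:

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $DISCARD;      # hypothetical user flag, set in filter()
our $Discarded;    # stand-in record of whether action_discard was called

# Stub so the sketch is self-contained; in a real filter this is provided for you.
sub action_discard { $Discarded = 1 }

sub filter_begin {
    $DISCARD   = 0;    # reset the user flag for every message
    $Discarded = 0;    # stands in for the internal per-message reset
}

sub filter {
    my ($fname) = @_;
    $DISCARD = 1 if $fname =~ /\.exe$/i;    # made-up test for illustration
}

sub filter_end {
    action_discard() if $DISCARD;           # act once, after all parts were checked
}

# Stand-in driver: one message is a list of part file names.
sub run_message {
    my @parts = @_;
    filter_begin();
    filter($_) for @parts;
    filter_end();
    return $Discarded;
}

my $first  = run_message('notes.txt', 'setup.exe');
my $second = run_message('notes.txt');
print "first: $first, second: $second\n";
```

Because the flag is reset in filter_begin(), the decision made for the
first message does not carry over to the second.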

> BTW David, you mentioned that you had a problem to solve that
> involved removing attachments, replacing them with a reference to a
> URL, then putting the attachments somewhere for web access at that
> URL.  How far did you get with this?

Not done yet; should be in 2.2.

Regards,

David.



