[Mimedefang] getting it right about how the filter works...

Wed Nov 28 09:23:47 EST 2001

---------- Forwarded message ----------
Date: Thu, 29 Nov 2001 00:11:36 +1000
From: Tony Nugent <tony at linuxworks.com.au>

  (I hope this makes it to the list, I haven't seen anything since
  the administrative notices)

I've been experimenting with some filtering rules in
mimedefang-filter (on the mail server late at night when nobody was
looking :) and I'm discovering some things that I'd like to have
confirmed.

First question: at what point is it "safe" (or unsafe) to access the
variables declared at the top of mimedefang.pl?

Second: regarding the names generated for the quarantine
directories...

   I'd like to be able to easily sort them chronologically by name.
   Is there any reason why a sequential timestamp mechanism could not
   be used to generate these names, instead of random numbers?

   I'd also like to be able to use the name of the quarantine
   directory as a reference string for the "incident", making them
   easy to find if they need to be retrieved.  I haven't spotted the
   global variable that holds this generated name - and I assume
   that it is only available in filter_end(), or in filter() in
   passes _after_ action_quarantine() has been called.

   So my idea is that it would be handy if the name of any possible
   quarantine directory were available in advance, were numbered
   sequentially as a timestamp, and could also be usable as a sort
   of defang reference string for the message itself.
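
   A rough sketch of the kind of name I have in mind.  The helper
   quarantine_name() and the "qdir-" prefix are made up for
   illustration and are not part of mimedefang.pl; the PID and
   counter are only my guess at how to keep names unique when two
   messages arrive in the same second.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper, not part of mimedefang.pl: build a directory
# name that sorts chronologically when listed by name.
my $counter = 0;    # per-process sequence number

sub quarantine_name {
    my ($when) = @_;
    $when = time unless defined $when;
    $counter++;
    # UTC timestamp first so a plain "ls" sorts by time; PID and counter
    # follow so names from the same second are still distinct.
    return sprintf("qdir-%s-%05d-%04d",
                   strftime("%Y%m%d-%H%M%S", gmtime($when)),
                   $$, $counter);
}

print quarantine_name(), "\n";
```

   Since the name depends only on the clock, the PID and a counter,
   it could be computed before any quarantine action is taken and
   quoted as the reference string for the message.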

Next one is more involved... apologies if this seems a bit
disjointed, I admit some confusion :-)

I am under the impression that filter_begin() is called once for each
message filtered by mimedefang.pl.  This is the place for things that
remain "consistent" for a message.

  So this is where the general "global" properties of the email
  being processed (body size, recipient, sender, relay host etc) can
  be examined and acted upon right away.  Here you could perhaps
  call action_discard() if message_contains_virus() returned true -
  and I assume that the message is discarded right away without
  further filter() processing.

  Also, filter_begin() appears to be a good place to set global
  variables based on tests done only once on the overall message
  properties.  These variables can then be used as flags for
  determining actions taken in filter() or filter_end().

Or so my theory went.  But some unexpected weirdness happened when I
took this approach...

   I was experimenting with gathering all the Received: headers into
   a format that I wanted to re-parse and do things with later.
   However, I ended up with much more than I expected - other
   headers from other messages that happened to be going through the
   mail server at about the same time.

   Another example was the list I got out of @Recipients in
   filter_begin() and then used in filter() - it contained
   addresses not present in the original.  But as far as I could
   tell, all messages otherwise went through without being mangled.

   Until, that is, I started setting a global $DISCARDME in
   filter_begin() that was acted on later.  That global was set for
   one test message I put through it, then for a couple of other
   messages that went through the server around the same time... all
   were discarded (luckily after being quarantined).  Oops, not the
   way to do it :)

Has this quirk got anything to do with how the multiplexor works,
keeping processes continuously running, perhaps multiple instances
sharing the same environment?  If so, how is it possible to cope
with this?
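
If the explanation is that one perl process stays alive and handles
several messages in turn, then I guess the globals have to be cleared
by hand at the start of each message.  A minimal sketch of that
idea, which simulates two messages going through one process:
note_header() and the two variables are names I made up, and I am
assuming that filter_begin() really does run once per message before
anything else.

```perl
use strict;
use warnings;

# Per-message state kept in globals.  In a long-running process these
# survive from one message to the next unless they are cleared.
our @received_headers;
our $discard_me;

# Assumption: filter_begin() runs once per message, before any filter()
# call, so it is the place to reset everything.
sub filter_begin {
    @received_headers = ();
    $discard_me       = 0;
}

# Stand-in for whatever collects headers; not a mimedefang.pl function.
sub note_header {
    my ($line) = @_;
    push @received_headers, $line if $line =~ /^Received:/i;
}

# Simulate one persistent process handling two messages in a row.
for my $msg (["Received: from a", "Subject: one"],
             ["Received: from b", "Received: from c"]) {
    filter_begin();
    note_header($_) for @$msg;
    print scalar(@received_headers), " Received header(s)\n";
}
```

With the two reset lines removed from filter_begin(), the second
message would report three headers instead of two, which looks a lot
like what I was seeing.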

It then appears that filter() is called once for each mime part.
Which means that unlike filter_begin(), it can be called more than
once.

  This means that things you want to check or set etc, should only
  pertain to the attachment.  Here you check for attachment sizes,
  identify viruses, or otherwise do things with the contents of the
  mime part.

  The danger in filter() is that if you set, append to, or re-set
  any global variables that you might set in filter_begin(), then
  the results can be unexpected.  For quite a while I thought I had
  some looping going on (duplicated quarantine messages, for
  example), but not so.  I was setting a global variable in
  filter_begin(), and it was being acted upon once per mime part in
  filter(), doing things more times than I expected.

  What is really going on here?  Are multiple instances of the
  filter really sharing the same local environment?

I assume filter_end() happens once when the filter is about to
terminate.

  Would this be a better place to do things like action_discard() -
  especially if you want to do all the filter() checks, set
  something like a global $discard variable, then finally check for
  it and act accordingly at the end?
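
  Roughly what I have in mind, as a stand-alone sketch.  The
  action_discard() here is only a stub so the example runs on its
  own, filter() is simplified to take just a filename (the real one
  gets more arguments), and the extension test is an arbitrary
  example rule.

```perl
use strict;
use warnings;

our $discard;          # decision flag, reset for every message
our @actions_taken;    # only here so the sketch can show what happened

# Stub standing in for MIMEDefang's action_discard().
sub action_discard { push @actions_taken, "discard" }

# Once per message: start with a clean flag.
sub filter_begin { $discard = 0 }

# Once per MIME part: only record the decision, do not act on it yet.
sub filter {
    my ($fname) = @_;
    $discard = 1 if defined $fname && $fname =~ /\.(?:exe|vbs|scr)$/i;
}

# Once per message: act on the recorded decision exactly once.
sub filter_end {
    action_discard() if $discard;
}

# Simulate one message with three parts, two of which match the rule.
filter_begin();
filter($_) for ("notes.txt", "a.exe", "b.vbs");
filter_end();
print "discard called ", scalar(@actions_taken), " time(s)\n";
```

  Two parts match the rule, but the action fires only once because
  filter() just sets the flag and filter_end() does the acting.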

I'm just trying to get a feel for how all this is supposed to flow -
David, am I on the right track?

My perl is adequate enough to get the job done, but it doesn't yet
come naturally, and I'm certainly no guru :)

BTW David, you mentioned that you had a problem to solve that
involved removing attachments, replacing them with a reference to a
URL, then putting the attachment somewhere for web access at that
URL.  How far did you get with this?

Thanks if you got this far :-)

Cheers
Tony
---*#*=-=*#*=-=*#*=-=*#*=-=*#*=-=*#*=---
  Tony Nugent <Tony at linuxworks.com.au>
  LinuxWorks - Gold Coast Qld Australia