[Mimedefang] getting it right about how the filter works...
David F. Skoll
dfs at roaringpenguin.com
Wed Nov 28 09:23:47 EST 2001
---------- Forwarded message ----------
Date: Thu, 29 Nov 2001 00:11:36 +1000
From: Tony Nugent <tony at linuxworks.com.au>
(I hope this makes it to the list, I haven't seen anything since
the administrative notices)
I've been experimenting with some filtering rules in
mimedefang-filter (on the mail server late at night when nobody was
looking :) and I'm discovering some things that I'd like to have
confirmed.
First question: at what point is it "safe" (or unsafe) to access the
variables declared at the top of mimedefang.pl?
Second: regarding the names generated for the quarantine
directories...
I'd like to be able to easily sort them chronologically by name.
Is there any reason why a sequential timestamp mechanism could not
be used to generate these names, instead of random numbers?
I'd also like to be able to use the name of the quarantine
directory as a reference string for the "incident", making them
easy to find if they need to be retrieved. I haven't spotted
which global variable holds this generated name, and I assume
that it is only available in filter_end(), or in filter() in
passes _after_ action_quarantine() has been called.
So my idea is that it would be handy if the name of any possible
quarantine directory be available in advance, be numbered
sequentially as a timestamp, and that it could also be usable as
a sort of defang reference string for the message itself.
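
To make the naming idea concrete, here is a minimal Perl sketch of a
name that sorts chronologically as a plain string. The helper
sortable_quarantine_name() and the "qdir-" prefix are hypothetical
and are not part of MIMEDefang; the PID and counter are only there
to keep names unique, so ordering within the same second is
approximate.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (NOT part of MIMEDefang): build a directory
# name whose string order follows time order.
my $counter = 0;

sub sortable_quarantine_name {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;

    # UTC timestamp, most significant field first.
    my $stamp = strftime('%Y%m%d-%H%M%S', gmtime($epoch));

    # PID plus a per-process counter keeps names unique when
    # several messages arrive within the same second.
    return sprintf('qdir-%s-%05d-%06d', $stamp, $$, ++$counter);
}

print sortable_quarantine_name(), "\n";
```

The same string could then double as the reference for the
"incident", since it is known as soon as it is generated.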
Next one is more involved... apologies if this seems a bit
disjointed, I admit some confusion :-)
I am under the impression that filter_begin() is called once for each
message filtered by mimedefang.pl. This is the place for things that
remain "consistent" for a message.
So this is where the general "global" properties of the email
being processed (body size, recipient, sender, relay host etc) can
be examined and acted upon right away. Here you could perhaps
call action_discard() if message_contains_virus() returned true -
and I assume that the message is discarded right away without
further filter() processing.
Also, filter_begin() appears to be a good place to set global
variables based on tests done only once on the overall message
properties. These variables can then be used as flags for
determining actions taken in filter() or filter_end().
Or so my theory went. But there was some unexpected weirdness
that happened when I took this approach...
I was experimenting with gathering all the Received: headers into
a format that I wanted to re-parse and do things with later.
However, I ended up with much more than I expected - other
headers from other messages that happened to be going through the
mail server at about the same time.
Another example was the list I got out of @Recipients in
filter_begin() and then used in filter() - it contained
addresses not present in the original. But as far as I could
tell, all messages otherwise went through without being mangled.
Until, that is, I started setting a global $DISCARDME in
filter_begin() that was acted on later. That global was set for
one test message I put through it, then for a couple of other
messages that went through the server around the same time... all
were discarded (luckily after being quarantined). Oops, not the
way to do it :)
Has this quirk got anything to do with how the multiplexor works,
keeping processes continuously running, perhaps multiple instances
sharing the same environment? If so, how is it possible to cope
with this?
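
A minimal, self-contained sketch of the suspicion above, assuming
that one long-lived Perl process handles several messages in turn.
The filter_begin() here is a stand-in whose hash argument is
invented for the sketch; it is not the real callback signature. The
point is only that globals outlive a message unless they are reset
at the start of each one.

```perl
use strict;
use warnings;

# Globals live as long as the Perl process does.
our $DISCARDME    = 0;
our @SeenReceived = ();

# Hypothetical stand-in for the real callback; the hash argument
# is an assumption made for this sketch only.
sub filter_begin {
    my ($msg) = @_;

    # Reset per-message state first. Without these two lines the
    # values left by the previous message would still be here.
    $DISCARDME    = 0;
    @SeenReceived = ();

    push @SeenReceived, @{ $msg->{received} };
    $DISCARDME = 1 if $msg->{subject} =~ /test/i;
}

# Simulate one process handling two messages back to back.
my @verdicts;
for my $msg (
    { subject => 'test message', received => [ 'from a', 'from b' ] },
    { subject => 'hello',        received => ['from c'] },
) {
    filter_begin($msg);
    push @verdicts, [ $DISCARDME, scalar @SeenReceived ];
}

printf "discard=%d received=%d\n", @$_ for @verdicts;
```

If the two reset lines are removed, the second message inherits the
discard flag and the headers of the first, which matches the
behaviour described above.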
It then appears that filter() is called once for each mime part.
Which means that unlike filter_begin(), it can be called more than
once.
This means that things you want to check or set etc, should only
pertain to the attachment. Here you check for attachment sizes,
identify viruses, or otherwise do things with the contents of the
mime part.
The danger in filter() is that if you set, append to, or reset
any global variables that you might set in filter_begin(), then
the results can be unexpected. For quite a while I thought I had
some looping going on with, for example, duplicated quarantine
messages, but not so. I was setting global variables in
filter_begin(), and they were being acted upon more than once in
filter(), which did things more times than I expected.
What is really going on here? Are multiple instances of the
filter really sharing the same local environment?
I assume filter_end() happens once when the filter is about to
terminate.
Would this be a better place to, for example, do things like
action_discard() - especially if you want to do all the filter()
checks, set things like a global $discard variable or whatever,
then finally check for it and act accordingly at the end?
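
The flow being asked about can be sketched as follows. This is a
self-contained simulation, not a working mimedefang-filter: the
action_discard() below is a stub that only records the call, the
callbacks take simplified arguments invented for the sketch, and
the ".exe" test is just an example rule.

```perl
use strict;
use warnings;

our $discard   = 0;
our @ActionLog = ();

# Hypothetical stub: the real action_discard() is provided by
# mimedefang.pl; this one only records that it was called.
sub action_discard { push @ActionLog, 'discard' }

# Called once per message: reset the flag.
sub filter_begin { $discard = 0 }

# Called once per MIME part: only set the flag, take no action.
# (Argument simplified for the sketch.)
sub filter {
    my ($part_name) = @_;
    $discard = 1 if $part_name =~ /\.exe$/i;
}

# Called once per message: act on the flag exactly once.
sub filter_end {
    action_discard() if $discard;
}

# Drive the callbacks for one message with three parts.
filter_begin();
filter($_) for ( 'notes.txt', 'setup.exe', 'again.EXE' );
filter_end();

print scalar(@ActionLog), " action(s) taken\n";
```

Two parts match the rule, yet the action is taken once, because
filter() only sets the flag and filter_end() does the acting.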
I'm just trying to get a feel for how all this is supposed to flow -
David, am I on the right track?
My perl is adequate enough to get the job done, but it doesn't yet
come naturally, and I'm certainly no guru :)
BTW David, you mentioned that you had a problem to solve that
involved removing attachments, replacing them with a reference to a
URL, then putting the attachment somewhere for web access at that
URL. How far did you get with this?
Thanks if you got this far :-)
Cheers
Tony
---*#*=-=*#*=-=*#*=-=*#*=-=*#*=-=*#*=---
Tony Nugent <Tony at linuxworks.com.au>
LinuxWorks - Gold Coast Qld Australia