[Mimedefang] Re: Spam Assassin

Wed Jan 9 12:06:08 EST 2002

-----Original Message-----
From: David F. Skoll [mailto:dfs at roaringpenguin.com]
Sent: Wednesday, January 09, 2002 7:29 AM

> On Tue, 8 Jan 2002, Jeff Heinen wrote:
>
> > > I haven't had problems either, but I added the call in is_spam and
> > > check.  MIMEDefang kills the Perl process after it has processed (by
> > > default) 100 messages, so this limits the propagation of memory leaks.
>
> > Limits, but if all 100 messages were large enough, it might be
> > something.
> 
> Why, is the amount of state stored by SpamAssassin proportional to message
> size?  I would not have thought so.

Actually, that was just an assumption on my part. I was trying to come up
with ideas about what might have caused problems, but it was all theory. If
you notice, it keeps the whole message in the status object. (So you can
call '$status->get_full_message_as_text();' and also use the status object
to rewrite the message.) You are correct: the messages would have to be very
large, or the machine short on resources, to make any difference at all.

> > You are right, I wasn't clearing our $is_spam or $report, yet oddly I
> > wasn't having any problems of bleed, or mismatched reports. All it was
> > doing was messing up the formatting of the message in Exchange. (Not
> > like that is difficult however).
> 
> Weird.  I have never seen that.

It is very weird, and thanks to Exchange, I can't seem to make it show me
the raw message so I can figure out exactly what it is doing.

> > That's what I'm looking at now. Anyone well versed in MIME to venture
> > what I'd need? Just content type, name, and data, right?
> 
> Yes.

I think this would just be the easiest way for me. I really wish I could
figure out what Outlook is doing, but then again, I fear what I might find.
(Like how people who used to work in fast food sometimes no longer eat at
that restaurant. :_) )

> > This is a sticky part here. We did have this discussion, and the only
> > exception we had, really, was our pagers, which are bounced out to their
> > email address at the pager company.
> 
> How do you do the bouncing?  With a .forward file, or in aliases?
> If it's in .forward, you can write a little bounce script which won't
> bounce things marked as spam.

virtusers & aliases. Nothing is ever delivered locally on this box. I was
actually considering letting the pagers be the only thing that is local on
the mail gateway, and letting procmail trim out the spam and then forward on
to the pagers. My other thought was to use server-side filters on the
Exchange server.

> > > See stream_by_domain in mimedefang-filter(5).  You _can_ use
> > > MIMEDefang fairly efficiently to make per-domain decisions.
> 
> > I did see this, but how is this written? If someone drops a
> > 100-recipient spam on me, will I end up resending this to everyone?
> 
> The message is repeated once for each unique domain name.  Example: If the
> message goes to a at a.net, b at a.net, a at b.net and b at b.net, then two copies
> are sent:  The first copy goes to a at a.net and b at a.net, and the second copy
> goes to a at b.net and b at b.net
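
The splitting described in the example above can be sketched roughly as follows. This is only an illustration of the grouping logic in Python, not MIMEDefang's actual code (MIMEDefang filters are written in Perl); the function name group_by_domain is hypothetical, and it assumes that domain matching ignores case.

```python
from collections import OrderedDict


def group_by_domain(recipients):
    """Group recipient addresses by domain, preserving first-seen order."""
    groups = OrderedDict()
    for addr in recipients:
        # Strip optional angle brackets, then take everything after the last '@'.
        domain = addr.strip("<>").rsplit("@", 1)[-1].lower()
        groups.setdefault(domain, []).append(addr)
    return groups


# One copy of the message would be generated per key in this mapping.
copies = group_by_domain(["a@a.net", "b@a.net", "a@b.net", "b@b.net"])
for domain, rcpts in copies.items():
    print(domain, rcpts)
```

With the four recipients from the example this yields two groups, one for a.net and one for b.net, matching the two copies described.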

Right, the question is: Assume my gateway is mail.a.net. The above message
comes in and is run through stream_by_domain. Will the mail to b.net be
processed by my sendmail first (where it will be stopped by the anti-relay
rules), or will it be instantly forwarded to mail.b.net?

From what you said, I would assume the former, but I just want to be extra
careful and be sure.

> > All I need is to create a data structure with the extra
> > per-domain configuration information I need, and as mimedefang is
> > persistent for 100 connections, the overhead to load it would not be
> > that noticeable.
> 
> You cannot rely on persistence between messages!  Although MIMEDefang is
> persistent for 100 messages, you have no way of knowing when the boundary
> hits.  If you use stream_by_domain, MIMEDefang will be re-invoked for
> each domain, and you can base your tests on the $Domain variable.  The
> proper way to use this is shown in the mimedefang-filter man page.

Oops, this is my fault, and a poor choice of words. What I meant to say is:
'If I want to load my routing data structure so it is available for each
message, I can include it at the beginning of my filter and the CPU cycles
to load and keep it in memory will be averaged over the life of that
process. So I can use a data structure that takes a little longer to load,
because I'm not loading it for each message.'
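
To make that concrete, here is a minimal sketch of the load-once idea. It is written in Python purely to show the pattern (a real MIMEDefang filter would be Perl), and every name in it (routing_table, settings_for, the tag_only setting, the inline JSON) is made up for illustration. The table is loaded lazily, so the code never assumes the cache survived from an earlier message; a fresh process simply loads it again.

```python
import json

_ROUTING = None   # per-process cache, filled on first use
LOAD_COUNT = 0    # counts how often the expensive load actually ran

# Hypothetical per-domain settings; a real filter might read a file or database.
_RAW_CONFIG = '{"a.net": {"tag_only": true}, "b.net": {"tag_only": false}}'


def routing_table():
    """Return the routing table, loading it only once per process."""
    global _ROUTING, LOAD_COUNT
    if _ROUTING is None:
        _ROUTING = json.loads(_RAW_CONFIG)
        LOAD_COUNT += 1
    return _ROUTING


def settings_for(domain):
    """Look up per-domain settings; unknown domains get an empty dict."""
    return routing_table().get(domain.lower(), {})


# Simulate three messages handled by the same process.
for d in ["a.net", "b.net", "a.net"]:
    settings_for(d)
print(LOAD_COUNT)
```

The load cost is paid once however many messages the process handles, and nothing breaks if the process is recycled between any two of them.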

As you can tell, I'm not very gifted with words. I've been looking at
different ways to access the extra information I need, and I was wondering
what other people are using to do the same thing, if anything.

-Jeff