[Mimedefang] Greylisting Code on Wiki

Thu Dec 13 11:52:09 EST 2007

On Wed, Dec 12, 2007 at 08:28:59PM +0100, Jonas Eckerman wrote:
> Important note: since I don't use spamd I might well be entirely 
> confused here. If I am, please ignore this post after alerting us 
> to this fact.

We do use spamd, and it does save us memory. However, that's
due to some specific quirks on our system. YMMV. In fact, your
mileage will undoubtedly vary.

If you're using a pretty basic MD setup where all mail is scanned
for viruses and spam, spamd will not help.

We use spamd because a lot of our mail is only scanned for viruses,
not for spam. As a result, MD slaves doing "scan" only need SA for a
fraction of the messages. If every MD slave kept SA in memory, that
would waste a lot of memory. For comparison, we have MX_MAXIMUM set
to 60, but spamd is limited to a maximum of 20 children (I just
checked: 52 MD slaves active, 12 spamd children active).
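
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 40 MB per-instance SpamAssassin footprint is a hypothetical figure chosen for illustration, not a measurement; only the process counts come from the paragraph above. It also ignores the memory each MD slave uses for everything other than SA.

```python
# Rough memory comparison: SpamAssassin loaded in every MIMEDefang slave
# versus only in a separate, smaller pool of spamd children.
# SA_MB is a hypothetical per-process footprint, not a measured value.

SA_MB = 40  # hypothetical resident size of one SpamAssassin instance, in MB


def sa_memory_mb(processes: int, per_process_mb: float = SA_MB) -> float:
    """Memory consumed if `processes` each hold their own copy of SA."""
    return processes * per_process_mb


md_active = 52     # MD slaves active at the time of the check
spamd_active = 12  # spamd children active at the time of the check

in_every_slave = sa_memory_mb(md_active)
in_spamd_only = sa_memory_mb(spamd_active)

print(f"SA in every MD slave: {in_every_slave:.0f} MB")
print(f"SA in spamd only:     {in_spamd_only:.0f} MB")
print(f"Difference:           {in_every_slave - in_spamd_only:.0f} MB")
```

With the assumed 40 MB per instance, the totals come out at 2080 MB and 480 MB; substitute the resident size you actually observe for SA on your own system.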

It was even worse before, when MD didn't differentiate between
slaves that ran "scan" vs "filter_recipient", etc. Since we also do
various DNS lookups in the filter_recipient phase (blacklists, MX
checks), having SA loaded in all of those slaves wasted a lot of memory.

We don't do per-user spamassassin settings, but having access to
per-user settings would also be a reason to use spamd.

> Andy Lyttle wrote:
> >>Anyone else using spamc and have any information to report?  Since I 
> >>use spamc/spamd on the same box, this seems like a no-brainer to 
> >>implement but perhaps someone has a field-tested warning?
> >
> >I've got four MD child processes, which 
> >means four instances of SpamAssassin loaded in RAM,
> 
> If you wait until SA is actually needed before loading it, you 
> might save some memory since MIMEDefang tries to reuse the slaves 
> for the same tasks they have already been used for. Of course 
> this also means that processing (after DATA) will take longer 
> the first time a slave needs to use SA.
> 
> Combining this with various means to avoid SA for most mails 
> saves more memory.
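
The load-on-first-use idea above is a plain lazy-initialisation pattern. The following is a minimal, language-neutral sketch in Python. In a real MIMEDefang filter this would be Perl code, and the class names here are hypothetical placeholders, not anything from MIMEDefang or SpamAssassin.

```python
from typing import Optional

# Lazy initialisation sketch. ExpensiveSpamEngine is a hypothetical
# stand-in for SpamAssassin, not a real API.


class ExpensiveSpamEngine:
    """Hypothetical placeholder for a large, slow-to-load spam scanner."""

    instances = 0  # counts how many copies have been loaded

    def __init__(self) -> None:
        ExpensiveSpamEngine.instances += 1
        # A real engine would parse and compile its rule set here.

    def score(self, message: str) -> float:
        # Toy rule, for illustration only.
        return 5.0 if "viagra" in message.lower() else 0.0


class Worker:
    """One filter worker, analogous to a MIMEDefang slave."""

    def __init__(self) -> None:
        self._engine: Optional[ExpensiveSpamEngine] = None  # not loaded yet

    def _spam_engine(self) -> ExpensiveSpamEngine:
        if self._engine is None:
            # Only the first message that needs a spam check pays the load cost.
            self._engine = ExpensiveSpamEngine()
        return self._engine

    def scan(self, message: str, wants_spam_check: bool) -> Optional[float]:
        # Virus scanning would run here for every message (omitted).
        if not wants_spam_check:
            return None  # virus-only mail never loads the spam engine
        return self._spam_engine().score(message)


virus_only = Worker()
virus_only.scan("hello", wants_spam_check=False)

spam_checked = Worker()
first = spam_checked.scan("Cheap VIAGRA", wants_spam_check=True)
second = spam_checked.scan("lunch?", wants_spam_check=True)
```

Only the worker that actually handled spam-checked mail ends up with an engine loaded. The virus-only worker never pays for it, and the second spam-checked message reuses the engine loaded for the first.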

Only if you avoid SA but still let mail get to the DATA phase. E.g.,
because you're only doing virus scanning and not spam scanning, as
in our case. Or because you have another anti-spam engine that you
consult first, and a match there makes the mail skip SA.
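
A sketch of the kind of decision that keeps SA out of the picture, assuming two hypothetical inputs: a set of recipient domains that only want virus scanning, and a boolean result from some cheaper anti-spam check consulted first. Neither name comes from MIMEDefang; this is an illustration, not a drop-in filter.

```python
from typing import Iterable

# Hypothetical: domains whose users only get virus scanning.
VIRUS_ONLY_DOMAINS = {"example.net"}


def recipient_domain(recipient: str) -> str:
    """Domain part of an address, tolerating the <user@domain> form."""
    return recipient.strip("<>").rsplit("@", 1)[-1].lower()


def needs_spamassassin(recipients: Iterable[str], cheap_engine_matched: bool) -> bool:
    """Return True only when SA actually has to run on this message."""
    if cheap_engine_matched:
        # The cheaper engine already decided; skip SA entirely.
        return False
    domains = {recipient_domain(r) for r in recipients}
    # Run SA if at least one recipient is outside the virus-only set.
    return bool(domains - VIRUS_ONLY_DOMAINS)
```

Note the design choice for mixed recipient lists: if any recipient wants spam scanning, SA runs for the whole message. Handling recipients separately would need a different approach.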

If you're doing RBL-style blocking, do that from filter_recipient,
even before you accept the DATA. It's very easy to make exemptions
for postmaster or other role accounts that way.
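
A minimal sketch of recipient-time DNSBL blocking with role-account exemptions, in Python for illustration. A real MIMEDefang filter would do this in Perl inside filter_recipient. The zone name dnsbl.example.org and the exempt list are hypothetical. The (action, message) tuples are modeled loosely on the CONTINUE/REJECT pairs a filter_recipient returns. The resolver is a parameter so the logic can be exercised without touching the network.

```python
import socket
from typing import Callable, Tuple

# Role accounts that should stay reachable even from listed IPs.
EXEMPT_LOCALPARTS = {"postmaster", "abuse"}

# Hypothetical DNSBL zone; substitute the list you actually use.
DNSBL_ZONE = "dnsbl.example.org"

Resolver = Callable[[str], str]


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """IPv4 DNSBL convention: reversed octets prepended to the zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, resolve: Resolver = socket.gethostbyname) -> bool:
    try:
        resolve(dnsbl_query_name(ip))
        return True   # any A record is treated as "listed"
    except socket.gaierror:
        return False  # lookup failure (e.g. NXDOMAIN) means "not listed"


def check_recipient(recipient: str, relay_ip: str,
                    resolve: Resolver = socket.gethostbyname) -> Tuple[str, str]:
    localpart = recipient.strip("<>").split("@", 1)[0].lower()
    if localpart in EXEMPT_LOCALPARTS:
        # Checked before the DNS lookup, so role accounts never get blocked.
        return ("CONTINUE", "role account exempt from DNSBL checks")
    if is_listed(relay_ip, resolve):
        return ("REJECT", f"{relay_ip} is listed in {DNSBL_ZONE}")
    return ("CONTINUE", "ok")
```

Treating any answer as "listed" is a simplification. A production check should look at the returned 127.0.0.x code and decide explicitly what to do on DNS timeouts.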

> IIRC spamd loads a complete SpamAssassin in each child, so the 
> main problem will still be there. Either spamd keeps extra 
> children around using memory, or your filter might occasionally 
> have to wait (again after DATA) while spamd needs to load a new 
> child. It might still be more efficient than the above solution 
> though.

spamd does pre-fork, and the children share the parent's
initialisation, but there's indeed not a lot of data (memory)
sharing. What is shared is the initialisation work: effectively,
the compilation of the SA rules into Perl code.

-- 
Jan-Pieter Cornet <johnpc at xs4all.nl>