[Mimedefang] Memory usage/number of slave processes

Tue Aug 19 16:42:01 EDT 2003

John Scully wrote:
> I am just in the final stages of bringing up a filtering server using
> sendmail 8.12.9, mimedefang 2.36, spamassassin (including full RBL
> checks and DCC) and clamav.  The server is a P4 2.8Ghz, 1G ram

Personally, I'm partial to Razor2. You may want to consider running that
in addition to or in place of DCC. But, that's just my suggestion. YMMV.

...

> I would appreciate any input from anyone out there using virus checks,
> spammassassin on a similar platform.  What kind of load can it handle,
> and how many slaves does it peak at?  I am concerned that I have
> something configured wrong causing extra delays.
> 
> Also, what are you doing to throttle load?  Limiting child processes in
> sendmail.cf?

As the initialization of SpamAssassin is the most expensive part of my
setup, I have set MX_MINIMUM and MX_MAXIMUM to the same value. Some
experimentation should allow you to optimize this value for your system.
The important part is that they are the same. That way, the multiplexor
will start up X slaves right away instead of loading them on
demand. While this seems counterintuitive, it keeps the computationally
expensive SpamAssassin startup from creating a snowball effect.
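
As a rough sketch of that setting: the two variables are normally set in
the MIMEDefang init script or in a file it sources. The file location
named in the comment and the value 10 are assumptions for illustration,
not a recommendation; tune the number for your own hardware.

```sh
# Assumed location: /etc/sysconfig/mimedefang or /etc/default/mimedefang,
# or the init script itself. The value 10 is only a placeholder.
MX_MINIMUM=10
MX_MAXIMUM=10
```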

Also, I force SpamAssassin to be initialized on slave startup instead of
when the first message is scanned by adding this to the top of my
mimedefang-filter:

# Load SpamAssassin
spam_assassin_init();
if (defined($SASpamTester)) {
	$SASpamTester->compile_now(1);
}

With these two changes, once MIMEDefang has all the slaves loaded, it
runs very efficiently.

If you have RAM to spare (which you should with 1GB), you will probably
want to have your spool directory (/var/spool/MIMEDefang or similar) on
a RAM disk (a/k/a tmpfs filesystem). My /etc/fstab has a line like this:

none /var/spool/MIMEDefang tmpfs uid=defang,gid=defang,mode=700 0 0

The most important thing to remember is that SpamAssassin checks need to
be at the very end of filter_end. Avoid running SpamAssassin whenever
you can. In other words, if a message is going to be rejected by a less
computationally expensive test, do that test first and return from the
filter.
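
To illustrate the ordering idea outside of MIMEDefang, here is a minimal
standalone sketch. Everything in it is hypothetical: first_rejection is
not a MIMEDefang function, and the two checks are stand-ins (the
"expensive" one only pretends to be a SpamAssassin scan, and the score
limit of 5 is made up). It just shows that a cheap check listed first
keeps the costly one from running at all.

```perl
use strict;
use warnings;

# Hypothetical helper: run checks in the order given (cheapest first)
# and stop at the first one that returns a rejection reason.
sub first_rejection {
    my ($msg, @checks) = @_;
    for my $check (@checks) {
        my $reason = $check->($msg);
        return $reason if defined $reason;
    }
    return undef;
}

my $expensive_runs = 0;   # counts how often the costly check actually ran

# Cheap stand-in for a header/body sanity test.
my $cheap = sub {
    my ($msg) = @_;
    return $msg->{bad_chars} ? "Invalid characters found in message" : undef;
};

# Stand-in for a SpamAssassin scan; the score field and limit are made up.
my $expensive = sub {
    my ($msg) = @_;
    $expensive_runs++;
    return $msg->{score} > 5 ? "Message looks like spam" : undef;
};

my $verdict = first_rejection({ bad_chars => 1, score => 9 }, $cheap, $expensive);
print "$verdict (expensive checks run: $expensive_runs)\n";
```

In a real filter the same effect comes from putting the cheap tests
first and returning as soon as one of them rejects the message.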

If you're doing per-user filtering, see if you can do it by only running
SpamAssassin once. In my case, I load all the user prefs into a hash and
call SpamAssassin only if at least one user wants filtering. I then
compare the score against each user's threshold and act accordingly
(using delete_recipient if the message scored higher than that user's
threshold). I keep a counter of recipients and if that hits zero, I call
action_bounce. It's the best compromise I've come up with.
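
The per-user comparison can be sketched in plain Perl, separate from the
filter itself. This is not my actual filter code: split_recipients is a
hypothetical helper, the addresses and thresholds are made up, and
treating a recipient with no hash entry as "does not want filtering" is
an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIMEDefang): given one SpamAssassin
# score and a hash of recipient => threshold, split the recipients into
# those who keep the message and those who should be dropped.
# Recipients with no entry in the hash are treated as unfiltered.
sub split_recipients {
    my ($score, $thresholds, @recipients) = @_;
    my (@keep, @drop);
    for my $rcpt (@recipients) {
        my $limit = $thresholds->{$rcpt};
        if (defined $limit && $score > $limit) {
            push @drop, $rcpt;
        } else {
            push @keep, $rcpt;
        }
    }
    return (\@keep, \@drop);
}

# Example data; addresses and thresholds are invented for illustration.
my %thresholds = ('a@example.com' => 5, 'b@example.com' => 10);
my @recipients = ('a@example.com', 'b@example.com', 'c@example.com');

my ($keep, $drop) = split_recipients(7.2, \%thresholds, @recipients);
print "keep: @$keep\n";
print "drop: @$drop\n";
```

Inside filter_end you would then call delete_recipient for each address
in the drop list, and action_bounce if the keep list ends up empty.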

Here's a suggested filter snippet for filter_begin:

if ($SuspiciousCharsInHeaders || $SuspiciousCharsInBody) {
	action_bounce("Invalid characters found in message");
	return;
}

You may want to add some logging to that as well. But, I've found that
this single check blocks a ton of spam with no false positives. Again,
YMMV. But, it's basically a computationally free test so it pays to run
it early if you're going to run it. If your system will be handling
local mail and you choose not to scan local mail, check that right away
to avoid burning CPU time.

Richard Laager