[Mimedefang] Scaling MIMEDefang / SA on Solaris

Wed Apr 16 11:17:01 EDT 2003

Folks,

I'm going nuts!  I've been experimenting with MIMEDefang / SpamAssassin on
Solaris and Linux for about 6 weeks.  I have the latest sendmail running with
MD / SA (both also at their latest releases) under Linux (Red Hat 8.0) and
Solaris 8.

Functionally, I'm extremely impressed by this software combo.  It is rock
solid on several of my low mail-volume sites.

But for the last 2 weeks, I've been trying to deploy this combo at a
larger-volume site.  This site runs four mail servers: two primary hosts in
parallel plus two fallback MX hosts.  The two primary hosts are
dual-processor SPARCs, each with 1 GB of memory.  All 4 servers run fully
patched Solaris 8 (64-bit kernels, but with 32-bit builds of sendmail,
perl, ...).  Yesterday, without MD / SA, this site processed about 900,000
messages; the two primary servers barely broke a sweat and the secondary MX
hosts weren't used.
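To put the volume above in perspective, here is a back-of-the-envelope sizing sketch in Python using Little's law (average concurrency = arrival rate x time in system). The 900,000 messages/day, the two primary hosts and the 1 GB of RAM come from the post; the 2 seconds per scan, the 3x peak factor, the 256 MB reserve and the 20 MB per slave are hypothetical placeholders to be replaced with measured values. Little's law gives an average, so real bursts will need headroom beyond it.

```python
import math

SECONDS_PER_DAY = 86_400


def slaves_needed(msgs_per_day, secs_per_msg, servers=1, peak_factor=1.0):
    """Little's law (L = lambda * W): average number of messages being
    scanned at once, split evenly across `servers` and rounded up."""
    arrivals_per_sec = msgs_per_day / SECONDS_PER_DAY * peak_factor
    concurrent_scans = arrivals_per_sec * secs_per_msg
    return math.ceil(concurrent_scans / servers)


def slaves_affordable(ram_mb, reserved_mb, per_slave_mb):
    """How many slaves fit in RAM after reserving memory for the OS and sendmail."""
    return max(0, (ram_mb - reserved_mb) // per_slave_mb)


# Volume, host count and RAM are from the post; the rest are hypothetical.
needed = slaves_needed(900_000, secs_per_msg=2.0, servers=2, peak_factor=3.0)
affordable = slaves_affordable(1024, reserved_mb=256, per_slave_mb=20)
print(needed, affordable)  # → 32 38
```

If the "needed" figure approaches or exceeds the "affordable" one, the hosts will start queuing scans no matter how the timeouts are tuned.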

My problem is that with MD / SA deployed, the mail service falls flat on its
face.  I'd post configs, but I've tried so many different approaches that I
have no one set of files on which to request feedback.

There are several types of failures.  The most common is that the milter
goes into an error state and at some point either sendmail or MD removes the
mimedefang.sock file.  During milter failures, I see a "Please try again
later" SMTP response even though I've removed F=T from the
INPUT_MAIL_FILTER entry in my sendmail m4 configuration.
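As I understand sendmail's cf/README, the F=R and F=T flags only govern what happens when a filter is unavailable or errors out; with neither flag set, the message is passed through as if the filter were not present. If the filter is reachable and itself answers with a temporary failure, sendmail returns a 4xx regardless of F=. One possibility worth checking, then, is whether the 4xx comes from the filter's own reply rather than from the F= handling. The Python model below encodes that reading; the state names and the function are hypothetical, and it is not sendmail code.

```python
from enum import Enum


class FilterState(Enum):
    UNAVAILABLE = 1  # socket gone, connect refused, timeout or protocol error
    TEMPFAIL = 2     # filter reachable but answered with a temporary failure
    OK = 3           # filter reachable and accepted the message


def smtp_outcome(f_flags: str, state: FilterState) -> str:
    """Model of sendmail's documented F= handling for a single input filter."""
    if state is FilterState.UNAVAILABLE:
        if "R" in f_flags:
            return "5xx reject"
        if "T" in f_flags:
            return "4xx tempfail"
        return "accept without filter"  # no F=R/F=T: the filter is skipped
    if state is FilterState.TEMPFAIL:
        return "4xx tempfail"  # F= flags do not override the filter's own answer
    return "accept"


print(smtp_outcome("", FilterState.UNAVAILABLE))  # → accept without filter
print(smtp_outcome("", FilterState.TEMPFAIL))     # → 4xx tempfail
```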

I'm sure that I'm facing resource problems.  I've read that on Solaris 8,
32-bit applications are limited to 1024 file descriptors (FD_SETSIZE) because
of milter's use of select(), regardless of how high rlim_fd_max is set in
/etc/system.  I've tried limiting SMTP daemon connections, and I've tried
small, medium, and large values for the MD slave pool size.  I've tried
increasing MD's busy timeout, and I've tried upping the timeout values in the
milter m4 entry.
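The descriptor ceiling described above can be turned into a rough cap on concurrent SMTP sessions. This Python sketch reads the process's RLIMIT_NOFILE, clamps it to the 1024 select() ceiling mentioned in the post, and subtracts a reserve. The assumption of one filter-side descriptor per SMTP session and the 24-descriptor reserve are hypothetical; the result is the kind of number one might compare against sendmail's MaxDaemonChildren (confMAX_DAEMON_CHILDREN in m4).

```python
import resource

FD_SETSIZE_32BIT = 1024  # select() ceiling for 32-bit Solaris 8 binaries, per the post


def select_safe_fd_limit(fd_setsize=FD_SETSIZE_32BIT):
    """Descriptor limit a select()-based process can actually use."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return fd_setsize
    return min(soft, fd_setsize)


def max_smtp_sessions(fd_limit, fds_per_session=1, reserved=24):
    """Hypothetical cap on concurrent SMTP sessions so the filter's
    per-session sockets plus `reserved` descriptors stay below fd_limit."""
    return max(0, (fd_limit - reserved) // fds_per_session)


limit = select_safe_fd_limit()
print(limit, max_smtp_sessions(limit))
```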

Even during failures, the server's load average doesn't exceed 4 or 5, and
only a little swap space is in use.  But the milter times out or goes into an
error state, and things progressively degrade from that point.  While things
are failing, file system access (e.g. the ls command) is essentially
unusable, taking forever to complete, while other commands remain quite
responsive.
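To turn "taking forever" into numbers that can be compared between healthy and failing periods, a small timing probe like this Python sketch could be run periodically. It lists a directory and optionally lstat()s each entry, roughly as `ls -l` does. /var/spool/mqueue is sendmail's usual queue directory; /var/spool/MIMEDefang is an assumed spool location, so adjust both to the actual setup.

```python
import os
import time


def time_listing(path, stat_entries=True):
    """Return (entry count, seconds) for listing `path`, optionally
    lstat()ing each entry the way `ls -l` does."""
    start = time.monotonic()
    names = os.listdir(path)
    if stat_entries:
        for name in names:
            try:
                os.lstat(os.path.join(path, name))
            except FileNotFoundError:
                pass  # spool entries can vanish between listdir() and lstat()
    return len(names), time.monotonic() - start


if __name__ == "__main__":
    # Both paths are assumptions; missing or unreadable ones are reported, not fatal.
    for path in ("/var/spool/mqueue", "/var/spool/MIMEDefang"):
        try:
            count, secs = time_listing(path)
            print(f"{path}: {count} entries in {secs:.3f}s")
        except OSError as exc:
            print(f"{path}: {exc}")
```

Running the same probe against an unrelated, quiet directory gives a baseline for judging whether the slowdown is specific to the busy spool.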

Perhaps this is all due to my fairly limited knowledge of how milter works.
I'm looking for pointers / war stories from anyone using MD / SA on larger
Solaris-based mail sites.  Specifically, I'd like to learn what others are
using for /etc/system settings, MD startup settings, and sendmail milter m4
settings.  Perhaps most important at this point is understanding how to
avoid reaching the point where the SMTP service replies with "Please try
again later".  When no more slaves are available, sendmail behaves as I'd
like, but once MD / milter fails completely, sendmail might as well not be
running, because every SMTP request gets a 4xx reply.

On larger sites, I find it key to run the sendmail daemon in queue-only
delivery mode and then use persistent queue runners to deliver the mail.  Can
I configure MD to run only from the queue runners, and not from the sendmail
-bd instances, to limit MD's resource use?
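The listener / runner split described above can be sketched as two command lines. The Python helper below is hypothetical and only builds the argv lists: -bd runs the SMTP daemon, -odq sets DeliveryMode to queue-only (the m4 equivalent is define(`confDELIVERY_MODE', `q')), and -qp starts a persistent queue runner. The /usr/lib/sendmail path and the 1-minute interval are assumptions.

```python
def split_sendmail_commands(sendmail="/usr/lib/sendmail", queue_interval="1m"):
    """Return argv lists for a queue-only SMTP listener and a persistent queue runner."""
    listener = [sendmail, "-bd", "-odq"]         # accept mail and queue it, no immediate delivery
    runner = [sendmail, f"-qp{queue_interval}"]  # persistent queue runner
    return listener, runner


listener, runner = split_sendmail_commands()
print(" ".join(listener))  # → /usr/lib/sendmail -bd -odq
print(" ".join(runner))    # → /usr/lib/sendmail -qp1m
```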

Any thoughts on system resources, pointers to reference material applicable
to this type of deployment, or simply comments on the philosophies behind
sendmail / milter resource management would be greatly appreciated.

... Phil