[Mimedefang] Reject 451 (Please try again later) messages

Sat Jun 21 07:40:01 EDT 2003

> 
> On Fri, 2003-06-20 at 18:49, David F. Skoll wrote:
> > On Fri, 20 Jun 2003, Bill Randle wrote:
> >
> > > My most recent conclusion, based on some other mail log messages I
> > > was seeing is that the mail server box is just plain underpowered
> > > and can't keep up with the incoming flood of email (1000-1200/hr).
> >
> > 1200/hr is not that much at all.  What was your system configuration?
> 
> Based on what others have said, it doesn't seem like a huge load
> to me, either. System configuration was dual 400 MHz Xeon,
> Intel 440BX motherboard, 768 MB RAM, UDMA 33 HDD, Intel EEPRO100
> Ethernet card. Kernel 2.4.19-33mdksmp, MIMEDefang-2.34-BETA-5,
> SpamAssassin-2.60 CVS.
> 
> With all network tests and bayes disabled, it still couldn't
> keep up.
> 
> > I'm working on 2.34-BETA-7 that has an experimental feature called
> > "queueing".  If all slaves are busy, we allow requests to queue (up to
> > a queue size limit, and only for a limited time period.)  New
> > connections get rejected early on, but established ones are given a
> > bit of extra queue time in the hopes they'll succeed.  I'd be
> > interested in hearing if this helps for heavily-loaded servers.
> 
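David's description of the BETA-7 queueing feature (slaves busy, a bounded queue, a time limit, and early rejection of new connections while established ones wait) can be modeled roughly as follows. This is a hypothetical Python sketch, not MIMEDefang's actual code (the real multiplexor is written in C). The class name, method names and parameters are invented for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class Multiplexor:
    """Hypothetical model of a slave pool with a bounded, time-limited queue."""
    slaves: int            # number of MD slaves
    queue_limit: int       # maximum number of waiting requests
    queue_timeout: float   # seconds a request may wait before it is tempfailed
    busy: int = 0
    queue: Deque[Tuple[str, float]] = field(default_factory=deque)
    tempfailed: List[str] = field(default_factory=list)

    def submit(self, request_id: str, new_connection: bool, now: float) -> str:
        """Return 'run', 'queued' or 'tempfail' for an incoming request."""
        if self.busy < self.slaves:
            self.busy += 1
            return "run"
        # All slaves busy: new connections are rejected early;
        # established ones may wait if the queue has room.
        if new_connection or len(self.queue) >= self.queue_limit:
            return "tempfail"
        self.queue.append((request_id, now))
        return "queued"

    def expire(self, now: float) -> None:
        """Move requests that waited longer than queue_timeout to tempfailed."""
        while self.queue and now - self.queue[0][1] > self.queue_timeout:
            self.tempfailed.append(self.queue.popleft()[0])

    def finish(self, now: float) -> Optional[str]:
        """A slave finished: hand it the oldest still-valid queued request."""
        self.expire(now)
        if self.queue:
            return self.queue.popleft()[0]  # slave stays busy with this request
        self.busy -= 1
        return None
```

With two slaves and room for one waiting request, a third new connection is refused at once, while an established session is queued and picked up by the next slave to become free.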

I'm sure that many would be interested in such a queuing feature.

For what it's worth to the list readers, my largest site with MD deployed
processes about 1 million to 1.5 million messages each day.  While I can't give
specifics on this customer, I can tell you that I used four large
multiprocessor SPARC Solaris 2.8 boxes: three in series for MX service and
one dedicated to customer outbound SMTP.

As David and others have so keenly pointed out, it is SA that is the
resource hog.  For my clients, anti-virus is a goal, but we've not yet added
it to MD / SA; SPAM filtering was the original priority.

While my disclosed mail infrastructure would not rival genomics compute
farms, it is a lot of meat for only 1.5 million messages per day.  Without
MD / SA, a single one of these servers would handle the mail load without
blinking.  Even with this meat, and with SA not invoked for any locally
generated e-mail, I've been forced to:

1. Queue all messages for later delivery (which was the key to keeping
things from falling over no matter how much horsepower was thrown at the
problem)
2. Limit the number of inbound SMTP connections (which unfortunately leaves
the deployment more exposed to DoS attacks)
3. Run n + 5 MD slaves (where n is the number of allowed inbound SMTP
connections)
4. Configure the input milter so that if MD fails, mail goes through anyway
(which is not what anyone wants, but it seems that no matter how much
horsepower we throw at mail processing, the MD milter will probably still
get its knickers in a twist, locking up and causing 451 responses)
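Item 4 corresponds to sendmail's choice of what to do when a milter cannot be contacted, selected with the `F=` equate of `INPUT_MAIL_FILTER` in the .mc file: with no flag, mail continues as if the filter were absent; `F=T` gives a temporary failure; `F=R` rejects. The Python function below only models that documented choice; it is not sendmail code, and its name is hypothetical.

```python
from typing import Optional


def on_milter_failure(f_flag: Optional[str]) -> str:
    """Hypothetical model of sendmail's reaction when a milter is unavailable.

    f_flag mirrors the F= equate of INPUT_MAIL_FILTER:
      None -> continue as if the filter were not present (fail open)
      "T"  -> temporary failure (4xx, e.g. 451)
      "R"  -> reject (5xx)
    """
    if f_flag is None:
        return "accept"
    if f_flag == "T":
        return "tempfail"
    if f_flag == "R":
        return "reject"
    raise ValueError(f"unknown F= value: {f_flag!r}")
```

Leaving the flag unset is the fail-open setup described in item 4: mail keeps flowing, but unfiltered, whenever MD is stuck.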

I'd like to meet my clients' desire for AV as well as combating SPAM, but I
first have to conquer the MD / SA resource issue(s) (again, I know this is
SA and not MD, but when using one to facilitate the other, I really cannot
separate them).

Short of resolving (or optimizing away) the resource problems, the
next best thing would be a mechanism where sendmail would simply reject SMTP
connections when the milter fails (instead of processing to the point of a
451 response).  At least with this approach, the remote MTA would go to the
next MX host in the list.

This list, especially David, has been quite helpful in getting us through the
initial growing pains with MD / SA.  I'd be very interested in others' war
stories on fine-tuning or optimizing ...

... For example, my reading indicates that I should not need as many MD
slaves as I run, but since MD (with SA) slaves often go into an error state or
time out (even without network tests enabled), my practical experience does
not match what others write.  Once the number of SMTP connections exceeds
the number of MD slaves, any lag in completing the milter tasks starts a
backlog of processes ... that backlog consumes more resources ... in turn,
this makes MD lag or failure more probable ... a downward spiral.
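The spiral can be illustrated with a toy model. This Python sketch is purely hypothetical: it assumes each backlogged job steals a fixed fraction (`penalty`) of slave capacity, an invented simplification rather than anything measured at this site.

```python
from typing import List


def simulate(arrivals: int, slaves: int, ticks: int, penalty: float = 0.02) -> List[int]:
    """Toy model: backlog after each tick when waiting jobs slow the slaves.

    Each tick, `arrivals` new SMTP sessions arrive and each slave can finish
    one job, but capacity shrinks by `penalty` per backlogged job (assumed).
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        capacity = max(0, int(slaves * (1 - penalty * backlog)))
        backlog = max(0, backlog + arrivals - capacity)
        history.append(backlog)
    return history
```

With arrivals at or below slave capacity the backlog stays at zero; with just one extra arrival per tick, the backlog grows and each tick adds more than the one before, because the backlog itself eats capacity.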

To the original poster (Bill?), if you're not doing so already, try
configuring sendmail to queue all messages, then set up a queue group with a
fixed maximum number of children (I used 15 per server).  Let us know if this
helps.
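The point of queueing everything is that accepted messages wait in the queue and are delivered with bounded concurrency, instead of each SMTP session spawning its own delivery. A rough, hypothetical Python illustration follows. It is not sendmail. `MAX_CHILDREN` stands in for the per-queue-group child limit (15 per server above), and the sleep stands in for delivering one queued message.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

MAX_CHILDREN = 15  # per-server figure from the post; illustrative only


class Tracker:
    """Counts concurrently running deliveries and records the peak."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def deliver(self, msg_id: str) -> str:
        with self.lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for delivering one queued message
        with self.lock:
            self.active -= 1
        return msg_id


def drain_queue(queue_ids: List[str], workers: int = MAX_CHILDREN) -> Tuple[List[str], int]:
    """Deliver every queued message using at most `workers` concurrent children."""
    tracker = Tracker()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(tracker.deliver, queue_ids))
    return done, tracker.peak
```

However many messages are waiting, no more than `workers` deliveries run at once, so a burst lengthens the queue rather than multiplying processes.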

Cheers ... Phil