[Mimedefang] timeout before data read?

Wed Nov 17 15:43:17 EST 2004

Rich West wrote:
> Quite suddenly, and consistently, today at around noon, I started seeing 
> the following messages in my maillog, and the load average on the system 
> went through the roof (load average shot up beyond 22!) shortly afterwards.
> 
> Nov 17 15:02:51 cranium sm-mta[24072]: iAHK1XJR024072: Milter 
> (mimedefang): timeout before data read
> Nov 17 15:02:51 cranium sm-mta[24072]: iAHK1XJR024072: Milter 
> (mimedefang): to error state
> Nov 17 15:02:51 cranium sm-mta[24072]: iAHK1XJR024072: Milter: data, 
> reject=451 4.3.2 Please try again later
> Nov 17 15:02:51 cranium sm-mta[24072]: iAHK1XJR024072: 
> to=<user at myhost.com>, delay=00:01:01, pri=32062, stat=Please try again 
> later
> 
> Now, if I bring down both mimedefang and sendmail and bring it back up 
> again, I can get some mail to come through, but very little before it 
> starts spitting out these errors again.
> 
> And, it seems, that mimedefang keeps being forced to spawn a new 
> process, leaving the old one around to spin its wheels, and the new 
> process will generate the same error, and then spawn a new process 
> because of the failure, and so on and so on, becoming a very fast and 
> sick cycle. :(

There were some discussions on the list about this recently.

Basically, what happens is that either mimedefang-multiplexor or 
sendmail (or both) is configured with timeouts that are too short, so 
one or both of them does not wait long enough for filtering to finish.  
By your description, I'd say somebody sent you a *very* large email, and 
sendmail is the one that got impatient.  The mail is rejected with a 
tempfail, the old process is still spinning, the remote side tries to 
redeliver (probably too aggressively), a new process starts, and you 
end up going in circles while your load average hits the roof.

The first timeout is in mimedefang-multiplexor.  It controls how long 
the multiplexor will let a child run before killing it.  The second 
timeout is in sendmail.  It controls how long sendmail will wait for 
MIMEDefang to do its job.  You should set these two timeouts to 
approximately the same value (maybe setting the multiplexor timeout a 
minute or two shorter).  That way, the old process will be terminated 
by the multiplexor at about the same time as sendmail gives up.  It 
really doesn't make any sense to have them set differently (whichever 
one you hit first will cause the tempfail).

How long should the timeout be?  That depends on how large the emails 
you allow are (and on the speed of your machine).  If you limit the 
size of email in sendmail to, say, 1-10MB, around 15 minutes should 
suffice on a reasonably fast machine.  If you allow anything larger, 
values as large as 1 hour should be considered.
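
As a rough sketch of the rule of thumb above, the shell function below 
maps a message size limit (in whole megabytes) to a starting timeout in 
minutes.  The function name and the hard 10MB cutoff are made up for 
illustration only; tune the numbers for your own hardware.

```shell
# Hypothetical helper: suggest a filter timeout (in minutes) from the
# maximum message size (in whole megabytes) that sendmail accepts.
suggest_timeout_minutes() {
    max_mb=$1
    if [ "$max_mb" -le 10 ]; then
        echo 15    # up to about 10MB: around 15 minutes
    else
        echo 60    # larger limits: consider as much as 1 hour
    fi
}

suggest_timeout_minutes 10
```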

Small hint: do not run SpamAssassin on large emails.  If a mail is 
larger than, say, 100kB, skip the SpamAssassin checks.  They will take 
forever to complete.

Oh, BTW, the multiplexor timeout is controlled in 
/etc/sysconfig/mimedefang (well, at least on Red Hat-type systems):

MX_BUSY=1740  # (29 minutes, we give up before sendmail does)

sendmail timeouts are defined in sendmail.mc:

INPUT_MAIL_FILTER(`mimedefang', 
`S=unix:/var/spool/MIMEDefang/mimedefang.sock, F=T, T=S:30m;R:30m;E:30m')
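
To keep the two values above in step, the multiplexor timeout can be 
derived from the sendmail one.  This is only a sketch: 
SENDMAIL_TIMEOUT_MIN and MARGIN_MIN are names invented for the example, 
not settings that sendmail or MIMEDefang actually read.

```shell
# Illustrative variables (not read by sendmail or MIMEDefang):
SENDMAIL_TIMEOUT_MIN=30   # matches T=S:30m;R:30m;E:30m in sendmail.mc
MARGIN_MIN=1              # let the multiplexor give up this much earlier

# MX_BUSY in /etc/sysconfig/mimedefang is expressed in seconds.
MX_BUSY=$(( (SENDMAIL_TIMEOUT_MIN - MARGIN_MIN) * 60 ))
echo "MX_BUSY=$MX_BUSY"
```

With the numbers above this gives the same 1740 seconds (29 minutes) 
as in the sysconfig line.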

-- 
Aleksandar Milivojevic <amilivojevic at pbl.ca>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7