[Mimedefang] Problem is happening right now (was: My MD install went wacko)

Wed Jun 11 11:06:01 EDT 2003

On Tue, 2003-06-10 at 19:42, Justin Shore wrote:
> On 8 Jun 2003, Bill Randle wrote:
> > This sounds suspiciously like the problems I was having (see the thread
> > "[Mimedefang] mimedefang lockup - help needed" and related followups
> > [one of which was a reply from you :-)]. In my case the mimedefang
> > processes were dying for unknown reasons.
> > When the system is hosed and you do a "ps -Alw -y -H | grep mime", what
> > does it look like (compared to when the system is normal)?
> 
> Ding ding ding.  We have a winner!
>
> [root at bubba /usr/local/src/clamav/clamav-20030605]#> ps -Alw -y -H | grep mime
> S   UID   PID  PPID   RSS    SZ WCHAN  TTY      TIME CMD
> S   202 18105     1   688   391 do_sel ?    00:00:05 mimedefang-mult
> S   202 28204 18105 25352  8156 pipe_w ?    00:00:05   mimedefang.pl
> S   202 32734 18105 15400  4357 pipe_w ?    00:00:00   mimedefang.pl
> S   202 18124     1   812  1185 rt_sig ?    00:00:00 mimedefang
> 
> Ok... so what does this mean?  I'm sitting here looking at these errors
> again:
> 
> Jun 10 21:28:55 bubba sm-mta[7213]: NOQUEUE: connect from lists.sourceforge.net [66.35.250.206]
> Jun 10 21:28:55 bubba sm-mta[7184]: h5B2RtMS007184: Milter (mimedefang): timeout before data read
> Jun 10 21:28:55 bubba sm-mta[7184]: h5B2RtMS007184: Milter (mimedefang): to error state
> Jun 10 21:28:55 bubba sm-mta[7184]: h5B2RtMS007184: Milter (mimedefang): init failed to open
> Jun 10 21:28:55 bubba sm-mta[7184]: h5B2RtMS007184: Milter (mimedefang): to error state
> Jun 10 21:28:55 bubba sm-mta[7184]: h5B2RtMS007184: Milter: initialization failed, temp failing commands
> Jun 10 21:28:56 bubba sm-mta[7184]: h5B2RtMS007184: SMTP MAIL command (<root at oak.sktc.net> SIZE=6193) from oak.sktc.net [64.71.97.14] tempfailed (due to previous checks)
> 
> just like before.  This is the first time this happened since the 8th when
> you replied to my email.  I bounced thousands of pieces of spam through it
> yesterday too and it never hiccuped (even though the load reached the
> upper 40s at times :).  I would leave it like this for a day or so if it
> weren't for the fact that this very email must pass through this server.
> *sigh*
> 
> For the record this is a RH 7.3 box running Sendmail 8.12.9, MD
> 2.43-BETA5, SA 2.60 from CVS (recompiled nightly), w/ kernel 2.4.21-rc2.
> Let me know if there's anything ya'll want me to try.

Hi Justin,

I'm afraid I don't have an answer for you. My configuration is
similar to yours and it's been working for close to a week now
without problems. As I said, in my case what finally seemed to fix it
was the upgrade to MIMEDefang 2.43-BETA-5, but I didn't see anything
obvious in diffing the mimedefang code that would make things start
working - and in your case, that didn't seem to make any difference
anyway.

I'm not even sure of the exact failure mechanism. E.g., because it's
hung, do the child mimedefang processes do their normal idle timeout
thing and exit? If so, what prevents new processes from being spawned?
Is the initial "timeout before data read" error the cause or just a
symptom of it already being in the failure state? I suspect the later
messages "to error state" and "init failed to open" are symptoms of
it already being hung, but what causes it to get there in the first
place?

I'm beginning to wonder if it isn't some kind of race condition, as
changing various pieces of the overall filtering process seems to make
a difference in how often it happens. At one time, it was happening
to me as often as every few minutes for a short period of time, then
would go 30-40 minutes before hanging again. This is with an incoming
mail load of 800-1200 messages/hour. As I said, each time I changed
something it seemed to improve: tmpfs for /var/spool/MIMEDefang,
SA-2.60 CVS, newer kernel, MIMEDefang 2.43-BETA-5. That's what makes
me think it's a timing / timeout problem. But where?

Oh, one other thing I changed that seemed to incrementally help (based
on a posting someone made to this list, or maybe the SA list): I
increased the sendmail Timeout.quit (confTO_QUIT) to 5m, Timeout.misc
(confTO_MISC) to 4m and Timeout.control (confTO_CONTROL) to 4m.
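
For reference, here is a sketch of how those three settings might look
in sendmail.mc, assuming a stock m4-based configuration. The values are
the ones given above; where the lines go in the file, and the idea that
sendmail.cf is generated from sendmail.mc on this box, are assumptions.

```m4
dnl Sketch only: longer timeouts, using the values mentioned above.
dnl Timeout.quit
define(`confTO_QUIT', `5m')dnl
dnl Timeout.misc
define(`confTO_MISC', `4m')dnl
dnl Timeout.control
define(`confTO_CONTROL', `4m')dnl
```

After editing sendmail.mc, sendmail.cf has to be regenerated and
sendmail restarted before the new values take effect.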

My next step (if it hadn't started working) would have been to
instrument the mimedefang code with additional debug print messages
to see if I could find why the child processes were dying off and
not restarting.

Sorry I can't be much more help. You might try the sendmail timeouts,
though, and see if they make any difference.

        -Bill