[Mimedefang] mimedefang lockup - help needed (followup)

Bill Randle billr at neocat.org
Thu May 22 11:43:00 EDT 2003


On Tue, 2003-05-20 at 10:42, Nels Lindquist wrote:
> On 20 May 2003 at 6:53, Phil Eschallier wrote:
> 
> <SNIP>
> 
> > This seems unlikely here as things work at the
> > start ... so it could be that something is changing the permissions on
> > the socket file or the directories at some point after launch.
> 
> Right!  There have been several discussions on the list about 
> Mandrake's paranoid security causing problems, including a buggy msec 
> which "fixes" ownership & permissions of such things as sendmail, 
> queue directories, etc.
> 
> Do a search on the mailing list archive for "Mandrake permissions" 
> and see if any of the results might be helpful.

Thanks for the suggestions provided by several people. I tried them all:
	1. check for the mimedefang process dying
	A. The mimedefang process was still hanging around, albeit
	   in the sleep state. [But see additional info below.]

	2. try switching /var/spool/MIMEDefang to a tmpfs file system
	A. I switched the mimedefang spool directory to tmpfs. It
	   seemed to help a little, but not much.

	3. check for Mandrake Linux msec changing permissions
	A. I checked this, also, and permissions were not being
	   changed. Everything remained owned by the defang userid.
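
For anyone who wants to repeat the ownership check from item 3, here
is a minimal sketch. It assumes the spool lives in /var/spool/MIMEDefang
and the filter runs as user "defang" (both taken from this thread); the
function name is made up for illustration.

```shell
# List anything under a directory that is NOT owned by the expected user.
# Empty output means ownership is still intact.
not_owned_by() {
    dir=$1
    user=$2
    find "$dir" ! -user "$user" -print
}

# Example (assumed path and user; run as root or as the defang user):
#   not_owned_by /var/spool/MIMEDefang defang
```

Running something like this from cron every few minutes and mailing any
output would show whether msec (or anything else) changes ownership
after startup.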

In an effort to narrow down the problem I commented out the SpamAssassin
call, leaving the virus scanning in place. I ran this for over an hour
with no problems or lockups. This seemed to point to the spam filtering
as being the trigger, though not necessarily the real cause.

So, in an act of desperation before chucking the whole thing, I upgraded
SpamAssassin to the nightly 2.60 CVS build and converted the Bayes
databases to the new format. This seems to have mostly solved the
problem: after letting it run overnight I got only one lockup, whereas
before it was happening several times an hour.

I don't think it was the Bayes processing, as I had earlier set
use_bayes and auto_learn to 0 in the config file in an attempt to
reduce spam processing time.

Additional Info:

The problem is not completely solved, as it has locked up twice this
morning. But it is a lot better than it was. I'm still using tmpfs.
According to "graphdefang", our incoming email rate averages around
1000 mails/hour with peaks of 1200/hr or slightly higher. Over 80% of
that is spam.

The interesting thing that happens in the "lockup" situation is that
several mimedefang processes appear to die and not get restarted. For
instance, here's a process tree for normal conditions:

# ps -Alw -y -H|grep mime
S     PID  PPID   RSS    SZ WCHAN  TTY          TIME CMD
S   30397     1   808   423 do_sel ?        00:00:00   mimedefang-mult
R   30435 30397  23184 6363 -      ?        00:00:28     mimedefang.pl
S   30609 30397  22996 6308 pipe_w ?        00:00:16     mimedefang.pl
S   30612 30397  22508 6185 pipe_w ?        00:00:09     mimedefang.pl
S   30745 30397  22748 6251 pipe_w ?        00:00:08     mimedefang.pl
S   30753 30397  21820 6016 pipe_w ?        00:00:05     mimedefang.pl
S   30754 30397  21768 6003 pipe_w ?        00:00:05     mimedefang.pl
S   30759 30397  21792 6008 pipe_w ?        00:00:05     mimedefang.pl
S   30767 30397  22048 6072 pipe_w ?        00:00:05     mimedefang.pl
S   30420     1  1112  5588 do_sel ?        00:00:00   mimedefang
S   30422 30420  1112  5588 do_pol ?        00:00:00     mimedefang
S   30423 30422  1112  5588 rt_sig ?        00:00:00       mimedefang
S   30475 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30678 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30835 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30856 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30885 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30887 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30923 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30966 30422  1112  5588 do_sel ?        00:00:00       mimedefang
S   30967 30422  1112  5588 unix_s ?        00:00:00       mimedefang

Here's what it looks like during lockup:

# ps -Alw -y -H|grep mime
S     PID  PPID  RSS    SZ WCHAN  TTY          TIME CMD
S   25607     1   424   426 do_sel ?        00:00:00   mimedefang-mult
S   27283 25607  23888 6548 pipe_w ?        00:01:25     mimedefang.pl
S   27330 25607  23628 6481 pipe_w ?        00:00:56     mimedefang.pl
S   27646 25607  23528 6453 pipe_w ?        00:00:36     mimedefang.pl
S   27651 25607  23140 6343 pipe_w ?        00:00:25     mimedefang.pl
S   28123 25607  22840 6271 pipe_w ?        00:00:16     mimedefang.pl
S   28728 25607  22208 6106 pipe_w ?        00:00:08     mimedefang.pl
S   29327 25607  22276 6124 pipe_w ?        00:00:07     mimedefang.pl
S   29343 25607  21964 6048 pipe_w ?        00:00:05     mimedefang.pl
S   29385 25607  22248 6118 pipe_w ?        00:00:07     mimedefang.pl
S   25638     1   864  6108 rt_sig ?        00:00:00   mimedefang
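
The difference between the two listings can be checked mechanically.
The sketch below counts the mimedefang milter processes in ps output
and flags the state above, where only the parent is left. The function
names and the threshold of 1 are assumptions for illustration, not
anything MIMEDefang ships.

```shell
# Count lines whose last field is exactly "mimedefang"; this skips
# mimedefang.pl and mimedefang-mult (the truncated multiplexor name).
count_milter_procs() {
    awk '$NF == "mimedefang" { n++ } END { print n + 0 }'
}

# Succeed (exit 0) when the count read from stdin is at or below the
# threshold, which defaults to 1 (only the parent process remains).
milter_looks_stuck() {
    threshold=${1:-1}
    [ "$(count_milter_procs)" -le "$threshold" ]
}

# Example:
#   ps -Alw -y -H | milter_looks_stuck && echo "mimedefang may be stuck"
```

Note that a freshly started, idle milter may also show very few
processes, so a hit is a hint to look at the logs rather than proof of
a lockup.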

Does this suggest any other ideas about what might be happening? Again,
here are some sample entries from the mail error file:

May 22 08:05:10 outlaw2 sendmail[20179]: h4MD55HQ020179: Milter (mimedefang): write(A) returned -1, expected 5: Broken pipe
May 22 08:15:46 outlaw2 sendmail[29913]: h4MFEiHQ029913: Milter (mimedefang): timeout before data read
May 22 08:15:46 outlaw2 sendmail[29921]: h4MFEkHQ029921: Milter (mimedefang): timeout before data read
May 22 08:15:46 outlaw2 sendmail[29921]: h4MFEkHQ029921: Milter (mimedefang): init failed to open
May 22 08:15:47 outlaw2 sendmail[29922]: h4MFElHQ029922: Milter (mimedefang): timeout before data read
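
To see which of these errors dominates over a longer period, a small
tally helps. This is only a sketch: it assumes each log entry is on a
single line in the actual log file and that the milter is named
"mimedefang" as in the entries above. The function name and the log
path in the example are illustrative.

```shell
# Read sendmail log lines on stdin and print "COUNT MESSAGE" for each
# distinct Milter (mimedefang) error, most frequent first.
milter_error_summary() {
    awk -F'Milter [(]mimedefang[)]: ' '
        NF > 1 { count[$2]++ }
        END    { for (msg in count) print count[msg], msg }
    ' | sort -rn
}

# Example (log file location is an assumption):
#   milter_error_summary < /var/log/maillog
```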

	-Bill




