[Mimedefang] Maxed-out slaves causing eternal tempfail after clamd timeout
Kelson Vibber
kelson at speed.net
Mon Dec 8 15:12:11 EST 2003
Since I upgraded ClamAV to the latest "stable" version (0.65), it has
occasionally caused MIMEDefang to max out its available slaves and never
release any, despite timeouts on both clamd (500 seconds) and MIMEDefang
(600 seconds). The server then tempfails all incoming mail until clamd
is restarted.
I have two levels of virus scanning, so the filter is set up to accept mail
even if it can't connect to clamd. The problem is that it *is* connecting
to clamd, but it's hanging somewhere in message_contains_virus_clamd
instead of timing out and returning an error.
For each incident, the clamd logs show "Session x stopped due to timeout"
once. MD shows a steady rise in the number of active slaves until it maxes
out, at which point it starts logging "no free slaves" forever - or until a
human or a cron job restarts clamd. Once clamd stops, all the slaves
suddenly notice they can't connect to it anymore and continue on their
merry way.
Clearly the initial problem is a clamd bug (the suggestion over there is to
move up to the CVS version), but it should not be locking up MIMEDefang,
especially in a way that requires (more or less) manual intervention to
recover. It ought to realize at some point that it's not getting anything
back from the clamd socket, then either try to reconnect or just drop it
and move along as if it had been unable to connect in the first place.
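To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not MIMEDefang's actual Perl code, and the name ask_scanner and the 0.2-second timeout are made up for the example. The idea is simply to put an overall deadline on the read from the scanner socket and to treat silence the same way as a failed connection.

```python
import socket
import time


def ask_scanner(sock, command, timeout=30.0):
    """Send one command on a connected socket and wait for a one-line reply.

    Returns the reply text, or None if the peer stays silent past the
    deadline, closes the connection early, or the socket errors out.
    The caller can treat None exactly like "could not connect".
    """
    deadline = time.monotonic() + timeout
    try:
        sock.settimeout(timeout)
        sock.sendall(command)
        chunks = []
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # overall deadline passed, give up
            sock.settimeout(remaining)
            data = sock.recv(4096)
            if not data:
                return None  # peer closed without a complete reply
            chunks.append(data)
            if data.endswith(b"\n"):
                return b"".join(chunks).decode(errors="replace").strip()
    except OSError:
        # Covers socket.timeout as well as connection errors.
        return None


if __name__ == "__main__":
    # A socket pair stands in for the scanner daemon; the far end never
    # answers, which imitates the hang described above.
    near, far = socket.socketpair()
    reply = ask_scanner(near, b"PING\n", timeout=0.2)
    print("scanner unavailable" if reply is None else reply)
```

With a bound like this, a slave stuck waiting on the scanner would give up after the deadline and fall through to the "scanner unavailable" path instead of holding its slot until clamd is restarted.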
This is on Red Hat 7.3 with a custom Linux kernel 2.4.22-ac4 (with the
do_brk patch), Sendmail 8.12.10, MIMEDefang 2.39 (not using the embedded
perl), and ClamAV 0.65.
Kelson Vibber
SpeedGate Communications <www.speed.net>