[Mimedefang] Maxed-out slaves causing eternal tempfail after clamd timeout

Mon Dec 8 15:12:11 EST 2003

Since upgrading ClamAV to the latest "stable" version (0.65), it 
occasionally causes MIMEDefang to max out on available slaves and never 
drop any, despite having timeouts on both clamd (500 seconds) and 
MIMEDefang (600 seconds).  This causes the server to tempfail all incoming 
mail until clamd is restarted.

I have two levels of virus scanning, so the filter is set up to accept mail 
even if it can't connect to clamd.  The problem is that it *is* connecting 
to clamd, but it's hanging somewhere in message_contains_virus_clamd 
instead of timing out and returning an error.

For each incident, the clamd logs show "Session x stopped due to timeout" 
once.  MD shows a steady rise in the number of active slaves until it maxes 
out, at which point it starts logging "no free slaves" forever - or until a 
human or a cron job restarts clamd.  Once clamd stops, all the slaves 
suddenly notice they can't connect to it anymore and continue on their 
merry way.

Clearly the initial problem is a clamd bug (the suggestion over there is to 
move up to the CVS version), but it should not be locking up MIMEDefang, 
especially in a way that requires (more or less) manual intervention to 
recover.  It ought to realize at some point that it's not getting anything 
back from the clamd socket, then either try to reconnect or just drop it 
and move along as if it had been unable to connect in the first place.
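
As a rough illustration of that behavior, here is a minimal sketch in 
Python.  It is not MIMEDefang's actual Perl code.  The helper names 
(read_line_with_deadline, scan_or_give_up), the 30-second default, and the 
"clean" / "virus" / "unavailable" return values are all made up for the 
example, and it assumes clamd answers a "SCAN <path>" command on a Unix 
socket with a single line ending in "OK" or "FOUND".  The point is only 
that the read is bounded by an overall deadline, and that a stalled daemon 
is treated the same as one that refused the connection.

```python
import socket
import time


def read_line_with_deadline(sock, seconds):
    """Read one newline-terminated reply, or return None if `seconds` elapse first."""
    deadline = time.monotonic() + seconds
    buf = b""
    while b"\n" not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None                 # overall deadline passed
        sock.settimeout(remaining)      # bound this recv by the time left
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return None                 # daemon went silent
        if not chunk:
            return None                 # peer closed without a full reply
        buf += chunk
    return buf.split(b"\n", 1)[0].decode("utf-8", "replace")


def scan_or_give_up(socket_path, file_path, seconds=30.0):
    """Return 'clean', 'virus', or 'unavailable' (hypothetical result names).

    Connect, send, and read are each limited to `seconds`, so the call
    cannot hang indefinitely on a wedged daemon.
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(seconds)
            sock.connect(socket_path)
            # Assumed request format: "SCAN <path>\n"
            sock.sendall(b"SCAN " + file_path.encode() + b"\n")
            reply = read_line_with_deadline(sock, seconds)
    except OSError:
        return "unavailable"            # could not connect or send
    if reply is None:
        return "unavailable"            # connected, but no answer in time
    if reply.endswith("FOUND"):
        return "virus"
    if reply.endswith("OK"):
        return "clean"
    return "unavailable"                # error or unrecognized reply
```

With something like this, a filter that already accepts mail when clamd is 
unreachable would take the same path when clamd accepts the connection but 
never answers, so the slave is freed instead of waiting forever.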

This is on Red Hat 7.3 with a custom Linux kernel 2.4.22-ac4 (with the 
do_brk patch), Sendmail 8.12.10, MIMEDefang 2.39 (not using the embedded 
Perl), and ClamAV 0.65.

Kelson Vibber
SpeedGate Communications <www.speed.net>