[Mimedefang] Bayes Expiry Problem

Rick Mallett rmallett at ccs.carleton.ca
Thu Jan 29 12:31:27 EST 2004


A few days ago I reported a problem I was having with my bayes
database to the SATalk mailing list along with the observation that I
was pretty sure it was a bug in the bayes expiry software.

The problem was that under heavy load "bayes_toks.expire[pid]" files
were piling up in the bayes database area, as seen in the following
partial "ls -l" listing:

         32 Jan 26 12:19 bayes.lock
    2750039 Jan 26 12:27 bayes_journal
   20897792 Jan 26 12:19 bayes_seen
   21733376 Jan 26 12:22 bayes_toks
    9437184 Jan 26 12:22 bayes_toks.expire16781
   11173888 Jan 26 11:35 bayes_toks.expire23012
    5341184 Jan 26 10:54 bayes_toks.expire27549
   11182080 Jan 26 11:59 bayes_toks.expire27570
   11403264 Jan 26 10:44 bayes_toks.expire4752

A few days later I got a message from another person, David Lee, who
had run into the same problem and who thought it might be due to the
controlling agent, in his case a program called MailScanner, timing
out the expiry process before it could complete.

It turns out that is exactly what was happening (I think). Bayes
expiry can often take 3 or 4 minutes to complete, and if the system
load happens to be really high when a mimedefang/spamassassin process
decides it's time to do an expiry, it can easily take much longer. If
it takes longer than 5 minutes you're in trouble, since AFAIK the
sendmail default timeout on a milter operation is 5 minutes. At least
that's how I interpret the following info in the MIMEDefang HOWTO
document:

  E      Overall timeout between sending end-of-message to filter
         and waiting for the final acknowledgment

  Note the separator between each is a ';' as a ',' already separates
  equates and therefore can't separate timeouts.  The default values (if
  not set in the config) are:

  T=C:5m;S:10s;R:10s;E:5m

  where 's' is seconds and 'm' is minutes.

Have I got that right? If a bayes expiry takes longer than 5 minutes
it will be abruptly terminated? Sure fits the observed phenomenon.

I'm also pretty sure this must be the case because I copied the files
to another location for testing and ran an expire via sa-learn and it
finished successfully in about 8 minutes, so it wasn't a matter of a
corrupted database causing the problem.

The point is that allowing bayes expiry to take place opportunistically
on a heavily loaded, high-volume site is a recipe for disaster. What
you need to do is set "bayes_auto_expire 0" in your sa-mimedefang.cf or
local.cf file and use sa-learn to force an expire on a regular basis
via cron.
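
For what it's worth, a minimal sketch of that cron-driven approach might
look like the script below. It is only an illustration: the BAYES_DIR
location, the script name and path in the crontab comment, the nightly
schedule, the "run" argument and the 60-minute staleness threshold are
all assumptions, not anything taken from the MIMEDefang or SpamAssassin
docs. It also assumes a find that supports -mmin (GNU and BSD do) and
that the job runs as the user who owns the bayes database, with the
same SpamAssassin configuration that mimedefang uses.

```shell
#!/bin/sh
# Sketch only: a cron-driven replacement for opportunistic bayes expiry.
# Assumes "bayes_auto_expire 0" is already set in sa-mimedefang.cf or local.cf.
#
# Hypothetical crontab entry (nightly at 03:15, as the database owner):
#   15 3 * * * /usr/local/sbin/bayes-expire.sh run

# Assumed location of bayes_toks, bayes_seen, etc.; adjust for your site.
BAYES_DIR="${BAYES_DIR:-/var/spool/MIMEDefang/.spamassassin}"

# List bayes_toks.expire<pid> files under directory $1 that were last
# modified more than $2 minutes ago. Leftovers like these are the symptom
# of an expiry run that was killed before it could finish.
stale_expire_files() {
    find "$1" -type f -name 'bayes_toks.expire*' -mmin +"$2" -print
}

# Report any leftovers, then force one expiry run outside the milter,
# where no 5 minute timeout applies.
run_expiry() {
    leftovers=$(stale_expire_files "$BAYES_DIR" 60)
    if [ -n "$leftovers" ]; then
        echo "stale expiry files found:" >&2
        echo "$leftovers" >&2
    fi
    sa-learn --force-expire
}

# Only act when invoked as "bayes-expire.sh run".
if [ "${1:-}" = "run" ]; then
    run_expiry
fi
```

The script only reports leftover expire files; it does not delete them,
since whether they are safe to remove is a judgment call best made by
hand once no expiry is running.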

As I recall someone in this forum suggested such an approach in a
previous posting, but never gave a reason, so it didn't occur to me
that it was mandatory and not just a matter of personal preference.

I'll also be reporting this to the SATalk mailing list along with the
observation that bayes expiry takes much too long, and the code could
use some work to improve performance.

- rick


