[Mimedefang] Bayes locking

Matthew.van.Eerde at hbinc.com
Wed Oct 6 14:04:07 EDT 2004


David F. Skoll wrote:
> On Wed, 6 Oct 2004, Paul Murphy wrote:
> 
>> the whole area of database locking for Bayes and AWL is a mess.
...
>> The solution is to use a proper database - the latest SA has support
>> for MySQL, which will be much more stable and on large databases
>> will also probably be faster.
> 
> Pros and cons.  I'm not sure I consider MySQL to be a "proper"
> database. :-) (We use PostgreSQL in our commercial products, though it
> has its annoyances too.)  PostgreSQL actually uses MVCC rather than
> locking, so readers are never blocked by a writer.
> 
> However, any semi-decent SQL database probably has a more robust
> locking mechanism than SA's Perl code, and has the huge advantage of
> being easily shared among several mail scanners.  I think a Berkeley
> DB will still beat a SQL database quite handily for raw data access
> (at least for the access patterns used to look up Bayes tokens.)

Warning: rather long and somewhat off-topic post follows.

This ties in with my "using spamc/spamd from MIMEDefang" question.  Adding MySQL or PostgreSQL support further inflates an already large Mail::SpamAssassin object.

When the largest thing in a MIMEDefang slave is the SpamAssassin object, the idea of factoring it out comes to mind.  Instead of ten active MIMEDefang slaves, each hoarding its own private SpamAssassin object, consider twenty active MIMEDefang slaves, each calling spamc to connect to a pool of five running spamd daemons.

MIMEDefang does more than run SpamAssassin: virus scans, calling action_rebuild, running check_against_smtp_server, copying messages into quarantine, generating admin notifications, etc., etc.
While a slave is busy doing these things, its SpamAssassin object is just idly taking up memory.
If the slave bounces or discards the email before getting to the SpamAssassin check, the SpamAssassin object never gets used for that milter call.
If MIMEDefang is configured to skip SpamAssassin checks for certain mail (whitelisted users, authenticated users, local-to-local email, outgoing email, etc.), then each such email still results in several milter calls to slaves, each of which holds an unused SpamAssassin object for the length of the call.

It would interest me to see some real-world statistics on what percentage of the time a given MIMEDefang slave spends in:
spam_assassin_is_spam
message_contains_virus
append_*_boilerplate
action_rebuild
filter*
anomy_clean_html
resend_message
idle
etc.
Any volunteers to run an analysis on their servers?

If spam_assassin_is_spam is running 99% of the time, then using native Perl makes perfect sense.  The number of spamd daemons would quickly climb to the number of MIMEDefang slaves, and the net effect would be a bunch of extra "spamc" calls.
If spam_assassin_is_spam is running 1% of the time, using spamc/spamd makes more sense.  You could probably get away with a single spamd thread (yes, I know spamd preforks...)
If spam_assassin_is_spam is running 50% of the time... then... ummm... the best action probably depends on the hardware, the traffic level, and on the taste of the maintaining admin.
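
To put rough numbers on the three cases above, here is a back-of-the-envelope sketch in Python (used purely for illustration; nothing here is MIMEDefang or SpamAssassin code).  The function name and the idea of multiplying the slave count by the fraction of time spent in spam_assassin_is_spam are my own assumptions.  It estimates average occupancy only, so real traffic peaks would need more headroom.

```python
import math

def spamd_children_needed(slaves: int, sa_fraction: float) -> int:
    """Hypothetical estimate: average number of spamd workers busy at once.

    slaves      -- number of active MIMEDefang slaves
    sa_fraction -- fraction of a slave's time spent in spam_assassin_is_spam
    """
    if not 0.0 <= sa_fraction <= 1.0:
        raise ValueError("sa_fraction must be between 0 and 1")
    # Always keep at least one worker so spamc has something to connect to.
    return max(1, math.ceil(slaves * sa_fraction))

# The three cases from the text, for twenty slaves.
for fraction in (0.99, 0.50, 0.01):
    needed = spamd_children_needed(20, fraction)
    print(f"{fraction:.0%} of time in SpamAssassin: about {needed} spamd worker(s)")
```

With twenty slaves this gives roughly twenty, ten, and one spamd workers for the 99%, 50%, and 1% cases respectively.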

I'm fortunate to have memory to burn... for now... but those who are running into memory limitations might benefit from factoring many memory-hungry SpamAssassin objects out into a few spamd daemons.  This would leave room for more MIMEDefang slaves.  If MIMEDefang rejects many emails before the spam_assassin_is_spam call, or spends most of its time waiting for virus analysis to complete, the number of spamd processes can be significantly smaller than the number of MIMEDefang slaves.
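
The memory trade-off can be sketched the same way.  Every figure below (10 MB for a bare slave, 30 MB for an embedded SpamAssassin object, 35 MB for a spamd process) is a made-up placeholder, not a measurement; substitute numbers from your own server.  The function names are hypothetical too.

```python
def embedded_mb(slaves: int, slave_mb: int, sa_object_mb: int) -> int:
    """Total memory when every slave carries its own SpamAssassin object."""
    return slaves * (slave_mb + sa_object_mb)

def factored_mb(slaves: int, slave_mb: int, spamd_procs: int, spamd_mb: int) -> int:
    """Total memory when slaves are lean and share a pool of spamd processes."""
    return slaves * slave_mb + spamd_procs * spamd_mb

# Placeholder sizes in MB (assumptions, not measurements).
SLAVE_MB, SA_OBJECT_MB, SPAMD_MB = 10, 30, 35

print("10 slaves with embedded SpamAssassin:", embedded_mb(10, SLAVE_MB, SA_OBJECT_MB), "MB")
print("20 lean slaves plus 5 spamd processes:", factored_mb(20, SLAVE_MB, 5, SPAMD_MB), "MB")
```

Under these placeholder sizes, ten slaves with embedded objects cost 400 MB, while twenty lean slaves plus five spamd processes cost 375 MB.  Whether that holds on a real system depends entirely on the actual sizes.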

Calling a command-line app (spamc) from Perl isn't free.  It will likely slow the handling of any given email by an appreciable amount.  I wonder, though, whether the memory savings will make up for this.
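
For a sense of what the external call involves, here is a minimal sketch of piping a message to spamc and reading back the verdict.  It is in Python for illustration only; a real MIMEDefang filter would do the equivalent from Perl.  It assumes spamc is on the PATH and relies on the "-c" (check only) mode which, as I read the spamc man page, prints "score/threshold" on stdout and exits with status 1 when the message is spam.  The function names are hypothetical.

```python
import subprocess
from typing import Tuple

def parse_check_output(text: str) -> Tuple[float, float]:
    """Parse the 'score/threshold' line printed by 'spamc -c'."""
    score, threshold = text.strip().split("/", 1)
    return float(score), float(threshold)

def spamc_check(message: bytes, spamc_path: str = "spamc") -> Tuple[float, float, bool]:
    """Send one message to spamd via spamc; return (score, threshold, is_spam).

    Each call forks a process and opens a connection to spamd, which is
    the per-message overhead discussed above.
    """
    result = subprocess.run(
        [spamc_path, "-c"],
        input=message,
        stdout=subprocess.PIPE,
        check=False,  # exit status 1 means "spam", not failure
    )
    score, threshold = parse_check_output(result.stdout.decode("ascii", "replace"))
    # Note: spamc reports 0/0 when it could not reach spamd (assumption from the man page).
    return score, threshold, result.returncode == 1

# Example use (needs a running spamd):
#   score, threshold, is_spam = spamc_check(open("message.eml", "rb").read())
```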

Best case, you save tons of memory.  Worst case, you waste a little memory and time calling spamc for no particularly good reason.

Matthew.van.Eerde at hbinc.com                      805.964.4554 x902
Hispanic Business Inc./HireDiversity.com         Software Engineer
perl -e"map{y/a-z/l-za-k/;print}shift" "Jjhi pcdiwtg Ptga wprztg,"


