[Mimedefang] SpamAssassin via mimedefang is slow

Fri Nov 7 18:53:24 EST 2008

Daniel Bourque wrote:
> hope that helps out someone else who has the same issue.
Well, I solved it another way, and not because the standard MD way could
not handle it.  I found that spamassassins running in parallel on the
same machine would for some reason deadlock, i.e. all children are in
the busy state but not doing anything.  This is probably MySQL related
(they all use Bayes via a database), although I have not verified that.
So mimedefang could lock up for this reason.  These lockups seem to be
related to spam runs targeting one of the mail servers in our redundant
set.  That machine would just lock up while the other machines sat
around doing nothing at all.  So I changed some things around.

I split sendmail, mimedefang, its heavy filters and the mailstore.
With the heavy filters split off (spamassassin in this case; I am
testing a way to split clamav too, but that is not production ready),
a spam run on one server in my cluster will load the mimedefang located
on that server, but all servers will use their spamassassin children to
handle the load.

I send my mails via the perl spamclient to a load balancer that
distributes them across all the members of the cluster.  I did this
because problems occur when running too many spamd children on one
machine (you could run 5 spamd children comfortably on each of 5
virtual machines on one bare-metal host, but not 25 spamd children
directly on the same bare metal; it would deadlock horribly.  Your
mileage may vary).
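
For anyone who wants to try the same layout, here is a minimal sketch
of what the client side of that hop looks like on the wire.  It is not
my production code: it is written in Python rather than perl, and it
speaks the plain spamd CHECK command directly.  The host name
sa-lb.example.net is a made-up placeholder for the load balancer, and
783 is simply the usual spamd port; adjust both to your setup.

```python
import re
import socket


def build_check_request(message: bytes) -> bytes:
    """Build a spamd CHECK request for one raw RFC 822 message."""
    header = b"CHECK SPAMC/1.5\r\nContent-length: %d\r\n\r\n" % len(message)
    return header + message


def parse_check_response(raw: bytes):
    """Return (is_spam, score, threshold) from a spamd CHECK reply."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    lines = head.split("\r\n")
    status = lines[0].split(None, 2)
    # Expected status line looks like: SPAMD/1.1 0 EX_OK
    if len(status) < 3 or not status[0].startswith("SPAMD/") or status[1] != "0":
        raise ValueError("unexpected spamd status line: %r" % lines[0])
    for line in lines[1:]:
        # Expected header looks like: Spam: True ; 15.3 / 5.0
        m = re.match(r"Spam:\s*(\S+)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)", line)
        if m:
            is_spam = m.group(1).lower() in ("true", "yes")
            return is_spam, float(m.group(2)), float(m.group(3))
    raise ValueError("no Spam: header in spamd reply")


def check_message(message: bytes,
                  host: str = "sa-lb.example.net",  # hypothetical balancer
                  port: int = 783,
                  timeout: float = 30.0):
    """Send one message to spamd (here: via the balancer) and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # tell spamd the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

The point of the design is that the filter only ever knows one address.
Which cluster member actually runs the scan is the balancer's decision,
so a spam run against one mail server is spread over every spamd pool.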

Anyhow, in my situation scan times are between 3 and 6 seconds per
message with all network scans enabled (all clients use a compiled set
of static rules, which helps a lot).  I also run my own DNS servers,
which helps spamassassin a lot.  Mail me offlist and I can help you set
up something similar.

Conclusion: in the right situation spamassassin can scan with network
tests enabled within 3-6 seconds, even with 30 mimedefang children
running on one server (with 30 spamd clients on the cluster; btw, I
have a secondary cluster that kicks in when all mail servers are under
load).

Another tip: take a look at Digest::Nilsimsa (in my implementation I can
detect 60% of the spam at the data phase, and temp fail it, without
resorting to heavy scanners like spamassassin).
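
To give an idea of how such a cheap pre-check can work, here is a rough
sketch of the comparison step only, in Python rather than perl.  It
assumes you already have 256-bit Nilsimsa digests as 64-character hex
strings.  The usual Nilsimsa score is 128 minus the number of differing
bits, so 128 means identical digests and values near 0 mean unrelated
text.  The function names and the threshold of 100 are illustrative
assumptions, not values taken from my setup.

```python
def nilsimsa_similarity(hex_a: str, hex_b: str) -> int:
    """Score two 256-bit Nilsimsa hex digests: 128 (same) down to -128."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != 32 or len(b) != 32:
        raise ValueError("Nilsimsa digests must be 64 hex characters")
    # Count the bits that differ between the two digests.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 128 - differing


def looks_like_known_spam(digest: str, known_spam_digests, threshold: int = 100) -> bool:
    """True if the digest is close to any digest already seen as spam.

    The threshold is a hypothetical starting point; tune it on your own mail.
    """
    return any(nilsimsa_similarity(digest, known) >= threshold
               for known in known_spam_digests)
```

A check like this is only a few XORs per known digest, which is why it
can run at the data phase before any heavy scanner is involved.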

--
Michiel Brandenburg