[Mimedefang] Re: SPAM/HAM Trap
Yizhar Hurwitz
yizhar at mail.com
Wed May 23 16:24:59 EDT 2007
From: "Daniel Aquino" <mr.danielaquino at gmail.com>
> > * First and foremost, you should understand some issues related to email archiving:
> >
> > The privacy of your email clients - you should coordinate such actions with the company manager(s),
> > and I recommend also to inform all the users about it.
>
> Well I'm not going to read the emails... I just want to collect some
> detected spam/ham to train bayes.. I do have automatic training
> enabled but doesn't bayes need a kick start ?
It is good enough to start with an empty database and build it from scratch.
On a low-volume system it takes a few days to automatically learn 200 ham + 200 spam,
and on a high-volume system it should take only a few hours.
So you don't need the kick start, even on a new machine.
> > It is recommended to train bayes against the actual and current email traffic,
> > not against historic private or public corpus.
>
> So does automatic learning work even before bayes has 200 emails ?
Automatic learning starts working as soon as you enable it.
Bayes scoring is held back until the database reaches the minimum, which is 200 ham + 200 spam by default.
> How can I verify that the bayes training is taking place ?
Use the following commands:
man sa-learn
sa-learn --dbpath /home/defang/.spamassassin --dump magic
(Please check my syntax for mistakes and set dbpath to fit your system.)
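
To make the check above concrete, here is a minimal Python sketch that reads the nspam and nham counters out of "sa-learn --dump magic" style output and compares them with the default minimum of 200 + 200. The sample text is illustrative only (the numbers are made up), and the names parse_magic and bayes_scoring_active are hypothetical helpers, not part of SpamAssassin.

```python
import re

# Illustrative sample only: the layout follows `sa-learn --dump magic`,
# but the numbers are made up for this example.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        154          0  non-token data: nspam
0.000          0        312          0  non-token data: nham
0.000          0      28412          0  non-token data: ntokens
"""

# The third column holds the value; the label follows "non-token data:".
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
)


def parse_magic(dump_text):
    """Return a dict mapping each magic label to its integer value."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_scoring_active(counts, min_ham=200, min_spam=200):
    """True once both counters have reached the configured minimums."""
    return (counts.get("nham", 0) >= min_ham
            and counts.get("nspam", 0) >= min_spam)


counts = parse_magic(SAMPLE_DUMP)
print(counts["nspam"], counts["nham"], bayes_scoring_active(counts))
```

If nspam and nham grow between two runs, auto learning is taking place. To use real data, pass the text printed by the sa-learn command above to parse_magic instead of SAMPLE_DUMP.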
> > You can also send all emails to the same journal at localhost address,
> > then use a delivery filter (procmail, Cyrus Sieve, etc.) to sort the messages into
> > different folders using the X-SpamScore header.
>
> Wouldn't the multi user approach be easier ?
It is up to you to decide which approach works best for you.
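
For the single-journal approach quoted above, the sorting rule could look roughly like this Python sketch. It assumes the X-SpamScore header value begins with a numeric score, and it uses a hypothetical cutoff of 5.0 and hypothetical folder names; check what your filter actually writes before relying on it. In practice procmail or Sieve would do this job; the sketch only shows the logic.

```python
import email
import re

SPAM_THRESHOLD = 5.0  # hypothetical cutoff, adjust to your own policy


def spam_score(raw_message):
    """Return the leading number of the X-SpamScore header, or None.

    Assumption: the header value starts with a numeric score.
    """
    msg = email.message_from_string(raw_message)
    value = msg.get("X-SpamScore")
    if value is None:
        return None
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)", value)
    return float(match.group(1)) if match else None


def target_folder(raw_message, threshold=SPAM_THRESHOLD):
    """Pick a folder name (hypothetical names) from the score."""
    score = spam_score(raw_message)
    if score is None:
        return "unscored"
    return "spam" if score >= threshold else "ham"


example = "From: a@example.com\nSubject: test\nX-SpamScore: 7.3\n\nbody\n"
print(target_folder(example))
```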
I think that you also have to understand one major point here:
Automatic bayes training works without any need to collect messages and
take manual actions.
So if you have planned to collect the messages and run "sa-learn" with a
script - what's the point?
It can be done "on the fly" as the message is scanned the first time!
Collecting the messages can only be useful for other things, such as:
* Correcting auto learn mistakes, by manual (human) sorting of messages
and then running sa-learn against the sorted corpus or selected messages
that were false-positive or false-negative.
* Releasing blocked messages (for example false-positive).
* Other reasons for archiving mails unrelated to bayes training such as
forensic security investigations, troubleshooting, company archiving
policy, etc.
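
For the first case in the list above (correcting auto learn mistakes), here is a small Python sketch that builds the sa-learn command lines for a hand-sorted corpus. The folder names are hypothetical, the dbpath is the one from my example earlier, and the commands are only printed, not executed, so you can review them first.

```python
import shlex

DBPATH = "/home/defang/.spamassassin"  # from the example above; adjust to your system


def sa_learn_command(kind, path, dbpath=DBPATH):
    """Build the argument list for relearning `path` as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--dbpath", dbpath, "--" + kind, path]


# Hypothetical folders holding the manually corrected messages.
for kind, folder in (("spam", "corrected/spam"), ("ham", "corrected/ham")):
    print(shlex.join(sa_learn_command(kind, folder)))
```

Run the printed commands as the user that owns the bayes database, then check the counters again with --dump magic.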
Again - if you plan to use those messages for automatic bayes training,
then don't.
Use bayes auto learn instead.
Yizhar Hurwitz
http://yizhar.mvps.org