[Mimedefang] bayesian working?

Stephen Smoogen smoogen at lanl.gov
Sun Apr 11 18:35:21 EDT 2004


On Sun, 11 Apr 2004, Ashley M. Kirchner wrote:

>Stephen Smoogen wrote:
>
>>sa-learn --spam -C /etc/mail/spamassassin --showdots --mbox bad_file
>>sa-learn --ham  -C /etc/mail/spamassassin --showdots --mbox good_file
>>
>    Okay, so far so good on all replies.  Now, next question, again, 
>based on the two above commands:
>
>    Most, if not all, of my users keep e-mail in separate folders, where 
>one of them is their spam/junk folder (assuming they collect them) and 
>others for various types of e-mail.  I, for example, have over 24 
>separate folders every month that I file e-mail under.  And every month, 
>I archive them, and start fresh again (so I end up with 
>MIMEDefang-Jan04, MIMEDefang-Feb04, etc., etc.)
>
>    Should I run sa-learn on all of these folders and teach it what's 
>good, and also on all the spam/junk folders collected?
>
>    I realize the more I feed it, the more accurately it can detect spam, 
>but at what point is it just too much?

Ok, here are my lessons learned from my bad experience last month :).

1) Feed it both good (ham) and bad (spam) emails.
2) Do not use SPAM/HAM folders that are too old, as they will weight older 
dates/patterns too heavily.
3) Make sure you have more than 1000 emails of each (or change the 
bayes_min_ham_num/bayes_min_spam_num values in 
/etc/mail/spamassassin/sa-mimedefang.cf).
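Lessons 2 and 3 can be applied before training with a small script. Below is a minimal Python sketch, assuming the folders are mbox files and that the Date: header is a good enough proxy for message age. The helper names (message_date, copy_recent, sa_learn_argv) and the cutoff you pass in are hypothetical, not part of SpamAssassin or MIMEDefang. It copies only recent messages into a trimmed mbox and builds the same sa-learn command line as the quoted commands; it does not run sa-learn itself.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical helpers: not part of SpamAssassin or MIMEDefang.


def message_date(msg):
    """Return the Date: header as an aware datetime, or None if missing/unparseable."""
    raw = msg["Date"]
    if raw is None:
        return None
    try:
        sent = parsedate_to_datetime(raw)
    except (TypeError, ValueError, IndexError):
        return None
    if sent.tzinfo is None:  # "-0000" dates come back naive; treat them as UTC
        sent = sent.replace(tzinfo=timezone.utc)
    return sent


def copy_recent(src_path, dst_path, cutoff):
    """Copy messages dated on or after `cutoff` into dst mbox; return how many were copied."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    copied = 0
    try:
        for msg in src:
            sent = message_date(msg)
            if sent is not None and sent >= cutoff:
                dst.add(msg)
                copied += 1
    finally:
        src.close()
        dst.close()  # close() also flushes the added messages to disk
    return copied


def sa_learn_argv(kind, mbox_path, config_dir="/etc/mail/spamassassin"):
    """Build the same sa-learn invocation as the quoted commands (kind is 'spam' or 'ham')."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "-C", config_dir, "--showdots", "--mbox", mbox_path]
```

For example, with a cutoff such as datetime(2004, 1, 1, tzinfo=timezone.utc) you could trim each archived MIMEDefang-Jan04-style folder into a scratch mbox and pass sa_learn_argv("ham", trimmed_path) to subprocess.run(..., check=True). Working on a copy keeps the original archives untouched.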

Here are my settings for a small site. Make the 

use_bayes                               1
auto_learn                              1
bayes_path                              /etc/mail/spamassassin/bayes
bayes_auto_expire                       1
bayes_auto_learn_threshold_nonspam      0.5
bayes_auto_learn_threshold_spam         10
bayes_expiry_max_db_size                100000
bayes_file_mode                         0644
bayes_ignore_header                     X-Spam-Status:
bayes_ignore_header                     X-Spam-Score:
bayes_journal_min_size                  10240
bayes_journal_max_size                  5120000
bayes_learn_to_journal                  1
bayes_min_ham_num                       100
bayes_min_spam_num                      100
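To check whether a training run reaches the bayes_min_spam_num/bayes_min_ham_num thresholds above before expecting Bayes scores, a sketch along these lines could read the values from the .cf text. It assumes the simple "option value" layout shown here and treats everything after "#" as a comment (SpamAssassin's real parser handles more than this). When either option is missing it falls back to 200, which is assumed here to be the stock default. parse_sa_cf and bayes_ready are hypothetical helpers.

```python
from collections import defaultdict

# Assumed SpamAssassin defaults for the two minimums when the .cf file omits them.
DEFAULT_MIN_SPAM = 200
DEFAULT_MIN_HAM = 200


def parse_sa_cf(text):
    """Parse 'option value' lines into {option: [values in file order]}.

    Simplified: everything after '#' is dropped as a comment, and
    include files, conditionals and escaped '#' are not handled.
    """
    opts = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        opts[parts[0]].append(parts[1].strip() if len(parts) > 1 else "")
    return dict(opts)


def bayes_ready(cf_text, n_spam, n_ham):
    """True if the learned spam/ham counts meet the configured minimums."""
    opts = parse_sa_cf(cf_text)
    # Assumption: if an option appears more than once, the last value wins.
    min_spam = int(opts.get("bayes_min_spam_num", [DEFAULT_MIN_SPAM])[-1])
    min_ham = int(opts.get("bayes_min_ham_num", [DEFAULT_MIN_HAM])[-1])
    return n_spam >= min_spam and n_ham >= min_ham
```

The spam and ham counts would come from the Bayes database itself, e.g. the nspam/nham lines printed by sa-learn --dump magic. Repeated options such as bayes_ignore_header are kept as lists so none of them are lost.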



-- 
Stephen John Smoogen		smoogen at lanl.gov
Los Alamos National Lab  CCN-5 Sched 5/40  PH: 4-0645
Ta-03 SM-1498 MailStop B255 DP 10S  Los Alamos, NM 87545
-- You should consider any operational computer to be a security problem --


