[Mimedefang] multiplexor error

Tue Jul 31 10:23:26 EDT 2007

>It seems to work fine after I commented out the line "lock_metho flock"

>When using SA bayes with Mimedefang, is it still necessary to run sa-learn 
>regularly and manually copy spams to a folder and have sa-learn learn 
>from the manual picks?
>I still don't see the bayes mark in the list of tests:
>******* (7.166) 
>HTML_FONT_SIZE_HUGE,HTML_IMAGE_ONLY_20,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY

If you have autolearn enabled, your database will be populated automatically - BUT it won't be used until there are 200 spam and 200 ham messages in the database, which can take some time given the default learning thresholds.
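
To see how close the database is to those minimums, the counters printed by "sa-learn --dump magic" can be checked. The function below is only a rough sketch: bayes_ready is a name I made up, and it assumes the counters appear as "non-token data: nspam" and "non-token data: nham" lines with the count in the third column, which may differ between SpamAssassin versions.

```shell
# Hypothetical helper: reads "sa-learn --dump magic" output on stdin and
# reports whether both message counters have reached a minimum (default 200).
# Assumes the count is the third whitespace-separated field on each line.
bayes_ready() {
    min="${1:-200}"
    awk -v min="$min" '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham = $3 }
        END {
            printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
            # exit status 0 only when both counters have reached the minimum
            exit (nspam >= min && nham >= min) ? 0 : 1
        }'
}
```

Run as the user that owns the Bayes database (defang in my case), something like "sa-learn -p /etc/mail/sa-mimedefang.cf --dump magic | bayes_ready 200" should then print the two counts and return a non-zero status while Bayes is still below the minimum.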

Have a look at http://spamassassin.taint.org/doc/Mail_SpamAssassin_Conf.html#learning%20options for details - to get started a little quicker, try:

bayes_auto_learn                1
bayes_auto_learn_threshold_nonspam 1.0
bayes_auto_learn_threshold_spam 7.0
bayes_learn_to_journal  1
bayes_min_ham_num 100
bayes_min_spam_num 100

You need to be very careful with the threshold settings - I pulled the spam threshold down from the default (12) to get more messages into the database quickly, and because my traffic doesn't tend to generate many scores between 5 and 10 which aren't probably spam.

To check that your Bayes filtering is working, or to see what the current levels are and whether Bayes is being used, run a manual scan on a spam message using:

spamassassin -p /etc/mail/sa-mimedefang.cf -t -D < INPUTMSG >& log

then look in the log file for lines which contain "bayes".  You'll see entries like this:

[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_toks
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_seen
[13515] dbg: bayes: found bayes db version 3
[13515] dbg: bayes: DB journal sync: last sync: 1185891353
[13515] dbg: bayes: DB journal sync: last sync: 1185891353
....
[13515] dbg: bayes: corpus size: nspam = 106513, nham = 175380
[13515] dbg: bayes: score = 0.999999999999992
[13515] dbg: bayes: DB expiry: tokens in DB: 142579, Expiry max size: 150000, Oldest atime: 1185454899, Newest atime: 1185891351, Last expire: 1185847233, Current time: 1185891410
[13515] dbg: bayes: DB journal sync: last sync: 1185891353
[13515] dbg: bayes: untie-ing
[13515] dbg: bayes: untie-ing db_toks
[13515] dbg: bayes: untie-ing db_seen
....
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_toks
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_seen
[13515] dbg: bayes: found bayes db version 3
[13515] dbg: bayes: a2591b41098c3418f8fd2179ebfb85bd1e8e79ac at sa_generated already learnt correctly, not learning twice
[13515] dbg: bayes: untie-ing
[13515] dbg: bayes: untie-ing db_toks
[13515] dbg: bayes: untie-ing db_seen
[13515] dbg: learn: initializing learner
[13515] dbg: check: is spam? score=25.063 required=5
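
Since the debug output is long, it can help to pull out just the interesting lines. This is a minimal sketch assuming the log looks like the excerpt above; bayes_summary is a hypothetical name, and "grep -o" is a GNU/BSD extension rather than strict POSIX.

```shell
# Hypothetical helper: summarise the Bayes-related lines of a debug log
# written by "spamassassin -t -D".
bayes_summary() {
    log="$1"
    # corpus size (nspam/nham) and the raw Bayes probability for the message
    grep -E 'dbg: bayes: (corpus size|score)' "$log"
    # which BAYES_xx rule fired, if any (listed once even if it appears
    # in both the debug output and the report)
    grep -oE 'BAYES_[0-9]+' "$log" | sort -u
}
```

Calling "bayes_summary log" on the file produced by the command above prints the corpus size line, the score line and the name of any BAYES rule that matched. If the last part is empty, Bayes was not used for that message.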

You'll then also see a BAYES_xx rule appear in the log and in the SpamAssassin report, e.g. the message above produced:

[13515] dbg: check: tests=AD_FORGED_ARGENTA_RCVD4,AWL,****BAYES_99****,HTML_40_50,HTML_MESSAGE,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_FORGED_WROTE,URIBL_BLACK,URIBL_SBL
[13515] dbg: check: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__ENV_AND_HDR_FROM_MATCH,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__HAS_X_PRIORITY,__HTML_LENGTH_0000_1024,__LOCAL_PP_NONPPURL,__MIME_HTML,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__SANE_MSGID,__SARE_BODY_BLNK_5_100,__SARE_HAS_FG_COLOR,__SARE_HEAD_MIME_VALID,__SARE_HTML_BEHTML2,__SARE_HTML_HAS_A,__SARE_HTML_HAS_BR,__SARE_HTML_HAS_FONT,__SARE_HTML_HAS_TITLE,__SARE_META_MURTY3,__SARE_URI_ANY,__SARE_WHITELIST_FLAG,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__THEBAT_MUA,__TOCC_EXISTS,__TVD_BODY,__TVD_MIME_ATT_TP,__VIRUS_WARNING268F

and

Content analysis details:   (25.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 5.5 AD_FORGED_ARGENTA_RCVD4 Forged Received header claiming to be from
                            Argenta
 2.8 RCVD_FORGED_WROTE      Forged 'Received' header found ('wrote:' spam)
 0.5 HTML_40_50             BODY: Message is 40% to 50% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 4.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.0000]
 1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                            above 50%
                            [cf: 100]
 0.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 1.6 URIBL_SBL              Contains an URL listed in the SBL blocklist
                            [URIs: sasha.hk]
 3.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: sasha.hk]
 4.6 AWL                    AWL: From: address is in the auto white-list

Note that my BAYES_xx scores are tweaked a little from the default...

This is easier to debug if you keep the working directories for MIMEDefang around for a while, so that you can find the INPUTMSG file in the message folder under /var/spool/MIMEDefang (or equivalent).  See the man page for MD for details of how to do this.
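
Once the working directories are being kept, the saved messages can be listed with something like the sketch below. list_inputmsgs is a made-up name, and the default spool path is the one mentioned above, so adjust it if your installation differs.

```shell
# Hypothetical helper: list every saved INPUTMSG file under the MIMEDefang
# spool directory (default taken from the text above; pass another path
# as the first argument if yours is elsewhere).
list_inputmsgs() {
    spool="${1:-/var/spool/MIMEDefang}"
    find "$spool" -type f -name INPUTMSG | sort
}
```

Any of the paths it prints can then be fed to the spamassassin command shown earlier in place of INPUTMSG.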

Best Wishes,

Paul.
-- 

-------------------------------------------------------
Paul Murphy
Head of I.T.
Argenta Discovery
Tel. 01279 645 554
Fax. 01279 645 646