[Mimedefang] multiplexor error
Paul Murphy
Paul.Murphy at argentadiscovery.com
Tue Jul 31 10:23:26 EDT 2007
>It seems to work fine after I commented out the line "lock_method flock"
>When using SA bayes with Mimedefang, is it still necessary to run sa-learn
>regularly and manually copy spams to a folder and have sa-learn learn
>from the manual picks?
>I still don't see the bayes mark in the list of tests:
>******* (7.166)
>HTML_FONT_SIZE_HUGE,HTML_IMAGE_ONLY_20,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY
If you have autolearn enabled, your database will be populated automatically - BUT it won't be used until there are 200 spam and 200 ham messages in the database, which can take some time given the default learning thresholds.
Have a look at http://spamassassin.taint.org/doc/Mail_SpamAssassin_Conf.html#learning%20options for details - to get started a little quicker, try:
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 1.0
bayes_auto_learn_threshold_spam 7.0
bayes_learn_to_journal 1
bayes_min_ham_num 100
bayes_min_spam_num 100
You need to be very careful with the threshold settings - I pulled the spam threshold down from the default (12) to get more messages into the database quickly, and because my traffic doesn't tend to generate many scores between 5 and 10 which aren't probable spam.
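As a rough illustration of how those two thresholds carve up the score range, here is a minimal shell sketch. The function name autolearn_class is made up for this example, and the logic is deliberately simplified: SpamAssassin's real auto-learn decision applies further conditions before learning a message, and the exact boundary handling (>= for spam, < for ham) is an assumption here.

```shell
# Hypothetical helper, NOT SpamAssassin's actual logic: classify a score
# against the two auto-learn thresholds only.
# Usage: autolearn_class SCORE [NONSPAM_THRESHOLD] [SPAM_THRESHOLD]
autolearn_class() {
    awk -v s="$1" -v ham="${2:-1.0}" -v spam="${3:-7.0}" 'BEGIN {
        # "+ 0" forces numeric comparison of the arguments.
        if (s + 0 >= spam + 0)      print "spam"   # at or above the spam threshold
        else if (s + 0 < ham + 0)   print "ham"    # below the nonspam threshold
        else                        print "none"   # in between: not learnt
    }'
}
```

With the values suggested above, autolearn_class 25.063 prints spam, while a score between 1.0 and 7.0 prints none - that middle band is where a carelessly chosen threshold does its damage.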
To check that your Bayes filtering is working, or to see what the current levels are and whether Bayes is being used, run a manual scan on a spam message using:
spamassassin -p /etc/mail/sa-mimedefang.cf -t -D < INPUTMSG >& log
then look in the log file for lines which contain "bayes". You'll see entries like this:
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_toks
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_seen
[13515] dbg: bayes: found bayes db version 3
[13515] dbg: bayes: DB journal sync: last sync: 1185891353
[13515] dbg: bayes: DB journal sync: last sync: 1185891353
....
[13515] dbg: bayes: corpus size: nspam = 106513, nham = 175380
[13515] dbg: bayes: score = 0.999999999999992
[13515] dbg: bayes: DB expiry: tokens in DB: 142579, Expiry max size: 150000, Oldest atime: 1185454899, Newest atime: 1185891351, Last expire: 1185847233, Current time: 1185891410
[13515] dbg: bayes: DB journal sync: last sync: 1185891353
[13515] dbg: bayes: untie-ing
[13515] dbg: bayes: untie-ing db_toks
[13515] dbg: bayes: untie-ing db_seen
....
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_toks
[13515] dbg: bayes: tie-ing to DB file R/O /home/defang/.spamassassin/bayes_seen
[13515] dbg: bayes: found bayes db version 3
[13515] dbg: bayes: a2591b41098c3418f8fd2179ebfb85bd1e8e79ac at sa_generated already learnt correctly, not learning twice
[13515] dbg: bayes: untie-ing
[13515] dbg: bayes: untie-ing db_toks
[13515] dbg: bayes: untie-ing db_seen
[13515] dbg: learn: initializing learner
[13515] dbg: check: is spam? score=25.063 required=5
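If you only want the corpus numbers, something along these lines pulls them out of the log and compares them with the minimums. This is a sketch: bayes_corpus_status is a hypothetical helper name, it assumes the log contains a "corpus size" line in the format shown above, and the fallback minimum of 200 is the default mentioned earlier (pass 100 100 if you use the lowered settings).

```shell
# Hypothetical helper: report whether the corpus recorded in a debug log
# meets the minimum spam/ham counts.
# Usage: bayes_corpus_status LOGFILE [MIN_SPAM] [MIN_HAM]
bayes_corpus_status() {
    logfile=$1
    min_spam=${2:-200}
    min_ham=${3:-200}

    # Take the first "corpus size" line only.
    line=$(sed -n '/bayes: corpus size:/{p;q;}' "$logfile")
    if [ -z "$line" ]; then
        echo "no corpus size line found"
        return 2
    fi

    # Extract the two counters from "nspam = N, nham = M".
    nspam=$(printf '%s\n' "$line" | sed -n 's/.*nspam = \([0-9][0-9]*\).*/\1/p')
    nham=$(printf '%s\n' "$line" | sed -n 's/.*nham = \([0-9][0-9]*\).*/\1/p')
    if [ -z "$nspam" ] || [ -z "$nham" ]; then
        echo "could not parse corpus size line"
        return 2
    fi

    if [ "$nspam" -ge "$min_spam" ] && [ "$nham" -ge "$min_ham" ]; then
        echo "bayes usable: nspam=$nspam nham=$nham"
    else
        echo "bayes not yet usable: nspam=$nspam nham=$nham (need $min_spam/$min_ham)"
        return 1
    fi
}
```

Run it as bayes_corpus_status log after the manual scan; a non-zero exit status means the counts are below the minimums or the line was not found.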
You'll then also see a BAYES_xx score appear in the log and the SpamAssassin report, e.g. the message above produced:
[13515] dbg: check: tests=AD_FORGED_ARGENTA_RCVD4,AWL,****BAYES_99****,HTML_40_50,HTML_MESSAGE,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_FORGED_WROTE,URIBL_BLACK,URIBL_SBL
[13515] dbg: check: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__ENV_AND_HDR_FROM_MATCH,__HAS_MSGID,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_MAILER,__HAS_X_PRIORITY,__HTML_LENGTH_0000_1024,__LOCAL_PP_NONPPURL,__MIME_HTML,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__SANE_MSGID,__SARE_BODY_BLNK_5_100,__SARE_HAS_FG_COLOR,__SARE_HEAD_MIME_VALID,__SARE_HTML_BEHTML2,__SARE_HTML_HAS_A,__SARE_HTML_HAS_BR,__SARE_HTML_HAS_FONT,__SARE_HTML_HAS_TITLE,__SARE_META_MURTY3,__SARE_URI_ANY,__SARE_WHITELIST_FLAG,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__THEBAT_MUA,__TOCC_EXISTS,__TVD_BODY,__TVD_MIME_ATT_TP,__VIRUS_WARNING268F
and
Content analysis details:   (25.1 points, 5.0 required)
 pts rule name              description
---- ---------------------- --------------------------------------------------
 5.5 AD_FORGED_ARGENTA_RCVD4 Forged Received header claiming to be from
                            Argenta
 2.8 RCVD_FORGED_WROTE      Forged 'Received' header found ('wrote:' spam)
 0.5 HTML_40_50             BODY: Message is 40% to 50% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 4.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.0000]
 1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
                            above 50%
                            [cf: 100]
 0.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
 0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                            [cf: 100]
 1.6 URIBL_SBL              Contains an URL listed in the SBL blocklist
                            [URIs: sasha.hk]
 3.0 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: sasha.hk]
 4.6 AWL                    AWL: From: address is in the auto white-list
Note that my BAYES_xx scores are tweaked a little from the default...
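To check for the Bayes hit without reading the whole log, a small helper like the following can pick the BAYES_xx name out of the "check: tests=" line. Again a sketch: bayes_rule_hit is a hypothetical name, and it assumes the debug log format shown above. The asterisks around BAYES_99 in the excerpt look like added emphasis rather than log output, so the helper strips any it finds.

```shell
# Hypothetical helper: print the BAYES_nn rule listed in the first
# "check: tests=" line of a SpamAssassin debug log, if any.
# Usage: bayes_rule_hit LOGFILE
bayes_rule_hit() {
    # Keep only the comma-separated rule list, one rule per line,
    # drop any '*' emphasis, then keep names of the form BAYES_<digits>.
    hit=$(sed -n '/dbg: check: tests=/{s/.*tests=//;p;q;}' "$1" \
        | tr ',' '\n' \
        | tr -d '*' \
        | sed -n '/^BAYES_[0-9][0-9]*$/p')

    if [ -n "$hit" ]; then
        echo "$hit"
    else
        echo "no BAYES rule hit"
        return 1
    fi
}
```

If this reports no hit on a message you would expect Bayes to score, the corpus counts in the log are the first thing to check.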
This is easier to debug if you keep the working directories for MIMEDefang around for a while, so that you can find the INPUTMSG file in the message folder under /var/spool/MIMEDefang (or equivalent). See the man page for MD for details of how to do this.
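Once the working directories are being kept, locating the saved messages is a one-liner. A minimal sketch, assuming the spool lives under /var/spool/MIMEDefang and that each kept message folder contains a file named INPUTMSG as described above; list_inputmsg is a hypothetical helper name.

```shell
# Hypothetical helper: list every INPUTMSG file under the MIMEDefang spool.
# Usage: list_inputmsg [SPOOL_DIR]
list_inputmsg() {
    spool=${1:-/var/spool/MIMEDefang}   # default spool location (adjust to your install)
    find "$spool" -type f -name INPUTMSG
}
```

Any of the listed files can then be fed to the spamassassin command shown earlier.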
Best Wishes,
Paul.
--
-------------------------------------------------------
Paul Murphy
Head of I.T.
Argenta Discovery
Tel. 01279 645 554
Fax. 01279 645 646
_______________________________________________________________________
Argenta Discovery Ltd, 8-9 Spire Green Centre, Harlow, Essex, CM19 5TR
Registered in England No. 3671653
_______________________________________________________________________