[Mimedefang] summary of site-wide bayes with SA 2.5 and MD?

David F. Skoll dfs at roaringpenguin.com
Thu Feb 27 11:47:01 EST 2003

Jason Englander wrote:

> From then on it should use autolearn to build the bayes db.  (SA score
> of -2 or less = ham, SA score of 15 or more = spam)

What is the theoretical basis for auto-learning?  I do not think
it makes sense.  Auto-learning only trains on mail that SA's other rules
have already scored decisively, so SA already "correctly" categorizes
that mail and there's little point in modifying the Bayes statistics.
And on the off chance that an outlying e-mail is misclassified, you
pollute your statistical pool and make SA more likely to misclassify
similar e-mail in the future.
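
To make the pollution concern concrete, here is a rough sketch in Python.
It is not SpamAssassin's actual Bayes code.  The thresholds are the ones
quoted above; the ToyBayes class, its add-one smoothing, and the token
names are assumptions made up purely for illustration.

```python
from collections import Counter

# Thresholds taken from the quoted message.
HAM_THRESHOLD = -2.0    # score of -2 or less = ham
SPAM_THRESHOLD = 15.0   # score of 15 or more = spam


def autolearn_label(score):
    """Return 'ham', 'spam', or None if the score is between the thresholds."""
    if score <= HAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return None


class ToyBayes:
    """Hypothetical per-token counter; far simpler than SA's real classifier."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.messages = {"ham": 0, "spam": 0}

    def learn(self, tokens, label):
        # Count each distinct token once per message.
        self.messages[label] += 1
        self.counts[label].update(set(tokens))

    def spam_probability(self, token):
        # Add-one smoothing (an assumption) avoids division by zero.
        spam_rate = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        ham_rate = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return spam_rate / (spam_rate + ham_rate)


def demo():
    """Show how one wrongly auto-learned message shifts a token's estimate."""
    bayes = ToyBayes()
    for _ in range(10):
        bayes.learn(["invoice", "meeting"], "ham")
        bayes.learn(["pills", "winner"], "spam")

    before = bayes.spam_probability("invoice")

    # A legitimate message that the static rules happened to score at 16.
    label = autolearn_label(16.0)
    if label is not None:
        bayes.learn(["invoice", "urgent"], label)

    after = bayes.spam_probability("invoice")
    return before, after
```

In this toy model, a single legitimate message that scores above the spam
threshold is learned as spam, and the estimated spam probability of the
token "invoice" goes up as a result.  Nothing in the loop ever corrects it.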

The whole point of learning is that you teach the discriminator
the "absolute truth" as decided by a human being.  Auto-learning,
in my opinion, violates that important principle.


