[Mimedefang] Training SA when mail is not stored locally?

Stefano McGhee SMcGhee at ARCweb.com
Thu Feb 5 14:26:11 EST 2004


> How would learning be affected on a machine that receives ONLY spam?
> For example, our secondary server receives a steady flow of garbage all
> day and night, and only gets good stuff when the primary one goes down.
> If all it ever gets is garbage, how will it know what is legitimate when
> it sees it?  Won't its learning curve be skewed such that it knows good
> spam and bad spam?  :)
> 
Hello Joe,
	From what I've been reading, if the machine gets *only* spam, Bayes
is not useful.  Apparently, the ham-to-spam ratio in your corpus matters,
and having enough ham is more important than having lots of spam.  Without
ham to compare against, Bayes has no way to differentiate spam from ham.
Posts on SA-Talk say it's OK for the ratio to be stacked a little toward
spam, but if most of your mail is spam, Bayes becomes less effective and
possibly counterproductive.  That's why Bayes won't start scoring until it
has learned a minimum number of both spam and ham messages.
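To illustrate why an all-spam box can't learn much, here is a rough sketch
(hypothetical, not SpamAssassin's actual code).  It assumes a minimum of 200
learned messages of each kind, which is the default of the bayes_min_spam_num
and bayes_min_ham_num options in current SpamAssassin, and it uses a
simplified Graham-style per-token probability rather than SpamAssassin's
Robinson/chi-squared math.  The names bayes_ready and token_spam_prob are
made up for this example.

```python
# Hypothetical sketch only; NOT SpamAssassin's real implementation.
# Assumed thresholds, mirroring the bayes_min_ham_num / bayes_min_spam_num defaults.
MIN_HAM = 200
MIN_SPAM = 200


def bayes_ready(nham: int, nspam: int) -> bool:
    """Bayes is only consulted once BOTH corpora reach their minimums."""
    return nham >= MIN_HAM and nspam >= MIN_SPAM


def token_spam_prob(spam_hits: int, ham_hits: int, nspam: int, nham: int) -> float:
    """Simplified Graham-style probability that a token indicates spam.

    Frequencies are normalized by corpus size, so a lopsided corpus is
    tolerated, but with no ham at all every seen token looks 100% spammy.
    """
    spam_freq = spam_hits / nspam if nspam else 0.0
    ham_freq = ham_hits / nham if nham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen in either corpus: neutral
    return spam_freq / (spam_freq + ham_freq)


if __name__ == "__main__":
    # A secondary MX that has learned 5000 spam and no ham never gets started.
    print(bayes_ready(nham=0, nspam=5000))
    # Any token it has seen scores as pure spam, because there is no ham to compare.
    print(token_spam_prob(spam_hits=10, ham_hits=0, nspam=5000, nham=0))
    # With some ham learned, the same token can turn out to be a ham indicator.
    print(token_spam_prob(spam_hits=10, ham_hits=5, nspam=5000, nham=500))
```

The point of the sketch is the first two calls: with zero ham, there is
nothing to contrast the spam against, so the filter either refuses to run or
would treat every familiar word as spammy.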

Cheers,

Stefano
