[Mimedefang] Bayesian analysis (was re: site-wide...)

Thu Feb 27 13:55:01 EST 2003

> From: David F. Skoll
> I'm also skeptical that Bayesian analysis is scalable.  We're trying
> to sell our commercial CanIt-PRO solution to customers with upwards of
> 25K users, and I just can't see maintaining 25,000 separate Bayesian
> databases as being feasible or even desirable.  I also don't see one
> site-wide database being much better than plain-vanilla SpamAssassin.

Plain-vanilla SpamAssassin will probably always lag behind tricks like
deliberately misspelled key words, which are being used more and more,
because someone has to write the rules after seeing the examples.  I
think a good approach would be to provide an email address where users
can send human-identified spam that got past the SA checks, and let the
Bayesian analysis learn from those messages (an approach similar to the
Razor checks, but based on content).  How much do users actually differ
in what they classify as spam?  I'd think the biggest difference would be
that some people forget they signed up for commercial notifications and
consider them spam, while others getting the same mail want to see it.
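The reporting-address idea can be sketched as one site-wide token database that learns from whatever users forward to it. This is a minimal illustration under stated assumptions: the names `SiteBayes`, `tokenize` and `handle_spam_report` are hypothetical; the scoring is a simplified naive Bayes over the tokens present in a message, with add-one smoothing and equal priors, and is not SpamAssassin's own Bayes implementation, which combines token probabilities differently; and how mail to the reporting address gets handed to this code is left out.

```python
import email
import math
import re
from collections import Counter

# Hypothetical tokenizer: word-like runs, lower-cased, counted once per message.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")


def tokenize(text):
    return {t.lower() for t in TOKEN_RE.findall(text)}


class SiteBayes:
    """Hypothetical single token database shared by every user on the site."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Simplified naive Bayes: equal priors, only tokens present in the message,
        # add-one smoothed estimates of P(token | spam) and P(token | ham).
        log_s = log_h = 0.0
        for tok in tokenize(text):
            log_s += math.log((self.spam_tokens[tok] + 1) / (self.n_spam + 2))
            log_h += math.log((self.ham_tokens[tok] + 1) / (self.n_ham + 2))
        # P(spam | msg) = 1 / (1 + exp(log_h - log_s)); clamp to avoid overflow.
        diff = max(min(log_h - log_s, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


def handle_spam_report(db, raw_message):
    """Learn a message sent to the (hypothetical) spam-report address as spam.

    Assumes the reported text arrives as text/plain parts; walk() also descends
    into a forwarded message/rfc822 attachment.
    """
    msg = email.message_from_string(raw_message)
    bodies = [part.get_payload(decode=True) for part in msg.walk()
              if part.get_content_type() == "text/plain"]
    text = " ".join(b.decode("utf-8", "replace") for b in bodies if b)
    db.learn(msg.get("Subject", "") + " " + text, is_spam=True)
```

For example, after `db.learn("cheap v1agra buy now", True)` and `db.learn("meeting agenda for monday", False)`, the text `"buy cheap v1agra"` scores above 0.5 and `"monday agenda"` below it. Because every user's reports feed the same counts, storage grows with the vocabulary rather than with the number of users; the trade-off is that one user's forgotten newsletter, once reported, counts as spam for everyone.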

---
  Les Mikesell
   les at futuresource.com