[Mimedefang] Bayesian analysis (was re: site-wide...)

Peter P. Benac ppbenac at emacolet.com
Thu Feb 27 14:38:01 EST 2003


Actually, if you follow the SpamAssassin list, it has been suggested to do
just this, with one exception.

SA comes with a program called sa-learn (see perldoc sa-learn).

It has been suggested that sa-learn be fed at least 200 spam and 200 ham
(real) messages so that the Bayes classifier can learn from them.
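
As a rough illustration of that threshold, here is a minimal Python sketch
that counts the messages in two mbox files and only builds the sa-learn
command lines once both corpora are large enough. The function names and the
mbox paths are hypothetical, and it assumes your sa-learn accepts the --spam,
--ham and --mbox options (check perldoc sa-learn for your version). It only
builds the commands; it does not run them.

```python
import mailbox

MIN_MESSAGES = 200  # figure suggested on the SpamAssassin list, per the text above


def count_messages(mbox_path):
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def build_learn_commands(spam_mbox, ham_mbox, minimum=MIN_MESSAGES):
    """Return sa-learn argv lists, or raise if either corpus is too small."""
    counts = {"spam": count_messages(spam_mbox), "ham": count_messages(ham_mbox)}
    for kind, n in counts.items():
        if n < minimum:
            raise ValueError(f"only {n} {kind} messages, want at least {minimum}")
    # Assumed sa-learn options: --spam / --ham select the class, --mbox the format.
    return [
        ["sa-learn", "--spam", "--mbox", spam_mbox],
        ["sa-learn", "--ham", "--mbox", ham_mbox],
    ]
```

Each returned list could then be handed to subprocess.run() on a machine
where sa-learn is installed.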

I have set up two accounts on my system (spamlearn and nospamlearn) and have
asked my mail users to either bounce or forward potentially good and bad
messages to the appropriate account. Forwards should be sent as attachments,
since non-attachment forwards can confuse the Bayes learning.
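
To show why forwarding as an attachment helps: the original message then
arrives intact as a message/rfc822 part, with its own headers, and can be
pulled back out before learning. A minimal sketch using Python's standard
email package follows. The function name is hypothetical, and this is not
what sa-learn itself does internally, just one way a learning account's
mailbox might be pre-processed.

```python
from email import message_from_bytes, policy


def extract_attached_messages(raw_bytes):
    """Return the original messages forwarded as message/rfc822 attachments."""
    outer = message_from_bytes(raw_bytes, policy=policy.default)
    originals = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries one embedded message as its payload.
            originals.append(part.get_payload(0))
            return  # do not descend into the original message itself
        if part.is_multipart():
            for child in part.get_payload():
                visit(child)

    visit(outer)
    return originals
```

Each extracted message can be serialized with as_bytes() and passed on to
the learner, so the Bayes database sees the spammer's headers and body rather
than the forwarding user's.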

Thus far Bayes has produced very few false positives and even fewer false
negatives.  I am almost at the point where I feel comfortable enough to set my
own mail client's filters to delete anything tagged as spam.

Regards,
Pete
----
Peter P. Benac, CCNA
Emacolet Networking Services, Inc
Providing Systems and Network Consulting, Web Hosting Services
Phone: 919-847-1740 or 866-701-2345
Web: http://www.emacolet.com

To have principles...
             First have courage.. With principles comes integrity!!!

 



-----Original Message-----
From: mimedefang-admin at lists.roaringpenguin.com
[mailto:mimedefang-admin at lists.roaringpenguin.com] On Behalf Of Les Mikesell
Sent: Thursday, February 27, 2003 13:54
To: mimedefang at lists.roaringpenguin.com
Subject: RE: [Mimedefang] Bayesian analysis (was re: site-wide...)


> From: David F. Skoll
> I'm also skeptical that Bayesian analysis is scalable.  We're trying 
> to sell our commercial CanIt-PRO solution to customers with upwards of 
> 25K users, and I just can't see maintaining 25,000 separate Bayesian 
> databases as being feasible or even desirable.  I also don't see one 
> site-wide database being much better than plain-vanilla SpamAssassin.

Plain-vanilla SpamAssassin will probably always lag behind tricks such as
misspelling key words, which are increasingly used, since someone has to
build the rules after seeing the examples.  I think a good approach would
be to provide an email address where users can send human-identified spam
that passed the SA checks, and let the Bayesian analysis learn from those (an
approach similar to the Razor checks but based on content).  How much
difference is there in per-user spam classifications?  I'd think the biggest
one would be that some people forget they signed up for some commercial
notifications and consider them spam, while others getting the same things
want to see them.

---
  Les Mikesell
   les at futuresource.com
_______________________________________________
MIMEDefang mailing list
MIMEDefang at lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang




