[Mimedefang] SpamAssassin 2.40 experiences?

Wed Sep 4 14:24:01 EDT 2002

On 09/04/02 at 13:26, 'twas brillig and David F. Skoll scrobe:
> 
> > David, maybe 2.41-cvs has better weights yet.
> 
> I'm getting nervous... the weights have changed a lot between 2.31
> and 2.40.  I know they use a genetic algorithm to determine the
> weights.  The problem with that is that it can perturb the weights
> chaotically, and there's no guarantee you'll come up with an optimal
> solution, or even be able to tell how far you are from optimal.
> 
> I wonder if something like simulated annealing might be better?  Maybe
> it will produce more stable results.  Does anyone (i) know what simulated
> annealing is, and (ii) feel like implementing it? :-)
> 
> Has anyone heard any more about the Bayesian analysis that was all the
> rage a few weeks ago?

	Eric Raymond posted a proof-of-concept called bogofilter
(www.tuxedo.org/~esr/bogofilter/) to www.milter.org; I'm running it
via procmail on my workstation, as an exercise in comparative
spamcanning. (I'm running SA on the primary servers, with default
rules and a threshold of 5.)

	I fed bogofilter a starting database of about 400 spams and
about 650 "clean" emails -- both from my personal collection -- and
I've got procmail rules that cc any conflicting reports (i.e. anything
that bogofilter claims is spam but SA passes, or vice-versa) to
another folder. After running for a week, I have 28 or so conflicts,
of which I see:

11 SPAMs correctly ID'd by SA but false negatives for BF
3 SPAMs correctly ID'd by BF but false negatives for SA
4 legit emails correctly ID'd by SA but false positives for BF
10 legit emails correctly ID'd by BF but false positives for SA

	I also have about 5 spams that got false negatives from both,
and (so far) *nothing* that's gotten a false positive from both.
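
	To make the tally easier to read per filter, here is a small
Python sketch that totals the four conflict rows above. The dictionary
keys are just illustrative labels, and the 5 or so spams missed by
both filters are left out because they are not conflicts.

```python
# Conflict counts copied from the four rows above.
# Key: (true class, filter that got it wrong). Labels are illustrative.
conflicts = {
    ("spam", "BF"): 11,  # SA caught it, bogofilter missed it
    ("spam", "SA"): 3,   # bogofilter caught it, SA missed it
    ("ham", "BF"): 4,    # SA passed it, bogofilter flagged it
    ("ham", "SA"): 10,   # bogofilter passed it, SA flagged it
}

def errors(filter_name):
    """Return (false_negatives, false_positives) for one filter."""
    fn = conflicts[("spam", filter_name)]
    fp = conflicts[("ham", filter_name)]
    return fn, fp

total = sum(conflicts.values())
for name in ("SA", "BF"):
    fn, fp = errors(name)
    print(f"{name}: {fn} false negatives, {fp} false positives, "
          f"{fn + fp} of {total} conflicts wrong")
```

	Among the 28 conflicts, SA was wrong 13 times (mostly false
positives) and BF was wrong 15 times (mostly false negatives).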

	However, the neat thing about bogofilter is that it learns --
the majority of the false positives for bogofilter occurred in the
first two days, and happened to be from a couple of mailing lists.
Once I fed it those messages, the false tags stopped.
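
	The learning behaviour can be illustrated with a toy
word-count filter. This is only a sketch of the general Bayesian idea
and is NOT bogofilter's actual algorithm; the class name, the
tokenizer, and the add-one smoothing are all assumptions made for the
example.

```python
import math
import re
from collections import Counter

class TinyBayes:
    """Toy word-count spam filter; NOT bogofilter's actual algorithm."""

    def __init__(self):
        # Number of messages of each class that contained a given token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Distinct lowercase word-like tokens in the message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_log_odds(self, text):
        # Sum of per-token log likelihood ratios, with add-one smoothing
        # so unseen tokens contribute roughly nothing.
        score = 0.0
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0

f = TinyBayes()
f.train("buy cheap pills now", "spam")
f.train("lunch meeting agenda attached", "ham")

list_post = "cheap pills discussion on the list"
before = f.is_spam(list_post)   # flagged: shares words with the spam
f.train(list_post, "ham")       # feed the false positive back as clean
after = f.is_spam(list_post)    # no longer flagged
print(before, after)
```

	Feeding the misfiled mailing-list message back in as clean
mail shifts its tokens toward the "ham" side, which is the same kind
of correction described above.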

	I built version 0.4 of this; I see that ESR is up to 0.6 now
(in a week! sheesh.) so I'll have to check it out. I also notice that
he's got a client/server setup already, so I just may look into
writing a milter to call before mimedefang that inserts a bogofilter
header.

	(Then I could give that header a spamassassin score and try
running it in tandem on the server... ergh, my head hurts.)
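
	A rough sketch of what the tandem idea might look like, in
Python rather than as a real milter or SpamAssassin rule. The header
name X-Bogofilter, its "spam" value, and the 2.0 points are all
hypothetical; only the threshold of 5 comes from the setup described
above.

```python
from email import message_from_string

SA_THRESHOLD = 5.0             # threshold mentioned earlier in this post
BOGO_HEADER = "X-Bogofilter"   # hypothetical header a milter might insert
BOGO_POINTS = 2.0              # hypothetical score given to that header

def combined_verdict(raw_message, sa_score):
    """Add the hypothetical bogofilter points to an SA score.

    Returns (total_score, is_spam).
    """
    msg = message_from_string(raw_message)
    flagged = msg.get(BOGO_HEADER, "").strip().lower() == "spam"
    total = sa_score + (BOGO_POINTS if flagged else 0.0)
    return total, total >= SA_THRESHOLD

raw = "X-Bogofilter: spam\nSubject: hello\n\nbody text\n"
print(combined_verdict(raw, 3.5))
```

	With these made-up numbers, a message that SA alone would
pass at 3.5 gets pushed over the threshold once the bogofilter header
is counted.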


		Ole
--
Ole Craig * UNIX; postmaster, news, web; SGI martyr * CS Computing
Facility, UMass * <www.cs.umass.edu/~olc/pgppubkey.txt> for public key

perl -e 'print$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'