[Mimedefang] SpamAssassin 2.40 experiences?
Ole Craig
olc at cs.umass.edu
Wed Sep 4 14:24:01 EDT 2002
On 09/04/02 at 13:26, 'twas brillig and David F. Skoll scrobe:
>
> > David, maybe 2.41-cvs has better weights yet.
>
> I'm getting nervous... the weights have changed a lot between 2.31
> and 2.40. I know they use a genetic algorithm to determine the
> weights. The problem with that is that it can perturb the weights
> chaotically, and there's no guarantee you'll come up with an optimal
> solution, or even be able to tell how far you are from optimal.
>
> I wonder if something like simulated annealing might be better? Maybe
> it will produce more stable results. Does anyone (i) know what simulated
> annealing is, and (ii) feel like implementing it? :-)
>
> Has anyone heard any more about the Bayesian analysis that was all the
> rage a few weeks ago?
Eric Raymond posted a proof-of-concept called bogofilter
(www.tuxedo.org/~esr/bogofilter/) to www.milter.org. I'm running it
via procmail on my workstation, as an exercise in comparative
spamcanning. (I'm running SA on the primary servers, with default
rules and a threshold of 5.)
I fed bogofilter a starting database of about 400 spams and
about 650 "clean" emails -- both from my personal collection -- and
I've got procmail rules that cc any conflicting reports (i.e. anything
that bogofilter claims is spam but SA passes, or vice-versa) to
another folder. After running for a week, I have 28 or so conflicts,
of which I see:
    11  SPAMs correctly ID'd by SA but false negatives for BF
     3  SPAMs correctly ID'd by BF but false negatives for SA
     4  legit emails correctly ID'd by SA but false positives for BF
    10  legit emails correctly ID'd by BF but false positives for SA
I also have about 5 spams that got false negatives from both,
and (so far) *nothing* that's gotten a false positive from both.
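
For anyone who wants the arithmetic spelled out, here is a small Python sketch that totals the conflict counts listed above and splits them by which filter was right. The dictionary layout and the `summarize` helper are my own illustration, not part of either tool, and the roughly 5 spams missed by both filters are left out because they never land in the conflict folder.

```python
# Conflict counts copied from the tallies above (one week of mail).
# Key: (true class of the message, filter that classified it correctly).
conflicts = {
    ("spam", "SA"): 11,  # spam caught by SA, missed by bogofilter
    ("spam", "BF"): 3,   # spam caught by bogofilter, missed by SA
    ("ham", "SA"): 4,    # legit mail passed by SA, flagged by bogofilter
    ("ham", "BF"): 10,   # legit mail passed by bogofilter, flagged by SA
}


def summarize(counts):
    """Total the conflicts and count errors per filter (illustrative helper)."""
    total = sum(counts.values())
    sa_right = sum(n for (_, winner), n in counts.items() if winner == "SA")
    return {
        "total": total,
        "sa_right": sa_right,
        "bf_right": total - sa_right,
        "sa_false_neg": counts[("spam", "BF")],
        "sa_false_pos": counts[("ham", "BF")],
        "bf_false_neg": counts[("spam", "SA")],
        "bf_false_pos": counts[("ham", "SA")],
    }


summary = summarize(conflicts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Within the conflict set the two filters come out close overall, but they fail differently: SA's errors here are mostly false positives, bogofilter's mostly false negatives.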
However, the neat thing about bogofilter is that it learns --
the majority of the false positives for bogofilter occurred in the
first two days, and happened to be from a couple of mailing lists.
Once I fed it those messages, the false tags stopped.
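
To illustrate why feeding the misfiled messages back helps, here is a toy word-count Bayes classifier in Python. This is only a sketch of the general idea; it is NOT bogofilter's actual algorithm, and the class name, tokenizer, smoothing, and sample messages are all made up for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy Bayes-style word classifier; not bogofilter's real implementation."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, label):
        self.msgs[label] += 1
        self.words[label].update(self.tokens(text))

    def spam_probability(self, text):
        # Sum log-odds over the words present, with add-one smoothing.
        log_odds = math.log((self.msgs["spam"] + 1) / (self.msgs["ham"] + 1))
        for w in self.tokens(text):
            p_spam = (self.words["spam"][w] + 1) / (self.msgs["spam"] + 2)
            p_ham = (self.words["ham"][w] + 1) / (self.msgs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayes()
f.train("free money click now", "spam")
f.train("cheap pills free offer", "spam")
f.train("meeting notes for the committee", "ham")

list_post = "free software list digest patch offer"
before = f.spam_probability(list_post)  # leans spam: "free" and "offer"
f.train(list_post, "ham")               # feed back the false positive
after = f.spam_probability(list_post)   # now leans clean

print(f"before: {before:.2f}  after: {after:.2f}")
```

A mailing list tends to reuse the same vocabulary, so a couple of corrected messages can be enough to shift the word counts for the whole list.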
I built version 0.4 of this; I see that ESR is up to 0.6 now
(in a week! sheesh.) so I'll have to check it out. I also notice that
he's got a client/server setup already, so I just may look into
writing a milter to call before mimedefang that inserts a bogofilter
header.
(Then I could give that header a spamassassin score and try
running it in tandem on the server... ergh, my head hurts.)
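
Roughly what I have in mind for the tandem setup, as a Python sketch of the arithmetic only. The threshold of 5 is the one I'm running on the servers; the weight of 2.0 for a bogofilter "spam" verdict is a number pulled out of the air, and the function names are hypothetical. A real setup would do this with a SpamAssassin header rule and score rather than with code like this.

```python
SA_THRESHOLD = 5.0       # threshold used on the primary servers
BOGOFILTER_WEIGHT = 2.0  # hypothetical score for a bogofilter "spam" header


def combined_score(sa_score, bogofilter_says_spam, weight=BOGOFILTER_WEIGHT):
    """Add the (assumed) bogofilter weight on top of the other rules' score."""
    return sa_score + (weight if bogofilter_says_spam else 0.0)


def is_spam(sa_score, bogofilter_says_spam, threshold=SA_THRESHOLD):
    """Tag the message once the combined score reaches the threshold."""
    return combined_score(sa_score, bogofilter_says_spam) >= threshold


# A borderline message that SA alone would pass:
print(is_spam(3.5, bogofilter_says_spam=False))  # SA rules only
print(is_spam(3.5, bogofilter_says_spam=True))   # with the bogofilter header
```

The point of a modest weight is that bogofilter can tip a borderline message over the threshold without being able to condemn one entirely on its own.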
Ole
--
Ole Craig * UNIX; postmaster, news, web; SGI martyr * CS Computing
Facility, UMass * <www.cs.umass.edu/~olc/pgppubkey.txt> for public key
perl -e 'print$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'