[Mimedefang] MIMEDefang/Bogofilter

Tue Feb 4 14:11:02 EST 2003

No.

But there was discussion about this at the MIT spam
conference: http://spamconference.org/.

There were, for example, claims that statistical filtering (of various
kinds) was better than heuristics, and claims that heuristics were
better on servers, while statistical filtering was better for
individuals.  Those claims were then disputed by a later speaker.

What was lacking was data.  One talk put up the Miss and False Alarm
rates for a few systems, but gave no indication of criterion or
distribution.  One speaker from Microsoft Research put up an ROC
(Receiver Operating Characteristic) curve for 4 purported methods,
showing how they differ in FA and Miss rates, but could not discuss
what the methods were, and presented no real data.  Still, he gets
credit in my book for at least explaining how decision algorithms need
to be compared.  Then he loses half of it for not actually comparing any.
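To make the comparison he was describing concrete, here is a minimal
sketch of how Miss and FA rates trade off as the decision criterion
(threshold) moves.  The scores, labels and thresholds are made up for
illustration; nothing here comes from the talks.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, false_alarm_rate, miss_rate) for each threshold.

    scores: one spam score per message (higher means more spam-like)
    labels: True for spam, False for legitimate mail
    A message is flagged as spam when its score >= threshold.
    """
    n_spam = sum(1 for is_spam in labels if is_spam)
    n_ham = len(labels) - n_spam
    points = []
    for t in thresholds:
        # Miss: spam that scored below the criterion and got through.
        misses = sum(1 for s, is_spam in zip(scores, labels)
                     if is_spam and s < t)
        # False alarm: legitimate mail that scored at or above it.
        false_alarms = sum(1 for s, is_spam in zip(scores, labels)
                           if not is_spam and s >= t)
        points.append((t, false_alarms / n_ham, misses / n_spam))
    return points


# Hypothetical scores from one filter on eight hand-labeled messages.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6]
labels = [False, False, True, True, True, False, False, True]

for t, fa, miss in roc_points(scores, labels, [0.3, 0.5, 0.75]):
    print("criterion %.2f  FA %.2f  Miss %.2f" % (t, fa, miss))
```

A single Miss/FA pair is one point on this curve, which is why it says
little without the criterion that produced it.  Comparing two filters
means comparing the whole curves on the same mail.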

Here is what I think we can say with some confidence:

	Personalized methods work better than server-level methods.

	Any statistical procedure produces better results than
	human-intuited methods (SpamAssassin, for example, determines
	weights via a Genetic Algorithm, so it does well even though
	the patterns being weighted are human selected).

	Simple linear weighting methods out-perform ``fancy'' methods
	such as Genetic Algorithms, neural nets, Kohonen nets, etc.
	(An MIT undergrad found he could crunch better SA weights
	much faster using a LINPACK routine in place of SA's GA.)

	No corpus of spam is large enough for training/tuning
	detectors.  There is always another word, phrase, or way
	of conveying an idea which will evade a detector.

	Spammers, or at least some of them---but that's all it
	takes---are smart, and motivated.  This makes spam
	detection a hard AI problem.  (John Graham-Cumming's talk
	on POPFile was most informative, and entertaining, in this regard.)

	We need real data.
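
As an illustration of the linear-weighting point in the list above,
here is a minimal sketch of fitting rule weights by ordinary least
squares.  It is not the undergrad's LINPACK routine and not how
SpamAssassin scores mail; the rule columns, the tiny data set and the
small ridge term are all assumptions made so the example runs on its own.

```python
def fit_linear_weights(rows, targets, ridge=1e-6):
    """Least-squares weights w minimizing sum((row . w - target)**2).

    Solves the normal equations (X^T X + ridge*I) w = X^T y by Gaussian
    elimination with partial pivoting.  The ridge term is an assumption
    added here only to keep the system solvable.
    """
    n = len(rows[0])
    # Augmented matrix [X^T X + ridge*I | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(n)]
    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for j in range(col, n + 1):
                a[k][j] -= factor * a[col][j]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (a[i][n] - tail) / a[i][i]
    return w


def score(row, w):
    """Weighted sum of rule hits; compare against a criterion to decide."""
    return sum(x * wi for x, wi in zip(row, w))


# Hypothetical columns: [rule_a hit, rule_b hit, whitelisted, bias].
rows = [
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
]
targets = [1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = legitimate

weights = fit_linear_weights(rows, targets)
print([round(w, 3) for w in weights])
```

The fit is one direct linear solve, with no population to evolve and
no iterations to tune.  Whether the resulting weights are any good
still depends entirely on the mail they were fit to.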

None of this is new; most of it falls out of the past 40 years of natural
language processing, statistical language processing, decision making,
numerical methods and Artificial Intelligence research.  This leads to
another point made by a few speakers:

	Visit the library.

There were also some interesting talks likening spammers to organized
crime on the internet, and with reason.  On the whole a good day,
well worth the trip.  I just wish it took place over a couple days,
and had some long, organized breaks and outings so the participants
could meet and exchange business cards.

It was also pointed out that there is no economic incentive to go
after spammers.  Even when what they are doing is illegal (and it
usually is---there was a fine talk by Jon Praed who makes a living
suing spammers) they have no money, in the end, and so lawsuits have
to be bankrolled by ISPs.

Mike

On Monday 03 February 2003 15:42, Ray Parish wrote:
> Has anyone given a whirl with bogofilter compared to spamassassin? If so
> what were the results?

-- 
Michael D. Sofka              sofkam at rpi.edu
C&CT Sr. Systems Programmer    AFS/DFS, email, usenet, TeX, epistemology.
Rensselaer Polytechnic Institute, Troy, NY.  http://www.rpi.edu/~sofkam/