Spamconference (was: Re: [Mimedefang] MIMEDefang/Bogofilter)

Michael Sofka sofkam at rpi.edu
Wed Feb 5 12:10:05 EST 2003


On Tuesday 04 February 2003 23:12, David F. Skoll wrote:
> On Tue, 4 Feb 2003, Michael Sofka wrote:
> > 	Simple linear weighting methods out-perform ``fancy'' methods
> > 	such as Genetic Algorithms, Neuro-Nets, Kohonen nets, etc.
> > 	(An MIT undergrad found he could crunch better SA weights
> > 	much faster using a LINPACK routine in place of SA's GA.)
>
> Really?  I'd love to see a reference for that.  How would you define
> an objective function for the optimization problem?  Or maybe I'm
> missing something?

The presenter was Michael Salib.  He used a method called LMMSE (Linear
Mean Square Estimation according to my notes--I'm not sure where the
extra M comes from).  The function finds ``optimal weights for a linear
combination of heuristic tests.'' (Quoting my notes.)  LMMSE is
used for detection over noisy channels in electronics.  (I assume the
function is a discriminant to separate spam from ham, as with SA.)
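
To make the objective function concrete, here is a minimal sketch, not
Salib's code and not SA's.  It assumes each message is reduced to a 0/1
vector of rule hits, labels spam as 1 and ham as 0, and picks the
weights that minimize the summed squared error between the weighted
score and the label (LMMSE is usually expanded as "linear minimum mean
square error").  The three rules, the eight messages, and the 0.5
threshold are all made up for illustration.

```python
# Sketch only: least-squares weights for a linear combination of rule hits.
# The rules and messages below are hypothetical toy data.

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_weights(hits, labels):
    """Minimize sum((w . x - y)^2) via the normal equations.

    Assumes the rule-hit matrix has full column rank, so that
    X^T X is invertible.  The last weight returned is a bias term.
    """
    rows = [list(h) + [1.0] for h in hits]  # constant 1 = bias input
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    return solve(xtx, xty)

def score(weights, h):
    """Weighted sum of rule hits plus bias."""
    return sum(w * x for w, x in zip(weights, list(h) + [1.0]))

# Columns: [suspicious_word, html_only, known_sender]  (hypothetical rules)
HITS = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],   # spam
        [0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1]]   # ham
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

WEIGHTS = fit_weights(HITS, LABELS)
```

A message would then be flagged when score(WEIGHTS, hits) exceeds 0.5.
The whole fit is one linear solve, which is presumably why it runs so
much faster than a genetic search over the same weights.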

> > 	No corpus of spam is large enough for training/tuning
> > 	detectors.  There is always another word, phrase, or way
> > 	of conveying an idea which will evade a detector.
>
> That's true.  I believe a combination of SpamAssassin rules (so that
> you immediately have something that works pretty well out of the box)
> plus some statistical "learning" is about the best pure filtering you
> can do.  SA 2.50 includes some statistical methods, I believe.

There were several talks mentioning ways to combine SA and statistical
methods.  E.g., feeding the heuristics to a Bayesian estimator.
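
One way to read "feeding the heuristics to a Bayesian estimator" is a
naive Bayes classifier whose features are rule hits rather than words.
The sketch below assumes exactly that; it is not taken from any of the
talks or from SA.  The rules and messages are made up, and add-one
smoothing is my own choice to avoid zero probabilities.

```python
import math

# Sketch only: naive Bayes over 0/1 rule hits (hypothetical toy data).

def train(hits, labels):
    """Estimate P(class) and P(rule fires | class) with add-one smoothing."""
    k = len(hits[0])
    model = {}
    for cls in (0, 1):  # 0 = ham, 1 = spam
        rows = [h for h, y in zip(hits, labels) if y == cls]
        prior = len(rows) / len(hits)
        p_fire = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                  for i in range(k)]
        model[cls] = (prior, p_fire)
    return model

def spam_probability(model, h):
    """Posterior P(spam | rule hits), treating rules as independent."""
    log_post = {}
    for cls, (prior, p_fire) in model.items():
        lp = math.log(prior)
        for x, p in zip(h, p_fire):
            lp += math.log(p if x else 1.0 - p)
        log_post[cls] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    e = {c: math.exp(v - top) for c, v in log_post.items()}
    return e[1] / (e[0] + e[1])

# Columns: [suspicious_word, html_only, known_sender]  (hypothetical rules)
HITS = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],   # spam
        [0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1]]   # ham
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

MODEL = train(HITS, LABELS)
```

Calling spam_probability(MODEL, [1, 1, 0]) gives the estimated
probability that a message hitting the first two rules is spam.  The
independence assumption is crude, since rules often overlap, but the
estimator can be retrained from a site's own mail.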

> Also, there are some characteristics of spam that have nothing to do
> with message content, but rather transmission methods, as I mentioned
> in
> http://lists.roaringpenguin.com/pipermail/mimedefang/2003-January/004081.html

I don't recall any mention of this, or of similar ``wire'' techniques.  E.g., to a
good first approximation a spam attack looks like a denial of service.  So,
use packet shaping and firewalls.  John Draper's Crunch Box
http://www.shopip.com/ probably does this, and similar things.  It costs $12k,
so if anybody here tests one, please let us know what you think.


-- 
Michael D. Sofka              sofkam at rpi.edu
C&CT Sr. Systems Programmer    AFS/DFS, email, usenet, TeX, epistemology.
Rensselaer Polytechnic Institute, Troy, NY.  http://www.rpi.edu/~sofkam/



