Spamconference (was: Re: [Mimedefang] MIMEDefang/Bogofilter)
Michael Sofka
sofkam at rpi.edu
Wed Feb 5 12:10:05 EST 2003
On Tuesday 04 February 2003 23:12, David F. Skoll wrote:
> On Tue, 4 Feb 2003, Michael Sofka wrote:
> > Simple linear weighting methods out-perform ``fancy'' methods
> > such as Genetic Algorithms, Neuro-Nets, Kohonen nets, etc.
> > (An MIT undergrad found he could crunch better SA weights
> > much faster using a LINPACK routine in place of SA's GA.)
>
> Really? I'd love to see a reference for that. How would you define
> an objective function for the optimization problem? Or maybe I'm
> missing something?
The presenter was Michael Salib. He used a method called LMMSE (Linear
Mean Square Estimation, according to my notes; I'm not sure where the
extra M comes from). The function finds ``optimal weights for a linear
combination of heuristic tests'' (quoting my notes). LMMSE is used for
detection over noisy channels in electronics. (I assume the function
is a discriminant that separates spam from ham, as with SA.)
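A minimal sketch of the idea, assuming each message is described by 0/1 flags for which heuristic tests fired and is labelled 1.0 for spam and 0.0 for ham. Fitting a linear estimator by minimising squared error over a training sample reduces to an ordinary least-squares problem, solved here through the normal equations. This is not Salib's code: the names `fit_weights` and `score`, and the tiny `ridge` term (added only to keep the system solvable), are hypothetical, and in practice a library least-squares routine would replace the hand-written solver.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so each row operation updates both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(hits, labels, ridge=1e-6):
    """Least-squares weights for a linear combination of heuristic tests.

    hits:   per-message rows, 1 if a test fired, else 0 (hypothetical input format)
    labels: 1.0 for spam, 0.0 for ham
    Returns [bias, w1, w2, ...].
    """
    rows = [[1.0] + list(h) for h in hits]  # prepend a constant bias column
    k = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    return solve(xtx, xty)


def score(weights, hit_row):
    """Linear discriminant: bias plus weighted sum of the tests that fired."""
    return weights[0] + sum(w * h for w, h in zip(weights[1:], hit_row))


if __name__ == "__main__":
    # Toy data: test 1 fires exactly on the two spams, test 2 is uninformative.
    hits = [[1, 0], [1, 1], [0, 1], [0, 0]]
    labels = [1.0, 1.0, 0.0, 0.0]
    w = fit_weights(hits, labels)
    print([round(x, 3) for x in w])
```

On this toy data the fit gives test 1 a weight close to 1 and test 2 a weight close to 0; a message would then be called spam when `score` exceeds some threshold such as 0.5, which is likewise an assumption of the sketch.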
> > No corpus of spam is large enough for training/tuning
> > detectors. There is always another word, phrase, or way
> > of conveying an idea which will evade a detector.
>
> That's true. I believe a combination of SpamAssassin rules (so that
> you immediately have something that works pretty well out of the box)
> plus some statistical "learning" is about the best pure filtering you
> can do. SA 2.50 includes some statistical methods, I believe.
Several talks mentioned ways to combine SA and statistical methods,
e.g., feeding the results of the heuristic tests to a Bayesian estimator.
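One way to feed heuristic test results to a Bayesian estimator is a Bernoulli naive Bayes model over the test hits. The sketch below is an illustrative assumption, not the method from any particular talk or from SA itself: it estimates, with Laplace smoothing, how often each test fires in spam and in ham, then combines those estimates assuming the tests are conditionally independent given the class. The function names are hypothetical.

```python
import math


def train_bernoulli_nb(hits, labels, alpha=1.0):
    """Estimate class priors and P(test fires | class) with Laplace smoothing.

    Assumes both classes (0 = ham, 1 = spam) appear in the training data.
    """
    n_tests = len(hits[0])
    model = {}
    for cls in (0, 1):
        rows = [h for h, y in zip(hits, labels) if y == cls]
        prior = len(rows) / len(hits)
        probs = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for i in range(n_tests)]
        model[cls] = (prior, probs)
    return model


def spam_probability(model, hit_row):
    """Posterior P(spam | test hits) under the naive independence assumption."""
    log_joint = {}
    for cls, (prior, probs) in model.items():
        lp = math.log(prior)
        for fired, p in zip(hit_row, probs):
            lp += math.log(p if fired else 1.0 - p)
        log_joint[cls] = lp
    # Normalise in log space so many tests do not underflow to zero.
    top = max(log_joint.values())
    total = sum(math.exp(v - top) for v in log_joint.values())
    return math.exp(log_joint[1] - top) / total


if __name__ == "__main__":
    hits = [[1, 0], [1, 1], [0, 1], [0, 0]]
    labels = [1, 1, 0, 0]
    model = train_bernoulli_nb(hits, labels)
    print(spam_probability(model, [1, 0]))
```

On the toy data, a message that fires only the first test gets a spam probability of roughly 0.75; the smoothing keeps it from reaching 1.0 on so little evidence.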
> Also, there are some characteristics of spam that have nothing to do
> with message content, but rather transmission methods, as I mentioned in
> http://lists.roaringpenguin.com/pipermail/mimedefang/2003-January/004081.html
I don't recall any mention of this or of similar ``wire'' techniques. For
example, to a good first approximation a spam attack looks like a
denial-of-service attack, so packet shaping and firewalls apply. John
Draper's Crunch Box (http://www.shopip.com/) probably does this, and
similar things. It costs $12k, so if anybody here tests one, please let
us know what you think.
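To illustrate the denial-of-service view, here is a minimal sketch of a per-host token bucket, a common building block of rate limiting and traffic shaping. It assumes a hypothetical SMTP front end that calls `allow_connection` with the client's IP address before accepting a connection; the class names and the rate/burst parameters are illustrative, and nothing here describes how the Crunch Box actually works. Real packet shaping would normally happen in the kernel or a firewall rather than in application code.

```python
import time


class TokenBucket:
    """Token bucket: permits bursts of up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerHostLimiter:
    """One bucket per connecting IP address (hypothetical SMTP front-end policy)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow_connection(self, ip):
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[ip] = bucket
        return bucket.allow()
```

The injectable `clock` makes the limiter easy to test. The per-IP dictionary grows without bound, so a real deployment would also need to expire idle entries.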
--
Michael D. Sofka sofkam at rpi.edu
C&CT Sr. Systems Programmer AFS/DFS, email, usenet, TeX, epistemology.
Rensselaer Polytechnic Institute, Troy, NY. http://www.rpi.edu/~sofkam/