[Mimedefang] Image blocking idea

David F. Skoll dfs at roaringpenguin.com
Fri Apr 21 20:30:04 EDT 2006

Martin Blapp wrote:

> I already log possible text (I count alphanumeric chars in the OCR output)

I think it would be interesting to add a new text/plain part to the e-mail
consisting of the OCR'd text, and feed that into Bayes.  Even if OCR gets
some words wrong, I bet the same misspelled tokens would quickly rise
to the top of the "spammy" token list.
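
As a rough illustration of the idea (not MIMEDefang filter code, which
would be Perl), here is a minimal Python sketch using only the standard
library's email package.  It assumes the OCR step has already produced
a string.  The helper names, the min_alnum threshold of 20 and the
"ocr-output.txt" filename are all hypothetical choices made for this
example; the alphanumeric count simply mirrors the check Martin
describes.

```python
from email.message import EmailMessage


def count_alnum(text: str) -> int:
    """Count alphanumeric characters in the OCR output."""
    return sum(ch.isalnum() for ch in text)


def add_ocr_part(msg: EmailMessage, ocr_text: str, min_alnum: int = 20) -> bool:
    """Append OCR'd text to msg as an extra text/plain part.

    Returns True if a part was added.  min_alnum is an assumed,
    arbitrary threshold: OCR output with fewer alphanumeric characters
    is treated as noise and skipped.
    """
    if count_alnum(ocr_text) < min_alnum:
        return False
    # add_attachment() converts the message to multipart/mixed if
    # needed; passing a str creates a text/plain part.
    msg.add_attachment(ocr_text, disposition="inline",
                       filename="ocr-output.txt")  # hypothetical name
    return True
```

Calling add_ocr_part(msg, ocr_text) before the message is handed to the
spam scanner would let a Bayes classifier tokenize the OCR output like
any other body text, assuming the scanner reads inline text/plain parts.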

We did some tests along these lines, and as a side benefit, we discovered
some SARE stock-scam tests firing on the OCR output.



