[Mimedefang] AD/Commercial: Question

Thu Nov 3 14:57:58 EST 2005

Adam Lanier wrote:

> I think it sounds like an interesting idea.  Any idea how the bayes data
> would work in highly technical environments (finance, medical, legal
> etc)?  Our biggest issue these days is spam that 'looks' like finance
> related mail.

In our experience: Not too badly.  Spam terms tend to be misspelled.
Examples from our corpus:

"mortgage" appears in 3235 spams and 399 hams.
"m0rtgage" appears in 72 spams and 0 hams.

"Mortgage" appears in 769 spams and 337 hams.
"mortggage", "m0rtggage" and "m0rttgage" are 100% reliable indicators of spam.
(In fact, if you consider this message non-spam, it's the first one we've
seen that is an exception. :-))
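
To make those numbers concrete, here is a minimal sketch that turns the
counts above into a raw "fraction of occurrences that were spam" figure.
It is only an illustration: the helper name spam_fraction is made up, the
total corpus sizes are not given here, and a real Bayes filter would
normalise by corpus size and apply smoothing rather than use this bare
ratio.

```python
# Counts quoted above: token -> (spam messages, ham messages).
# Tokens are case-sensitive, so "mortgage" and "Mortgage" are separate.
COUNTS = {
    "mortgage": (3235, 399),
    "m0rtgage": (72, 0),
    "Mortgage": (769, 337),
}


def spam_fraction(token, counts=COUNTS):
    """Raw fraction of a token's occurrences that were in spam.

    Hypothetical helper: ignores corpus sizes and smoothing, both of
    which a real Bayes filter would take into account.
    """
    spam, ham = counts[token]
    total = spam + ham
    if total == 0:
        raise ValueError(f"no observations for {token!r}")
    return spam / total


for token in COUNTS:
    print(f"{token!r}: {spam_fraction(token):.3f}")
```

Even on this crude measure the correctly spelled forms are ambiguous,
while the obfuscated spelling has no ham occurrences at all.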

> How would we incorporate our own stream of bayes data into the RPTN data?

Ah, well.  By using CanIt-PRO. :-)

It is possible to post-process the RPTN data to include your own tokens,
but it would be a fair bit of work because we do not use SpamAssassin's
Bayes implementation.  (SA doesn't handle token pairs, and also stores
hashes of tokens rather than tokens themselves.)
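
A rough sketch of why the two formats do not line up, under stated
assumptions: the function names are hypothetical, "token pairs" is taken
to mean adjacent whitespace-separated words, and the truncated SHA-1 is
only a stand-in for some hashing scheme, not a description of what
SpamAssassin or RPTN actually does.

```python
import hashlib


def tokens_with_pairs(text):
    """Return single tokens followed by adjacent token pairs.

    Assumption: a pair is two neighbouring whitespace-separated words.
    """
    words = text.split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def hash_token(token, length=10):
    """Illustrative one-way hash of a token (truncated SHA-1 hex digest).

    A store keyed on values like this cannot be mapped back to the
    original token text.
    """
    return hashlib.sha1(token.encode("utf-8")).hexdigest()[:length]


plain = tokens_with_pairs("cheap m0rtgage rates")
hashed = {hash_token(t) for t in plain}
print(plain)
print(sorted(hashed))
```

A store that keeps only hashes of single tokens has no entries for the
pairs and no way to recover the token text, so merging it with a
plain-text token-and-pair database means converting one format into the
other.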

> Sounds pretty cheap for ISP's or equivalent though if it increases the
> effectiveness of their spam system.

That's really the target market.

Regards,

David.