[Mimedefang] AD/Commercial: Question
David F. Skoll
dfs at roaringpenguin.com
Thu Nov 3 14:57:58 EST 2005
Adam Lanier wrote:
> I think it sounds like an interesting idea. Any idea how the bayes data
> would work in highly technical environments (finance, medical, legal
> etc)? Our biggest issue these days is spam that 'looks' like finance
> related mail.
In our experience, not too badly. Spam terms tend to be misspelled.
Examples from our corpus:
"mortgage" appears in 3235 spams and 399 hams.
"m0rtgage" appears in 72 spams and 0 hams.
"Mortgage" appears in 769 spams and 337 hams.
"mortggage", "m0rtggage" and "m0rttgage" are 100% reliable indicators of spam.
(In fact, if you consider this message non-spam, it's the first one we've
seen that is an exception. :-))
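To make those counts easier to compare, here is a minimal Python sketch that turns them into a raw "fraction of messages containing the token that were spam". This is only an illustration: the helper name `spam_fraction` is hypothetical, and because the total sizes of the spam and ham corpora are not given here, the result is a plain ratio of message counts, not the probability a real Bayes classifier would compute.

```python
# Message counts quoted above: token -> (spam messages, ham messages).
# Tokens are case-sensitive: "mortgage" and "Mortgage" are counted separately.
COUNTS = {
    "mortgage": (3235, 399),
    "m0rtgage": (72, 0),
    "Mortgage": (769, 337),
}


def spam_fraction(spam, ham):
    """Fraction of the messages containing a token that were spam.

    Assumption: a raw ratio of message counts. It ignores the overall
    spam/ham corpus sizes, which are not stated in this message.
    """
    total = spam + ham
    if total == 0:
        raise ValueError("token was never seen in either corpus")
    return spam / total


for token, (spam, ham) in COUNTS.items():
    print(f"{token!r}: {spam_fraction(spam, ham):.3f}")
```

The misspelled variant comes out at exactly 1.0 because it has no ham occurrences at all, which is what makes such tokens useful even when the correctly spelled word is common in legitimate finance mail.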
> How would we incorporate our own stream of bayes data into the RPTN data?
Ah, well. By using CanIt-PRO. :-)
It is possible to post-process the RPTN data to include your own tokens,
but it would be a fair bit of work because we do not use SpamAssassin's
Bayes implementation. (SA doesn't handle token pairs, and also stores
hashes of tokens rather than tokens themselves.)
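To illustrate the two differences, here is a rough Python sketch of what "token pairs" and "hashes of tokens" mean in practice. Everything in it is an assumption made for illustration: the function names are hypothetical, the whitespace tokenizer is deliberately naive, and the truncated SHA-1 merely stands in for a one-way hash. It is not the actual RPTN format or SpamAssassin's real hashing scheme.

```python
import hashlib


def tokens_and_pairs(text):
    """Return single tokens followed by adjacent token pairs.

    Assumption: naive whitespace tokenization, case preserved.
    """
    words = text.split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def hashed(token):
    """Illustrative one-way hash of a token (truncated SHA-1 hex digest).

    Once only this value is stored, the original token text cannot be
    read back out of the database.
    """
    return hashlib.sha1(token.encode("utf-8")).hexdigest()[:10]


features = tokens_and_pairs("low m0rtgage rates")
print(features)
print([hashed(f) for f in features])
```

A database keyed by single-token hashes has no entries for pairs such as "low m0rtgage", and its keys cannot be mapped back to token text, so the two data sets cannot simply be merged.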
> Sounds pretty cheap for ISP's or equivalent though if it increases the
> effectiveness of their spam system.
That's really the target market.
Regards,
David.