[Mimedefang] SpamAssassin via mimedefang is slow
Jeff Rife
mimedefang at nabs.net
Sat Nov 8 13:52:34 EST 2008
On 8 Nov 2008 at 18:42, Michiel Brandenburg wrote:
> Basically Nilsimsa is a "sounds like" hash, i.e. it makes a 256-bit
> fingerprint of any binary data, like the message :). Now if you scan 2
> messages and they are similar, the hashes will not look the same in hex,
> but in binary form they will differ in only a few bits. So ..
> count_amount_of_1_bits((hash message 1) XOR (hash message 2)) = A
> if A is smaller than, say, 10 (so the hashes differ in 10 places or
> fewer), the message is nearly the same.
OK, so it turns into an O(n) algorithm, where n is the number of hashes
you have already stored: you need to retrieve each one, then compare the
hash of the current message against it. After that, you add the new hash
to the database for future messages.
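To make the cost concrete, here is a rough Python sketch of that lookup. It
does not compute Nilsimsa itself; it assumes the 256-bit digests already
exist as 32-byte values. The names hamming_distance and find_near_duplicate
are made up for illustration, and the threshold of 10 is just the example
figure from the quoted message.

```python
def hamming_distance(hash_a: bytes, hash_b: bytes) -> int:
    """Count the bits that differ between two equal-length digests."""
    if len(hash_a) != len(hash_b):
        raise ValueError("digests must be the same length")
    # XOR leaves a 1 bit wherever the two digests disagree.
    diff = int.from_bytes(hash_a, "big") ^ int.from_bytes(hash_b, "big")
    return bin(diff).count("1")


def find_near_duplicate(new_hash, stored_hashes, threshold=10):
    """Linear scan: compare new_hash against every stored digest.

    Returns the index of the first digest within `threshold` differing
    bits, or None if nothing is close enough. The threshold is an
    assumed tuning value, not something taken from Nilsimsa.
    """
    for index, old_hash in enumerate(stored_hashes):
        if hamming_distance(new_hash, old_hash) <= threshold:
            return index
    return None


if __name__ == "__main__":
    # Stand-in digests (hypothetical values, not real Nilsimsa output).
    stored = [bytes(32), bytes([0xFF] * 32)]
    incoming = bytes([0x07]) + bytes(31)  # differs from stored[0] in 3 bits
    print(find_near_duplicate(incoming, stored))
```

Every incoming message walks the whole list in the worst case, which is the
O(n) cost described above.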
I don't think anything that has to retrieve every record in a database
table will be significantly better than just running SA, since I'm
averaging about 2 seconds to run all the content scans (virus, SA,
etc.).
In theory, it could help a much larger site than mine, but as the site
gets busier, you would need to keep even more hashes for longer periods
of time, so it's probably a wash.
--
Jeff Rife |
| http://www.nabs.net/Cartoons/Pickles/Adoration.gif