[Mimedefang] SpamAssassin via mimedefang is slow

Jeff Rife mimedefang at nabs.net
Sat Nov 8 13:52:34 EST 2008


On 8 Nov 2008 at 18:42, Michiel Brandenburg wrote:

> Basically Nilsimsa is a "sounds like" hash, i.e. it makes a 256-bit 
> fingerprint of any binary data, like the message :). Now if you scan 2 
> messages and they are similar, the hashes will not look the same in hex, 
> but in their binary form they will differ in only a few bits.  So .. 
> count_amount_of_1_bits((hash message 1) XOR (hash message 2)) = A
> if A is no more than, say, 10 (so the hashes differ in 10 places or 
> fewer), the messages are nearly the same.

OK, so it turns into an O(n) algorithm per message, where n is the 
number of hashes already stored: you retrieve each hash you have 
already computed and compare the hash of the current message against 
it.  After that, you add the new hash to the database for future 
messages.
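
To make sure I have the idea right, here is a rough Python sketch of 
that lookup.  It assumes the digests arrive as 64-character hex strings 
(256 bits) and does not compute Nilsimsa itself; the function names, the 
in-memory list standing in for the database table, and the threshold of 
10 are all just illustrative assumptions.

```python
# Sketch only: digests are assumed to be 64-character hex strings (256 bits).
# Computing the Nilsimsa digest itself is not shown here.


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count the bits that differ between two equal-length hex digests."""
    # XOR leaves a 1 wherever the two digests disagree; count those 1 bits.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


def find_near_duplicate(new_hash, stored_hashes, threshold=10):
    """Linear scan over every stored hash: O(n) comparisons per message."""
    for old_hash in stored_hashes:
        if bit_difference(new_hash, old_hash) <= threshold:
            return old_hash
    return None


def check_and_store(new_hash, stored_hashes, threshold=10):
    """Look for a near-duplicate, then remember new_hash for later messages."""
    match = find_near_duplicate(new_hash, stored_hashes, threshold)
    stored_hashes.append(new_hash)  # hypothetical stand-in for a DB insert
    return match
```

Every call to check_and_store() walks the whole list, which is the cost 
I am worried about once the list is really a database table.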

I don't think anything that has to retrieve every record in a database 
table will be significantly better than just running SA, since I'm 
averaging about 2 seconds to run all the content scans (virus, SA, 
etc.).

In theory, it could help a much larger site than mine, but as the site 
gets busier, you would need to keep even more hashes for longer periods 
of time, so it's probably a wash.


--
Jeff Rife |  
          | http://www.nabs.net/Cartoons/Pickles/Adoration.gif 




