[Mimedefang] IP Reputation data collection (announcement, Internet draft)

Fri Apr 30 14:13:43 EDT 2010

On Fri, Apr 30, 2010 at 01:28:30PM -0400, Kevin A. McGrail wrote:
> For example, these are 7 more items that I think would be invaluable to  
> report.  Brainstorming to come up with more to add EVEN if you don't use  
> them but to define them in the RFC would be good in my opinion.
> 
> 1 - including the product / version used for auto-ham/spam and the  
> automated score & threshold of a spam

I see some of this as best handled out of band.  You already need to 
negotiate a username and shared secret before events can be reported to 
the aggregator, so that's probably the best time to communicate product 
and version information.

The issue of scores is tougher, particularly in situations where 
end-user configuration can change the score at any time.  Here, it may 
make sense to return the score and threshold with the event, but those 
two points of data may not provide enough information to be useful.  For 
example, two users (or CanIt streams, or filtering systems, or...) could 
have the same threshold and arrive at the same score for a nearly 
identical message, but for entirely different reasons.  It's probably 
enough for the purposes of reputation tracking to know that someone or 
something thought they saw a spam event from a given address.

> 2 - including virii/malware as a note

Another event type for "virus or malware seen" might be a good addition, 
but I don't see any value in communicating back anything more detailed 
than that for calculating reputation.  Distinguishing "virus" from 
other malware might be useful, though.

> 3 - dangerous attachments and a filename
> 4 - dangerous content

I guess the usefulness of this depends on the definition of "dangerous".  
What are you looking for here?

> 5 - reverse DNS failures

This might be good, but the report would probably need to distinguish 
transient failures (local or upstream DNS trouble) from a host that 
simply has no rDNS configured.
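
A rough sketch of how a filter might separate the two cases before 
reporting anything.  The function names and the mapping of resolver 
error codes to "transient" or "permanent" are assumptions made for 
illustration; nothing here comes from the draft.

```python
import socket

# Resolver h_errno values from <netdb.h>.  Treating 1 and 4 as "permanent"
# and everything else as "transient" is an assumption of this sketch.
HOST_NOT_FOUND = 1   # authoritative answer: no such name
TRY_AGAIN = 2        # temporary failure, e.g. SERVFAIL or a timeout
NO_RECOVERY = 3      # non-recoverable resolver error
NO_DATA = 4          # name exists but has no record of the requested type


def classify_rdns_error(h_errno):
    """Return 'transient' or 'permanent' for a resolver error code."""
    if h_errno in (HOST_NOT_FOUND, NO_DATA):
        return "permanent"
    # TRY_AGAIN, NO_RECOVERY and unknown codes are treated as transient,
    # so a flaky resolver never counts against the sending address.
    return "transient"


def check_rdns(ip):
    """Return 'ok', 'transient' or 'permanent' for the rDNS lookup of ip."""
    try:
        socket.gethostbyaddr(ip)
        return "ok"
    except socket.herror as exc:
        return classify_rdns_error(exc.errno)
    except OSError:
        # Other lookup errors: assume the problem is on our side.
        return "transient"
```

Only a "permanent" result would be worth reporting as an event; a 
"transient" one would simply be dropped or retried later.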

> 6 - improper HELO/EHLO statements

This is probably a good one to add.

> 7 - invalid MX records

That's not terribly useful for a sending IP address, as there's no 
legitimate reason the sending IP needs to be an MX of the sender's 
domain.

> I liked that in #3 the REPUTATION database is not specific to  
> indexing by IPv4 or IPv6.  The system should be extensible to report  
> more data such as the email address of the sender or recipient, the  
> subject of the email, etc.  In theory, the system could even replace  
> Razor so it could include a hash of the email, etc.  But I would likely  
> caveat the first sentence with "index by IPv4 or IPv6 address as one  
> example".

That's probably a bit of scope creep.  The idea here is that filters can
communicate IP reputation information with a low-overhead UDP protocol. 
Sender address reputation might be worth investigating in a future 
iteration (the extensibility is there), but let's concentrate on the IP 
reputation case for now.
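
To give a feel for what "low-overhead" means here, a minimal sketch of 
a fire-and-forget UDP report follows.  The packet layout, the function 
names, and the choice of HMAC-SHA256 are all invented for illustration 
and are NOT the draft's wire format; only the port number and the idea 
of a username plus shared secret come from this thread.

```python
import hashlib
import hmac
import ipaddress
import socket
import struct

DEFAULT_PORT = 6568  # the port discussed in this thread


def build_report(username, secret, event_type, ip):
    """Pack one event into a datagram (hypothetical layout, not the draft's)."""
    addr = ipaddress.ip_address(ip)
    user = username.encode("utf-8")
    # Header: IP version (4 or 6), event type, username length; then the
    # username and the packed address (4 or 16 bytes).
    header = struct.pack("!BBB", addr.version, event_type, len(user))
    body = header + user + addr.packed
    # Authenticate with the shared secret; HMAC-SHA256 is an assumption.
    mac = hmac.new(secret, body, hashlib.sha256).digest()
    return body + mac


def send_report(aggregator_host, packet, port=DEFAULT_PORT):
    """Send one datagram; no reply is awaited and nothing is retried."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (aggregator_host, port))
```

Because the address is packed by family, the same layout carries IPv4 
and IPv6 reports without any special casing.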

> The use of port 6568 could be expanded to state something like unless  
> the AGGREGATOR utilizes an alternate port or something.  I have other  
> listeners on 6568 already, for example.

Well, it's an RFC, so "SHOULD" pretty much covers that: an aggregator 
with a good reason can listen on a different port.

> 4.2 would be best organized into 4.2.0 for reserved, 4.2.1 for  
> GREYLISTED, etc. so that all event types have a clear report  
> restriction.  Then 4.2 should be restrictions for all events like IPv4

Possibly:
	4.1.0  Reserved event type
		- Event type 0 is reserved
		- Event types 9 through 191 are reserved for future use
	4.1.1  Defined event types
		- all the info from existing 4.1
	4.1.2  Private event types
		- further explanation of private types
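
As a sketch of how a receiver might apply that split: the ranges for 
type 0 and 9 through 191 are from the outline above, while the 
one-octet field, the "defined" range of 1 through 8, and the "private" 
range of 192 through 255 are assumptions filled in for illustration 
(as is the function name).

```python
def classify_event_type(event_type):
    """Map an event type number to the section that would describe it."""
    # Assumption: the event type is a single octet.
    if not 0 <= event_type <= 255:
        raise ValueError("event type must fit in one octet")
    if event_type == 0:
        return "reserved"            # 4.1.0: type 0 is reserved
    if event_type <= 8:
        return "defined"             # 4.1.1: assumed to cover 1..8
    if event_type <= 191:
        return "reserved-future"     # 4.1.0: 9..191 reserved for future use
    return "private"                 # 4.1.2: assumed to cover 192..255
```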

> Does " a priori knowledge" mean something or is it a grammar/spelling issue?

http://en.wikipedia.org/wiki/A_priori_and_a_posteriori#Use_of_the_terms

> I would include an extract definition of [GREY] in section 7 in addition  
> to the reference.  It's a term that confuses a lot of people that I  
> discuss anti-spam with that aren't anti-spam researchers.

Possibly a good idea, though I don't expect too many people who aren't 
involved in anti-spam activities will be interested in this RFC.

Cheers,
Dave
-- 
Dave O'Neill <dmo at roaringpenguin.com>    Roaring Penguin Software Inc.
+1 (613) 231-6599                        http://www.roaringpenguin.com/
For CanIt technical support, please mail: support at roaringpenguin.com