[Mimedefang] IP Reputation data collection (announcement, Internet draft)

David F. Skoll dfs at roaringpenguin.com
Fri Apr 30 14:58:27 EDT 2010


[Kevin's original message did not appear on the list because it was in
HTML and the list censor disapproved...]

Kevin A. McGrail wrote:

> I think more events need to be considered for the initial
> draft.  There is also the potential need for additional information on a
> report that should be considered.  So not all of these are EVENTS.

Right.  The important goals of this design are:

Goal 1) Bandwidth efficiency.  We receive a LOT of
reports... something like 350 IP events per second, and our goal is to
scale up to support at least a few thousand per second.  Every byte
counts.
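To put "every byte counts" in numbers, here is a back-of-the-envelope sketch of what one extra byte per report costs per day at the rates quoted above. The 3,000 events/second figure is an assumed stand-in for "a few thousand per second", and the totals cover report payload only, not packet or transport headers.

```python
# Rough daily cost of growing every report by a single byte.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_bytes_per_extra_byte(events_per_second: int) -> int:
    """Bytes per day added when each report grows by one byte."""
    return events_per_second * SECONDS_PER_DAY


current = daily_bytes_per_extra_byte(350)   # rate quoted in the post
target = daily_bytes_per_extra_byte(3000)   # assumed "a few thousand per second"

print(f"350 events/s:  {current / 1e6:.1f} MB/day per extra byte")
print(f"3000 events/s: {target / 1e6:.1f} MB/day per extra byte")
```

At today's rate, each extra byte per report adds about 30 MB of payload per day; at the assumed target rate it adds roughly 259 MB per day.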

Goal 2) Simplicity.  It's easy to know what to do with 16 gigabytes of
data when each data point is "<something> happened."  It's a lot
harder to know what to do with several hundred million
filenames... how do you distill useful information?

Dave O'Neill addressed your questions... I'd like to add my perspective:

> 1 - including the product / version used for auto-ham/spam and the
> automated score & threshold of a spam

We don't want that in every packet (see Bandwidth Efficiency) and it's
not clear what we'd do with the information anyway (Simplicity).

> 2 - including virii/malware as a note

Do you mean the virus name?  What would we do with the information?

> 3 - dangerous attachments and a filename

Same comment as (2)

> 4 - dangerous content

What is "dangerous content"?  What's dangerous to a Windoze user might
not be dangerous to me. :)

I guess I should add Goal 3, which is to handle (reasonably) objective
events only.  Yes, spam vs. non-spam is subjective, but the other
events are all very clear-cut: A recipient is either valid or is not.
A machine either passed greylisting or it did not.

> 5 - reverse DNS failures

These are not objective events.  A DNS failure could be because of a
transient network problem.

> 6 - improper HELO/EHLO statements

That's a good one.  We should probably add that.

> 7 - invalid MX records

Since we're collecting IP reputation data, "MX records" don't come into play.

> I liked that in #3 the REPUTATION database is not specific to
> indexing by IPv4 or IPv6.

Err... we'd better fix that.  This proposal is (currently) strictly an IP
address reputation protocol.
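The sketch below shows how small a report stays when it carries nothing but an IP address (IPv4 or IPv6) and an event code. It is a hypothetical illustration only. The layout (1-byte event code, 1-byte IP version, raw address bytes) and the `Event` numbering are assumptions, not the wire format in the draft. Only the names GREYLISTED and a reserved code come from this thread.

```python
import ipaddress
import struct
from enum import IntEnum


class Event(IntEnum):
    """Hypothetical event codes; the draft's actual numbering may differ."""
    RESERVED = 0
    GREYLISTED = 1


def encode_report(ip: str, event: Event) -> bytes:
    """Pack one report as: event code (1 byte), IP version (1 byte), raw address."""
    if event == Event.RESERVED:
        raise ValueError("event code 0 is reserved")
    addr = ipaddress.ip_address(ip)  # raises ValueError for non-IP input
    return struct.pack("!BB", event, addr.version) + addr.packed


def decode_report(data: bytes):
    """Inverse of encode_report; returns (Event, IPv4Address or IPv6Address)."""
    if len(data) < 2:
        raise ValueError("report too short")
    event_code, version = struct.unpack("!BB", data[:2])
    raw = data[2:]
    if version == 4 and len(raw) == 4:
        addr = ipaddress.IPv4Address(raw)
    elif version == 6 and len(raw) == 16:
        addr = ipaddress.IPv6Address(raw)
    else:
        raise ValueError("malformed report")
    return Event(event_code), addr  # Event() raises ValueError on unknown codes


v4 = encode_report("192.0.2.1", Event.GREYLISTED)
v6 = encode_report("2001:db8::1", Event.GREYLISTED)
print(len(v4), len(v6))  # 6 18
```

Under this assumed layout, an IPv4 report is 6 bytes and an IPv6 report is 18 bytes. Adding a sender address or subject line would multiply that several times over, which is the trade-off Goal 1 warns about.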

> The system should be extensible to report
> more data such as the email address of the sender or recipient, the
> subject of the email, etc.

See Goals (1) and (2).

> In the same way, #2 (Introduction) specifically talks about IP-based
> lists.  You might want to broaden that to keep people in a broad mindset.

Nope.  It's specifically an IP reputation system.  I don't want to expand
it to other kinds of reputation.

> The use of port 6568 could be expanded to state something like "unless
> the AGGREGATOR uses an alternate port."  I have other
> listeners on 6568 already, for example.

*tsk* :-)  IANA gave us that port. :)  (but it's a SHOULD, so you have
an escape hatch.)

> 4.2 would be best organized into 4.2.0 for reserved, 4.2.1 for
> GREYLISTED, etc. so that all event types have a clear report
> restriction.  Then 4.2 should be restrictions for all events like IPv4

OK.  Though with only 8 event types, that seems like a bit of overkill.

Regards,

David.
