[Mimedefang] IP Reputation data collection (announcement, Internet draft)
Kevin A. McGrail
KMcGrail at PCCC.com
Fri Apr 30 16:01:40 EDT 2010
Hi Dave,
Passionate technical debate follows ;-)
DFS, I believe my comments below also address your comments which I
received slightly later.
In synopsis, I'd recommend you go with the broader, more flexible RFC.
This is a great idea IMO either way, though!
regards,
KAM
>> 1 - including the product / version used for auto-ham/spam and the
>> automated score & threshold of a spam
>
> I see some of this as best handled out of band. You already need to
> negotiate a username and shared secret before events can be reported
> to the aggregator, so that's probably the best time to communicate
> product and version information.
As versions are always changing, you might want to know that someone is
using SpamAssassin 3.X and another person is using IHateSpam, etc.
> The issue of scores is tougher, particularly in situations where
> end-user configuration can change the score at any time. Here, it may
> make sense to return the score and threshold with the event, but those
> two points of data may not provide enough information to be useful.
> For example, two users (or CanIt streams, or filtering systems, or...)
> could have the same threshold and arrive at the same score for a
> nearly identical message, but for entirely different reasons. It's
> probably enough for the purposes of reputation tracking to know that
> someone or something thought they saw a spam event from a given address.
I agree it's not a complete snapshot but the information could be
invaluable. How valuable is debatable but my point is that some "extra
data" per event is likely a good idea. And, for example, emails that
score really high on SA are something that could be weighted. I might
not even pay attention to the spam threshold as much as the spam score,
for example.
From RP's perspective, knowing that 1.2.3.4 is sending a LOT of emails
all marked 15 and higher by SA could give a lot more credibility than
a bunch of emails marked 1% over the threshold.
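
A minimal sketch of this kind of weighting, assuming each report carries a
score and a threshold. The function name, the cap of 3.0 and the sample
reports are hypothetical and not part of the draft:

```python
def event_weight(score, threshold, cap=3.0):
    """Weight a spam event by how far its score exceeds the threshold.

    Hypothetical helper: events below the threshold count for nothing,
    and the weight is capped so one extreme score cannot dominate.
    """
    if threshold <= 0 or score < threshold:
        return 0.0
    return min(score / threshold, cap)


# Made-up sample reports: (sending IP, score, threshold)
reports = [
    ("1.2.3.4", 15.0, 5.0),
    ("1.2.3.4", 18.2, 5.0),
    ("5.6.7.8", 5.05, 5.0),
]

# Sum the weights per sending IP.
totals = {}
for ip, score, threshold in reports:
    totals[ip] = totals.get(ip, 0.0) + event_weight(score, threshold)
```

With these sample numbers, 1.2.3.4 accumulates far more weight than
5.6.7.8, which only just crossed the threshold.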
>> 2 - including viruses/malware as a note
>
> Another event type for "virus or malware seen" might be a good
> addition, but I don't see any value in communicating back anything
> more detailed than that for calculating reputation. Differentiation
> between "virus" and other malware might be useful, too.
The virus type would be useful in identifying breakouts, etc. Again
though, this isn't a debate of the value of the data because that
shouldn't be a goal of the RFC. The goal is to provide something that is
a framework lots of people might use both as aggregators and sensors.
Towards that end, I would encourage RP to consider packaging the
aggregator code as well since it's my basic belief
>> 3 - dangerous attachments and a filename
>> 4 - dangerous content
>
> I guess the usefulness of this depends on the definition of
> "dangerous". What are you looking for here?
One example: a lot of phishing emails are sent with bad PDFs and EXEs.
Dangerous content could refer to phishing attacks via social engineering
that don't have attachments. Perhaps something like the ClamAV Phishing
signatures.
>> 5 - reverse DNS failures
>
> This might be good, but handling transient failures due to local or
> upstream DNS issues vs. failure to configure rDNS for a host might be
> necessary.
IMO, you are debating what an aggregator should do with the data rather
than the process of sending / receiving the data. However, I think we
can agree that rDNS is an important component in the email ecosystem.
From RP's perspective, tracking senders that are consistently using
invalid rDNS, especially when reported by multiple sensors, would lead
to valuable data, especially if it occurred over a period of time long
enough to remove DNS outages from consideration.
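
A minimal sketch of how an aggregator might separate persistent rDNS
failures from transient outages, assuming it stores (ip, sensor, timestamp)
tuples. The function name, the tuple layout and both default limits are
hypothetical:

```python
from collections import defaultdict


def persistent_rdns_failures(events, min_sensors=2, min_span=86400):
    """Return IPs whose rDNS failures look persistent rather than transient.

    events: iterable of (ip, sensor_id, unix_timestamp) failure reports.
    An IP is flagged only if at least `min_sensors` distinct sensors
    reported it and the reports span at least `min_span` seconds.
    """
    by_ip = defaultdict(list)
    for ip, sensor, ts in events:
        by_ip[ip].append((sensor, ts))

    flagged = set()
    for ip, seen in by_ip.items():
        sensors = {sensor for sensor, _ in seen}
        times = [ts for _, ts in seen]
        if len(sensors) >= min_sensors and max(times) - min(times) >= min_span:
            flagged.add(ip)
    return flagged
```

A single sensor reporting failures for a few minutes would not flag the
sender here; two sensors reporting failures a day apart would.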
>> 6 - improper HELO/EHLO statements
>
> This is probably a good one to add.
Hooray. We'll always have Paris.
Seriously though, please realize that this was my first pass at a
response to the RFC. I think you should poll for brainstorms on EVENTS
to consider. There have got to be a lot more I haven't thought
of/remembered/etc.
>> 7 - invalid MX records
>
> That's not terribly useful for a sending IP address, as there's no
> legitimate reason the sending IP needs to be an MX of the sender's
> domain.
While RP's use of the aggregated data is an IP-based index, others might
use it for a sending email address index, etc. But knowing that IP
1.2.3.4 sent me an email from a from address with an invalid MX record
(which includes checking A records, etc.) is quite useful in real-world
anti-spam.
>> I liked that in #3 the REPUTATION database is not specific to
>> indexing by IPv4 or IPv6. The system should be extensible to report
>> more data such as the email address of the sender or recipient, the
>> subject of the email, etc. In theory, the system could even replace
>> Razor so it could include a hash of the email, etc. But I would
>> likely caveat the first sentence with "index by IPv4 or IPv6 address
>> as one example".
>
> That's probably a bit of scope creep. The idea here is that filters can
> communicate IP reputation information with a low-overhead UDP
> protocol. Sender address reputation might be worth investigating in a
> future iteration (the extensibility is there), but let's concentrate
> on the IP reputation case for now.
Sure, replacing Razor is feature creep, so that's an extreme case. But
adding more data to the packet is likely necessary to make this more
extensible, though I did scope it to fairly short bits of data like
to/from/subject and hash values.
Plus, playing devil's advocate, the RFC says specifically that IP
reputation is NOT the only goal:
"Note that the exact format of the reputation database as well as what
constitutes "reputation" are beyond the scope of this document. We are
concerned only with a standard for reporting events."
So while I'm happy to address it more narrowly, my editorial feedback on
this version would be to remove that statement if it isn't your goal to
extend this beyond IP reputation.
> The use of port 6568 could be expanded to state something like
> unless the AGGREGATOR utilizes an alternate port or something. I
> have other listeners on 6568 already, for example.
>
> Well, it's an RFC, so "SHOULD" pretty much covers that.
Agreed and I was happy you added the RFC-eeze description but it never
hurts to be explicitly flexible and even require that alternate ports be
possible.
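
A minimal sketch of a sensor that defaults to port 6568 but accepts an
alternate port, assuming the aggregator is configured as "host" or
"host:port". The function names are hypothetical, the payload is treated
as opaque bytes because the report format is not reproduced here, and
IPv6 literals are not handled:

```python
import socket

DEFAULT_PORT = 6568  # the port discussed above


def parse_aggregator(addr, default_port=DEFAULT_PORT):
    """Split "host" or "host:port" into (host, port).

    Hostnames and IPv4 addresses only; falls back to the default port
    when none is given.
    """
    host, sep, port = addr.rpartition(":")
    if not sep:
        return addr, default_port
    return host, int(port)


def send_report(payload, addr):
    """Send one report as a single UDP datagram to the aggregator."""
    host, port = parse_aggregator(addr)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Someone with other listeners on 6568 could then run the aggregator on
another port and configure sensors with, say, "aggregator.example.com:7000".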
> 4.2 would be best organized into 4.2.0 for reserved, 4.2.1 for
> GREYLISTED, etc. so that all event types have a clear report
> restriction. Then 4.2 should be restrictions for all events like IPv4
Makes sense though if you end up adding a bazillion more EVENT types,
grouping them could become troublesome. I was mostly looking for some
semblance of a 1:1 restriction for each EVENT type to help ensure that
an EVENT type isn't forgotten in the years to come.
> Does " a priori knowledge" mean something or is it a grammar/spelling
> issue?
>
> http://en.wikipedia.org/wiki/A_priori_and_a_posteriori#Use_of_the_terms
Thanks. I wasn't sure if there was some other meaning than how I read
it originally.
So knowing that, my underlying question is: What is the reason that a
sensor should only send 492 bytes? I read the text as "with prior
knowledge", which seems a fair paraphrase, and that meant to me that
the very next statement constituted prior knowledge that the aggregator
has to accept reports larger than 492 bytes. In short, sentence one's
caveat is met by sentence two's requirement that the aggregator MUST
handle reports equal to or less than 65507 bytes, i.e. greater than 492
bytes. This invalidates the need for sentence one completely, which I
imagine isn't what you want.
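
To make that reading concrete, a minimal sketch follows. The function
name and the boolean flag are hypothetical, and only the two limits come
from the discussion above; this reflects the reading described in this
message, not the draft's wording:

```python
SAFE_LIMIT = 492         # ceiling for a sensor without prior knowledge
MAX_UDP_PAYLOAD = 65507  # largest UDP payload over IPv4 (65535 - 20 - 8)


def may_send(report_len, aggregator_accepts_large=False):
    """Return True if a report of report_len bytes may be sent.

    aggregator_accepts_large stands in for the "a priori knowledge"
    that the aggregator handles reports above the 492-byte ceiling.
    """
    limit = MAX_UDP_PAYLOAD if aggregator_accepts_large else SAFE_LIMIT
    return report_len <= limit
```

If every aggregator MUST handle up to 65507 bytes, the flag is effectively
always true and the 492-byte branch is never taken, which is the
contradiction described above.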
>> I would include an exact definition of [GREY] in section 7 in
>> addition to the reference. It's a term that confuses a lot of
>> people that I discuss anti-spam with that aren't anti-spam researchers.
>
> Possibly a good idea, though I don't expect too many people who aren't
> involved in anti-spam activities will be interested in this RFC
Touché. I agree with this statement 100%. I forgot to consider the
audience.
Regards,
KAM