[Mimedefang] Return-Path: <hw2179 at columbia.edu> Received: from murder (soyloaf-eth1.cc.columbia.edu [128.59.33.163]) by liverwurst.cc.columbia.edu (Cyrus v2.3-alpha) with LMTPSA (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256/256 verify=YES); Wed, 08 Mar 2006 18

Joseph Brennan brennan at columbia.edu
Thu Mar 9 12:29:22 EST 2006



--On Thursday, March 9, 2006 12:04 -0500 Josh Kelley <joshkel at gmail.com> 
wrote:

> I'm interested in gathering statistics on which MUAs our users use (so
> we can find out what mail clients are popular enough to officially
> support, which old clients we can drop support for, who's using old
> versions and should be gently encouraged to upgrade, etc.).  I figure
> that MIMEDefang can track this by grabbing the X-Mailer: header of
> messages as they go through, and I thought that I'd ask the list to
> see if anyone's already done this before I go write some code.


Yes, we have!

First you open HEADERS and grab the string in the X-Mailer and
User-Agent headers, if they exist.  Most clients identify themselves
in one of those two headers.  Store in $xmailer and $useragent.

(We test various header fields, so we open HEADERS once and store
a set of values to be tested afterwards.  Maybe I should use a
hash, but because of how this grew it's just numerous variables.)
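
A minimal sketch of that header-grabbing step, for anyone starting from
scratch.  It assumes the HEADERS file in the MIMEDefang working directory
holds one header per line; the helper name read_client_headers is
hypothetical and not part of MIMEDefang or of our filter.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a HEADERS-style file once and return the
# first X-Mailer and User-Agent values found (undef for any not present).
sub read_client_headers {
    my ($path) = @_;
    my ($xmailer, $useragent);
    open(my $fh, '<', $path) or return (undef, undef);
    while (my $line = <$fh>) {
        chomp $line;
        if (!defined $xmailer && $line =~ /^X-Mailer:\s*(.*)$/i) {
            $xmailer = $1;
        }
        elsif (!defined $useragent && $line =~ /^User-Agent:\s*(.*)$/i) {
            $useragent = $1;
        }
    }
    close($fh);
    return ($xmailer, $useragent);
}

# Inside a filter this might be called as:
#   my ($xmailer, $useragent) = read_client_headers('./HEADERS');
```

Header names are matched case-insensitively, and only the first
occurrence of each header is kept.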

We require SMTP authentication, so we can identify our own users by
whether they used it.  This includes people who use some other
address as sender, and excludes spammers who fake the sender.

The sampling code itself is just this.

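# client sampling: only log messages sent with SMTP AUTH (our own users)
if (defined($SendmailMacros{"auth_type"})) {
    my $client = "unknown";
    # prefer X-Mailer, fall back to User-Agent
    if ($xmailer) {
        chomp($xmailer);
        $client = $xmailer;
    }
    elsif ($useragent) {
        chomp($useragent);
        $client = $useragent;
    }
    # pass the text through "%s" so a '%' in a client string is not
    # treated as a format directive by syslog()
    syslog(LOG_INFO, "%s",
        "Client,$RelayAddr,uni=$SendmailMacros{'auth_authen'},$client");
}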

Then just grep syslog lines with 'Client,' in them and count how
often each client appears.  That tells you how many messages were
sent per client, which is almost right.  However, you might have
one person sending 100 a day with client A and 100 people sending 1
a day with client B.  That isn't really the same thing.
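
The per-message tally could be sketched roughly like this.  It assumes
log lines in the "Client,relay,uni=user,client" shape written by the
sampling code above; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Parse one syslog line; returns (relay, uni, client), or an empty
# list if the line is not a client-sampling record.
sub parse_client_line {
    my ($line) = @_;
    return unless $line =~ /\bClient,([^,]*),uni=([^,]*),(.*)$/;
    return ($1, $2, $3);
}

# Count messages per client string; returns a hash reference.
sub count_messages_per_client {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        my ($relay, $uni, $client) = parse_client_line($line) or next;
        $count{$client}++;
    }
    return \%count;
}

# Possible use from a script fed with the mail log on standard input:
#   my $count = count_messages_per_client(<STDIN>);
#   printf "%6d  %s\n", $count->{$_}, $_
#       for sort { $count->{$b} <=> $count->{$a} } keys %$count;
```

The client string is taken as everything after the third comma, since
some mailers put commas in their own names.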

Note we also record what "uni" (user) it was.  One of my colleagues
here feeds the data into SAS and gets us counts of how many different
*people* use each client, and then by matching to databases, how many
faculty use each client, how many students in each division use each
client, etc.  It's interesting stuff that we use to get budget money.
Don't ask me how SAS works though.
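
For the people-per-client number you don't strictly need SAS.  Here is
a rough sketch in Perl, not what my colleague actually runs: it assumes
the same "Client,relay,uni=user,client" log lines, and the sub name is
hypothetical.  Matching against other databases is left out.

```perl
use strict;
use warnings;

# Count distinct users (uni values) per client string.
# Returns a hash reference: client => number of different users.
sub count_users_per_client {
    my (@lines) = @_;
    my %seen;    # $seen{$client}{$uni} = 1
    for my $line (@lines) {
        next unless $line =~ /\bClient,[^,]*,uni=([^,]*),(.*)$/;
        my ($uni, $client) = ($1, $2);
        $seen{$client}{$uni} = 1;
    }
    my %users;
    for my $client (keys %seen) {
        $users{$client} = scalar keys %{ $seen{$client} };
    }
    return \%users;
}
```

With this, one person sending 100 messages from client A counts once
for A, while 100 people each sending one message from client B count
as 100 for B.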

Joe Brennan








