[Mimedefang] URIBL/SURBL support

Joseph Brennan brennan at columbia.edu
Tue Nov 21 10:13:55 EST 2006



--On Monday, November 20, 2006 12:56 -0800 Kelsey Cummings 
<kgc at corp.sonic.net> wrote:

> Has anyone written up generic URIBL or SURBL specific support for MD
> outside of using SpamAssassin?  Mind sharing?


First you have to parse the URL out of text.  That's fun.  This
works most of the time for plain text parts.

if (/http:..([a-zA-Z0-9\.-]+)/) {
      my $uri = $1;
      my $result = surblcheck($uri);
      if ($result) {
  	 # do something
      }
}

For a message sent html-only, you'd want to use HTML::Parser to decode
and extract the uri from the creative obfuscations.  Not shown here.

The function surblcheck() is probably what you are asking for, and it
looks like this.  We arranged to get the zone files so this is a local
lookup for us.

You might parse the last component of the 127.0.0.n result to see which
surbl list(s) matched, but we just test whether we got any result.


#***********************************************************************
# %PROCEDURE: surblcheck
# %ARGUMENTS:
#  uri - a hostname or an IP address
# %RETURNS:
#  ret - The IP address in the SURBL database, or 0 if not found.
# %DESCRIPTION
#  Request a SURBL check by hostname.
#  The returned IP address is 127.0.0.n where n is a bitmasked
#  representation of which component lists matched.  See
#  www.surbl.org/lists.html under Combined SURBL list.
#***********************************************************************
sub surblcheck {
    my ($uri) = @_;
    my ($ret, $domainname);

    # case of a hostname
    $domainname = getdomain($uri);

    # case of an IP address
    if ($uri =~ /^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/) 
{
        $domainname = "$4.$3.$2.$1";
    }

    # case of neither hostname nor IP address
    return 0 unless ($domainname);

    # format
    $domainname = $domainname . ".multi.surbl.org";

    # check it
    my $resolver = new Net::DNS::Resolver;
    $resolver->tcp_timeout(10);
    $resolver->port(530);
    my $query = $resolver->query($domainname);

    # $query will be 0 if not in surbl
    return 0 unless defined $query;

    # should be only one record returned, but ...
    foreach my $rr ($query->answer) {
        if( $rr->type eq 'A' ) {
            $ret = $rr->address();
        }
    }
    $ret = 0 unless ($ret);
    return $ret;
}


***********************************************************************
# %PROCEDURE: getdomain
# %ARGUMENTS:
#  hostname
# %RETURNS:
#  The domain part of the hostname in lowercase, or 0 on failure
#***********************************************************************
sub getdomain {
    my ($hostname) = @_;
    my $ret = 0;

    $hostname = lc($hostname);

    # A hostname may contain only letters, numbers, hyphens, dots
    if ($hostname !~ /^[a-z0-9\.-]+$/) {
        return $ret;
    }

    my @parts = split(/\./,$hostname);

    # The top-level domain must be alphabetic
    if ($parts[-1] !~ /^[a-z]+$/) {
        return $ret;
    }

    # Now, let's find the significant domain parts
    # The 2-letter country codes
    if (length($parts[-1]) == 2) {
        # Brief heuristic for when they are 3 parts (not complete)
        if ($parts[-2] =~ /(ac|co|com|edu|gov|info|mil|net|nom|org)/) {
            $ret = $parts[-3] . "." . $parts[-2] . "." . $parts[-1];
        }
        else {
            $ret = $parts[-2] . "." . $parts[-1];
        }
    }
    # The old 3-letter tlds and the newer long tlds
    else {
        $ret = $parts[-2] . "." . $parts[-1];
    }

    return $ret;
}

#***********************************************************************


Scary, right?  But this identifies a lot of spam.

Joseph Brennan
Lead Email Systems Engineer
Columbia University Information Technology






More information about the MIMEDefang mailing list