[Mimedefang] URIBL/SURBL support
Joseph Brennan
brennan at columbia.edu
Tue Nov 21 10:13:55 EST 2006
--On Monday, November 20, 2006 12:56 -0800 Kelsey Cummings
<kgc at corp.sonic.net> wrote:
> Has anyone written up generic URIBL or SURBL specific support for MD
> outside of using SpamAssassin? Mind sharing?
First you have to parse the URL out of text. That's fun. This
works most of the time for plain text parts.
if (/http:..([a-zA-Z0-9\.-]+)/) {
my $uri = $1;
my $result = surblcheck($uri);
if ($result) {
# do something
}
}
For a message sent html-only, you'd want to use HTML::Parser to decode
and extract the uri from the creative obfuscations. Not shown here.
The function surblcheck() is probably what you are asking for, and it
looks like this. We arranged to get the zone files so this is a local
lookup for us.
You might parse the last component of the 127.0.0.n result to see which
surbl list(s) matched, but we just test whether we got any result.
#***********************************************************************
# %PROCEDURE: surblcheck
# %ARGUMENTS:
# uri - a hostname or an IP address
# %RETURNS:
# ret - The IP address in the SURBL database, or 0 if not found.
# %DESCRIPTION
# Request a SURBL check by hostname.
# The returned IP address is 127.0.0.n where n is a bitmasked
# representation of which component lists matched. See
# www.surbl.org/lists.html under Combined SURBL list.
#***********************************************************************
sub surblcheck {
my ($uri) = @_;
my ($ret, $domainname);
# case of a hostname
$domainname = getdomain($uri);
# case of an IP address
if ($uri =~ /^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/)
{
$domainname = "$4.$3.$2.$1";
}
# case of neither hostname nor IP address
return 0 unless ($domainname);
# format
$domainname = $domainname . ".multi.surbl.org";
# check it
my $resolver = new Net::DNS::Resolver;
$resolver->tcp_timeout(10);
$resolver->port(530);
my $query = $resolver->query($domainname);
# $query will be 0 if not in surbl
return 0 unless defined $query;
# should be only one record returned, but ...
foreach my $rr ($query->answer) {
if( $rr->type eq 'A' ) {
$ret = $rr->address();
}
}
$ret = 0 unless ($ret);
return $ret;
}
***********************************************************************
# %PROCEDURE: getdomain
# %ARGUMENTS:
# hostname
# %RETURNS:
# The domain part of the hostname in lowercase, or 0 on failure
#***********************************************************************
sub getdomain {
my ($hostname) = @_;
my $ret = 0;
$hostname = lc($hostname);
# A hostname may contain only letters, numbers, hyphens, dots
if ($hostname !~ /^[a-z0-9\.-]+$/) {
return $ret;
}
my @parts = split(/\./,$hostname);
# The top-level domain must be alphabetic
if ($parts[-1] !~ /^[a-z]+$/) {
return $ret;
}
# Now, let's find the significant domain parts
# The 2-letter country codes
if (length($parts[-1]) == 2) {
# Brief heuristic for when they are 3 parts (not complete)
if ($parts[-2] =~ /(ac|co|com|edu|gov|info|mil|net|nom|org)/) {
$ret = $parts[-3] . "." . $parts[-2] . "." . $parts[-1];
}
else {
$ret = $parts[-2] . "." . $parts[-1];
}
}
# The old 3-letter tlds and the newer long tlds
else {
$ret = $parts[-2] . "." . $parts[-1];
}
return $ret;
}
#***********************************************************************
Scary, right? But this identifies a lot of spam.
Joseph Brennan
Lead Email Systems Engineer
Columbia University Information Technology
More information about the MIMEDefang
mailing list