[Mimedefang] long dns timeouts when first dns in /etc/resolv.conf is down

Bill Cole mdlist-20140424 at billmail.scconsult.com
Wed Mar 16 11:46:52 EDT 2016


On 14 Mar 2016, at 10:06, Dianne Skoll wrote:

> On Mon, 14 Mar 2016 14:11:38 +0100
> Marcus Schopen <lists at localguru.de> wrote:
>
>> It shouldn't make a difference to mimedefang if one of
>> the dns server is down. Any ideas?
>
> I think this is an artifact of the Net::DNS Perl module, which doesn't
> seem to handle multiple name servers very well.

The flaw is not intrinsic to Net::DNS.

Net::DNS has roughly the same tunables as the system resolver, reads 
resolv.conf to get values for what it does not receive in a RES_OPTIONS 
environment variable, lets you set them all explicitly, and ultimately 
uses the system defaults if they aren't set explicitly.
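
As a minimal sketch of setting those tunables explicitly instead of 
inheriting them: the values below are illustrative assumptions, not 
recommendations from this thread, and it assumes a Net::DNS 1.x whose 
constructor accepts these attributes. Nothing is sent on the wire; it 
only prints the resulting resolver state.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Net::DNS;

# Illustrative values only: one unreachable TEST-NET address plus a local resolver.
my $r = Net::DNS::Resolver->new(
    nameservers => ['192.0.2.1', '127.0.0.1'],
    retry       => 1,    # how many times to try the query
    retrans     => 1,    # retransmission interval, in seconds
    udp_timeout => 2,    # UDP timeout in seconds (assumed available, as in 1.04)
);

# No query is sent here; this just shows what the resolver will use.
print $r->string, "\n";
```

The same attributes can be read back or changed later through the 
accessor methods of the same names, e.g. $r->retry(2).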

> I ran the following test program, where 10.50.100.100 is a nonexistent
> machine and 192.168.10.23 is the real name server.  Results of strace
> are shown below; it seems by default that Net::DNS only moves to the
> next name server after 10s.  If you do lots of DNS lookups, that can
> really slow things down.

Try it with a modern version of Net::DNS and see if that changes.

I haven't dug up the documentation, but on one older system with a 
"base" perl 5.10 & Net::DNS 0.65 it queries the nameserver list 
synchronously, in series. With perl 5.22 & Net::DNS 1.04 it seems to 
query all nameservers in the list somewhat asynchronously, in 
quasi-parallel. If the first one answers fast enough it never queries 
the second, but it's clearly not waiting around to exhaust all 
retries/retrans/timeouts. Note that Net::DNS has also had some ugly 
compatibility problems from rapid and essentially untested change in 
the 1.0x line, but it seems to work fine with MD.

So, using a modified version of your script with the debug flag set and 
the resolver state printed, using 2 bogus nameservers and one that works 
(set via the RES_NAMESERVERS environment variable) here's the antique 
version:

# PATH=/usr/bin/:$PATH time -p /tmp/DiaNneStest.pl
0.65
;; RESOLVER state:
;;  domain       =
;;  searchlist   =
;;  nameservers  = 192.0.2.1 172.16.1.1 127.0.0.1
;;  port         = 53
;;  srcport      = 0
;;  srcaddr      = 0.0.0.0
;;  tcp_timeout  = 120
;;  retrans  = 5  retry    = 4
;;  usevc    = 0  stayopen = 0    igntc = 0
;;  defnames = 1  dnsrch   = 1
;;  recurse  = 1  debug    = 1
;;  force_v4 = 0  (IPv6 Transport is available)

;; query(colo3.roaringpenguin.com, A)
;; Trying to set up a AF_INET6() family type UDP socket with srcaddr: 0.0.0.0 ... done
;; setting up an AF_INET() family type UDP socket
;; send_udp(192.0.2.1:53)
;; send_udp(172.16.1.1:53)
;; send_udp(127.0.0.1:53)
;; answer from 127.0.0.1:53 : 94 bytes
;; HEADER SECTION
;; id = 53885
;; qr = 1    opcode = QUERY    aa = 0    tc = 0    rd = 1
;; ra = 1    ad = 0    cd = 0    rcode  = NOERROR
;; qdcount = 1  ancount = 1  nscount = 2  arcount = 0

;; QUESTION SECTION (1 record)
;; colo3.roaringpenguin.com.	IN	A

;; ANSWER SECTION (1 record)
colo3.roaringpenguin.com.	80684	IN	A	70.38.112.54

;; AUTHORITY SECTION (2 records)
roaringpenguin.com.	21838	IN	NS	ns3.roaringpenguin.com.
roaringpenguin.com.	21838	IN	NS	ns4.roaringpenguin.com.

;; ADDITIONAL SECTION (0 records)

real        20.12
user         0.08
sys          0.02

=========================================================

Oh look, there seems to be a 10s timeout per bad server (two bad 
servers, 20.12s elapsed), even though the udp_timeout value isn't in 
that old version...

And here it is with the perl that anything other than the base OS would use:


# time -p /tmp/DiaNneStest.pl
1.04
;; RESOLVER state:
;; domain	=
;; searchlist	=
;; nameservers	= 192.0.2.1 172.16.1.1 127.0.0.1
;; defnames	= 1	dnsrch		= 1
;; retrans	= 5	retry		= 4
;; recurse	= 1	igntc		= 0
;; usevc	= 0	port		= 53
;; srcaddr	= 0	srcport		= 0
;; tcp_timeout	= 120	persistent_tcp	= 0
;; udp_timeout	= 30	persistent_udp	= 0
;; debug	= 1	force_v4	= 0
;; prefer_v6	= 0	force_v6	= 0


;; query( colo3.roaringpenguin.com A )

;; udp send [192.0.2.1]:53

;; udp send [172.16.1.1]:53

;; udp send [127.0.0.1]:53

;; answer from [127.0.0.1] length 94
;; HEADER SECTION
;;	id = 31299
;;	qr = 1	aa = 0	tc = 0	rd = 1	opcode = QUERY
;;	ra = 1	z  = 0	ad = 0	cd = 0	rcode  = NOERROR
;;	qdcount = 1	ancount = 1	nscount = 2	arcount = 0
;;	do = 0

;; QUESTION SECTION (1 record)
;; colo3.roaringpenguin.com.	IN	A

;; ANSWER SECTION (1 record)
colo3.roaringpenguin.com.	80625	IN	A	70.38.112.54

;; AUTHORITY SECTION (2 records)
roaringpenguin.com.	21779	IN	NS	ns3.roaringpenguin.com.
roaringpenguin.com.	21779	IN	NS	ns4.roaringpenguin.com.

;; ADDITIONAL SECTION (0 records)

real         3.53
user         0.15
sys          0.02

============================================================

And since 3.53s is still an awfully long time to wait, let's not give 
anyone a second chance to answer a simple question:

# RES_OPTIONS="retry:0 retrans:0" time -p /tmp/DiaNneStest.pl
1.04
;; RESOLVER state:
;; domain	=
;; searchlist	=
;; nameservers	= 192.0.2.1 172.16.1.1 127.0.0.1
;; defnames	= 1	dnsrch		= 1
;; retrans	= 0	retry		= 0
;; recurse	= 1	igntc		= 0
;; usevc	= 0	port		= 53
;; srcaddr	= 0	srcport		= 0
;; tcp_timeout	= 120	persistent_tcp	= 0
;; udp_timeout	= 30	persistent_udp	= 0
;; debug	= 1	force_v4	= 0
;; prefer_v6	= 0	force_v6	= 0


;; query( colo3.roaringpenguin.com A )

;; udp send [192.0.2.1]:53

;; udp send [172.16.1.1]:53

;; udp send [127.0.0.1]:53

;; answer from [127.0.0.1] length 94
;; HEADER SECTION
;;	id = 48510
;;	qr = 1	aa = 0	tc = 0	rd = 1	opcode = QUERY
;;	ra = 1	z  = 0	ad = 0	cd = 0	rcode  = NOERROR
;;	qdcount = 1	ancount = 1	nscount = 2	arcount = 0
;;	do = 0

;; QUESTION SECTION (1 record)
;; colo3.roaringpenguin.com.	IN	A

;; ANSWER SECTION (1 record)
colo3.roaringpenguin.com.	80510	IN	A	70.38.112.54

;; AUTHORITY SECTION (2 records)
roaringpenguin.com.	21664	IN	NS	ns4.roaringpenguin.com.
roaringpenguin.com.	21664	IN	NS	ns3.roaringpenguin.com.

;; ADDITIONAL SECTION (0 records)

real         0.87
user         0.14
sys          0.02

============================================================

That's not so bad. ~5x slower than if I just use the working resolver, 
but tolerable.
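
To make the same comparison from inside Perl rather than with time -p, 
something like the following could work. It is only a sketch: 
timed_query is a made-up helper name, the nameserver list is the one 
from the runs above, and passing retry to the constructor assumes a 
Net::DNS that accepts it there. It does nothing unless a name is given 
on the command line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Net::DNS;

# timed_query is a hypothetical helper: build a resolver from %opts,
# run one A lookup, and return (elapsed seconds, reply or undef).
sub timed_query {
    my ($name, %opts) = @_;
    my $r     = Net::DNS::Resolver->new(%opts);
    my $t0    = [gettimeofday];
    my $reply = $r->query($name, 'A');
    return (tv_interval($t0), $reply);
}

# Only touch the network when a name is given on the command line.
if (@ARGV) {
    my @ns = ('192.0.2.1', '172.16.1.1', '127.0.0.1');
    for my $n (1, 4) {    # retry counts to compare; 4 is the value in the state dumps
        my ($secs, $reply) =
            timed_query($ARGV[0], nameservers => \@ns, retry => $n);
        printf "retry=%d: %.2fs, %s\n",
            $n, $secs, $reply ? 'answered' : 'no answer';
    }
}
```

Run it as, for example, perl timing.pl colo3.roaringpenguin.com (the 
file name is arbitrary). The absolute numbers will depend on the 
Net::DNS version and on which of the listed servers actually respond.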

> Regards,
>
> Dianne.
>
> #!/usr/bin/perl
> #### ns.pl test program
> use Net::DNS;
> use Net::DNS::Resolver;
> my $r = Net::DNS::Resolver->new(nameservers => ['10.50.100.100', '192.168.10.23']);
> my $x = $r->query('colo3.roaringpenguin.com', 'A');

My derivative:

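#!/usr/bin/env perl
#### derived from Dianne Skoll's ns.pl test program
use strict;
use warnings;
use Net::DNS;
use Net::DNS::Resolver;

# No nameservers passed here: they come from the environment
# (RES_NAMESERVERS) or resolv.conf.
my $r = Net::DNS::Resolver->new();
$r->debug(1);                        # trace each send and answer
print Net::DNS->version, "\n";       # which Net::DNS is in use
print $r->string, "\n";              # dump the resolver state
my $x = $r->query('colo3.roaringpenguin.com', 'A');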


