current-users: /etc/netstart timing out

Subject: /etc/netstart timing out
To: None <current-users@NetBSD.ORG>
From: Greg Hudson <ghudson@MIT.EDU>
List: current-users
Date: 08/09/1995 07:05:01

There's a case where /etc/netstart will time out on a hostname lookup.
To describe how this happens, I need to give a bit of background on
/etc/netstart, the resolver, and kernel semantics for UDP error
delivery.

First, let me describe the relevant parts of netstart by taking an
example which exhibits the bad behavior.  The actual case where I ran
into this problem was a machine frobnitz.mit.edu (IP address
18.177.0.123), a machine with one ethernet interface, le0.  frobnitz
is configured to use named, but also has two name servers to fall back
on if named dies, so its /etc/resolv.conf is:

	domain MIT.EDU
	nameserver 127.0.0.1
	nameserver 18.70.0.160
	nameserver 18.71.0.151

frobnitz's /etc/hosts file contains appropriate entries for localhost,
localhost.mit.edu, frobnitz, and frobnitz.mit.edu.

In this configuration, /etc/netstart will run the following commands:

	hostname frobnitz
	ifconfig le0 frobnitz.mit.edu inet 18.177.0.123
	ifconfig lo0 inet localhost
	route add frobnitz localhost
	route add default 18.177.0.1

Naturally, named isn't running when /etc/netstart runs.

Second, the resolver in the C library.  The resolver works by cycling
through each of the three name servers some number of times with a
varying timeout for each iteration.  If there is only one name server,
or if the resolver is on its very first try (the first name server on
the first pass), the resolver will use connect() and send() so that it
gets ICMP errors from the remote host.  If the resolver is past that
first try and there is more than one
name server, the resolver will disconnect (connect to the zero
address) if it was connected, and use sendto() so that the resolver
can receive replies from any of the name servers it has sent to.  If
the resolver fails to resolve a host using the nameservers, it will
fall back to /etc/hosts.
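
As a rough illustration of the rule just described (a sketch only, not
the actual libc code), the choice between connect()/send() and sendto()
can be written out as below.  The function name query_plan and the
"passes" parameter are made up for this example, and it assumes that
"the first try" means the very first query of a lookup (first pass,
first name server), which is what the trace later in this message
implies.

```python
def query_plan(servers, passes):
    """List (pass_number, server, method) in the order queries would be sent.

    Assumption: only the very first query of a lookup (first pass, first
    server) uses a connected socket when several servers are configured.
    """
    plan = []
    for attempt in range(passes):
        for index, server in enumerate(servers):
            if len(servers) == 1 or (attempt == 0 and index == 0):
                method = "connect+send"   # ICMP errors can be reported
            else:
                method = "sendto"         # replies accepted from any server
            plan.append((attempt + 1, server, method))
    return plan


if __name__ == "__main__":
    # The three name servers from the resolv.conf shown earlier.
    for step in query_plan(["127.0.0.1", "18.70.0.160", "18.71.0.151"], passes=2):
        print(step)
```

With a single name server every query in the plan stays connected, which
is why the problem only shows up with fallback servers configured.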

Third, the kernel semantics.  The kernel will return from sendto()
with an error value if you try to send to an address you don't have a
route to.  If you send to an address you have a route to but to a port
where there's no server running, the kernel will return successfully
from sendto() even if you're sending on the loopback interface.  The
kernel will then usually receive an ICMP port unreachable (or fake
one, on the loopback interface), and deliver it to the sending process
(by waking it up from a select() and returning a connection refused
error on the next receive operation) if and only if the sending process is
connect()ed to the unreachable address and port.
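
The difference between the connected and unconnected cases can be
observed from user space with a small probe like the one below.  This
is only a sketch: the function name probe and its outcome labels are
made up for this example, and what the unconnected case reports depends
on the kernel (the semantics described above are the BSD ones).

```python
import socket


def probe(host, port, connected, timeout=1.0):
    """Send one UDP datagram and classify how the failure is reported."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            if connected:
                s.connect((host, port))   # kernel remembers the peer
                s.send(b"query")
            else:
                s.sendto(b"query", (host, port))
        except OSError:
            return "send-error"           # e.g. no route to the address
        try:
            s.recvfrom(512)
            return "reply"
        except ConnectionRefusedError:
            return "refused"              # ICMP port unreachable delivered
        except socket.timeout:
            return "timeout"              # error never reached this socket
        except OSError:
            return "error"


if __name__ == "__main__":
    # Assumes nothing is listening on UDP port 53 of the loopback address.
    for connected in (True, False):
        print(connected, probe("127.0.0.1", 53, connected))
```

On a machine with no name server running locally, I would expect the
connected probe to report "refused" quickly, and the unconnected one to
sit out the full timeout on a kernel with the semantics described above.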

The timeout happens in this case during "route add frobnitz
localhost".  The order of operations in the resolver is this:

	Nameserver 1 try 1:
		Connect to 127.0.0.1
		Send request, successfully
		select(), immediately wake up due to ICMP port unreachable
		recv(), get -1 with errno == ECONNREFUSED

	Nameserver 2 try 1:
		Connect to the zero address
		Send request, get -1 with errno == ENETUNREACH

	Nameserver 3 try 1:
		Send request, get -1 with errno == ENETUNREACH

	Nameserver 1 try 2:
		Send request, successfully
		select(); ICMP port unreachable not delivered, timeout

The timeout does not occur during "ifconfig lo0 inet localhost"
because there's no route to 127.0.0.1 at the time when localhost is
resolved, so the resolver falls back on /etc/hosts immediately after
failing all the sendto()s.
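
Both cases can be reproduced with a toy model of the resolver loop and
the kernel rules described above.  This is a sketch for illustration,
not the real res_send(): simulate_lookup and its parameters are
invented for this example, it assumes two passes over the server list,
and it assumes that only the very first query uses a connected socket.

```python
def simulate_lookup(servers, routes, listeners, passes=2):
    """Return (server, pass_number, outcome) for each query of one lookup.

    servers   -- name server addresses in resolv.conf order
    routes    -- addresses the kernel currently has a route to
    listeners -- addresses where a name server is actually running
    """
    events = []
    for attempt in range(passes):
        for index, server in enumerate(servers):
            # Assumed rule: connected socket only for the very first query.
            connected = len(servers) == 1 or (attempt == 0 and index == 0)
            if server not in routes:
                outcome = "ENETUNREACH"    # send fails at once
            elif server in listeners:
                outcome = "reply"
            elif connected:
                outcome = "ECONNREFUSED"   # ICMP error delivered to socket
            else:
                outcome = "timeout"        # ICMP error dropped, select() waits
            events.append((server, attempt + 1, outcome))
            if outcome == "reply":
                return events
    return events


SERVERS = ["127.0.0.1", "18.70.0.160", "18.71.0.151"]

if __name__ == "__main__":
    # "route add frobnitz localhost": lo0 is up, no default route yet.
    for event in simulate_lookup(SERVERS, routes={"127.0.0.1"}, listeners=set()):
        print("route add:", event)
    # "ifconfig lo0 inet localhost": no route to any name server yet.
    for event in simulate_lookup(SERVERS, routes=set(), listeners=set()):
        print("ifconfig: ", event)
```

In the model, the "route add" case hits a timeout on the second query to
127.0.0.1, while the "ifconfig" case sees nothing but immediate send
errors, so the fallback to /etc/hosts is not delayed.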

Clearly, the resolver is not behaving optimally here, and the only way
it can behave optimally is to use multiple sockets.  (Linux kernel
semantics for ICMP port unreachable errors would sort of help here,
but the resolver would still not be completely robust because you
can't know *which* name server you got an ICMP port unreachable from.)
The latest bind 4.9.3 beta modifies the res_send() semantics slightly,
but would still fail in the case I outlined above.  I don't think
there's a good solution to this problem, except perhaps to check
whether the draft IP6 APIs suffer from the same inadequacy with regard
to UDP errors as the current API does.

The simplest solution is to nuke the "route add $hostname localhost"
command from /etc/netstart, since the kernel is already smart enough
to automatically add routes through the loopback interface for network
interface addresses.  I'll check in this change in a few days if there
are no particularly enlightening comments.