Subject: Strange hang of services
To: None <netbsd-help@netbsd.org>
From: Norbert Elbrecht <netbsd-help@elbrecht.com>
List: netbsd-help
Date: 02/07/2001 16:31:08
Hi all,

I just went through an obscure hang of my Internet connection (DSL
dialup). While I was surfing the net, Netscape complained about host
lookups. I then found a hung BIND on my NetBSD 1.5 system. I tried to
restart it and found that it was "hard to kill". New BIND processes hung
as well - sending them successive signals moved them from one hanging
point to the next (according to the <named -d x -f -c ...> screen
output).

One DNS server of my ISP was hanging, and it was my BIND forwarder; the
other forwarders I had configured were apparently not used. In the
meantime I pointed /etc/resolv.conf at another DNS server and could surf
again, but logging in was still very slow (timeouts; su did not proceed
at all after I entered my password). The same thing happened with bind9
from pkgsrc. Things went similarly with ipnat/ipf, the network and
finally syslog.

Syslog was hard to kill (kill -9 did it). I restarted it (which was
logged) and tried

logger some thing # which was not logged

Three more restarts and it ran again. Then the logger output was logged.
Now BIND is happily logging (and working). My guess is:

BIND tried to allocate resources (TCP sockets, memory, files, etc.)
which it could not get, but it did not tell me so either. I played with
netstat -a, killed connections, processes, network services and more,
but did not come to any smart conclusions. Which leaves me with three
questions:

	1) Why does su hang when BIND is not working, although lookups
           are set to "files, dns" in resolv.conf, nsswitch* and
           host.conf?

	2) How would you guys try to get info about resources, namely
           open files, sockets, connections (did I forget anything)? Are
           there realtime tools like strace under Linux?

	3) I had the idea to kill stuff step by step and "clean up". I
           found open connections and killed the processes, but
           netstat -a still listed those specific connections. What
           resource-wise cleanup steps can you think of?
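
To make question 3 concrete, here is a minimal sketch (not taken from my
actual session) of how the leftover netstat entries could be grouped by
TCP state, to see whether they are live connections or just sockets
lingering in states such as TIME_WAIT. The function name
count_tcp_states is made up for illustration, and the sketch assumes
that the state is the last column on each tcp line of netstat -an
output.

```sh
# Hypothetical helper: count TCP sockets per state.
# Reads `netstat -an` style output on stdin.
# Assumption: on lines whose first field starts with "tcp",
# the last field is the connection state.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { states[$NF]++ }
         END { for (s in states) print s, states[s] }' | sort
}
```

It would be used as: netstat -an | count_tcp_states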

Obviously dmesg and /var/log/* did not help much here.

Please flame me, or send URLs or your troubleshooting-dirty-trick-survey.
I will post a summary (flames will only be counted ;-).

cu Norbert
-- 
Today is the first day of the rest of your life.
	pub 1024D/94482C11 1999-11-15, fingerprint 
	1323 01EC 7007 AB62 8ACD  92E3 65B2 F1C2 9448 2C11