Subject: Re: New system wedged again... And, more info.
To: Mason Loring Bliss <mason@acheron.middleboro.ma.us>
From: Brian Buhrow <buhrow@cats.ucsc.edu>
List: current-users
Date: 04/10/1999 00:19:41
	If you have some time between noticing the first wedged process and the
machine locking up solid, see if you can get the output of ps -l -p <pid>,
where <pid> is the pid of the wedged process.  The field you're interested
in is the wchan field.  It tells you what that process is sleeping on in
the kernel and might give a clue as to the trouble.  It seems like every
process is piling up sleeping on some resource which is deadlocked: perhaps
a disk i/o request, or some network housekeeping (mbuf reclamation, timeo,
or something like that).  Post the wchan output and I'll bet someone has
an idea.
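
If several processes are stuck, it may be quicker to count how many are
sleeping on each wait channel than to read the listing by eye.  Here is a
rough Python sketch of that idea, not a tested diagnostic tool.  It assumes
the text comes from something like ps -ax -o pid,stat,wchan,comm (one header
line, four columns, "-" for a process that is not sleeping).  The function
name tally_wchan and the SAMPLE listing are made up for illustration; the
sample is not output from the machine in this thread.

```python
from collections import Counter


def tally_wchan(ps_text):
    """Count processes per wait channel.

    Assumes four whitespace-separated columns (pid, stat, wchan, command)
    preceded by a single header line.
    """
    counts = Counter()
    for line in ps_text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank wchan column or malformed line
        _pid, _stat, wchan, _comm = fields
        if wchan != "-":  # "-" means the process is not sleeping
            counts[wchan] += 1
    return counts


# Hypothetical listing, invented for this example only.
SAMPLE = """\
  PID STAT WCHAN   COMMAND
    1 Is   wait    init
  210 D    biowait uptime
  214 D    biowait w
  220 D    biowait ps
  231 S    select  sshd
  240 R    -       top
"""

for wchan, n in tally_wchan(SAMPLE).most_common():
    print(n, wchan)
```

A wait channel with a count that keeps growing is the likely pile-up point.
Since ps itself may hang on a wedged box, capture the listing to a file as
early as you can and feed the saved text to the function.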
-Brian

On Apr 9, 11:41pm, Mason Loring Bliss wrote:
} Subject: New system wedged again... And, more info.
} Hi again. The new system at work seems to have wedged again. This time it's
} definitely not RPC services in inetd.conf still being enabled or anything
} to do with hosts.allow not having sufficient permissions.
} 
} FWIW, the machine can respond to extended flood pings over a 10 megabit
} switch without dropping anything. It's simply not answering on any TCP
} ports.
} 
} Also, FWIW, I noticed something odd when I ran "uptime". It didn't return,
} and I couldn't kill it. It was like it had hung in the kernel somewhere.
} I tried "w" on another terminal, and that hung as well. I did "ps ax" on
} another terminal, and *that* hung. By this time response times were starting
} to slow. I typed "shutdown -r now" and before hitting enter popped over to
} a window that was running "top," but by that time it was all over, and I
} was left staring at a hung machine.
} 
} If it's put into service, the box will have to handle mad amounts of mail,
} most of it happening during hours when most folks in Africa, Europe, and
} further east are awake, and there won't be anyone around to reset the
} thing when it decides to wedge.
} 
} Any ideas? This is running off of the 1.4_ALPHA snapshot, and is still
} running the GENERIC kernel.
} 
} Thanks...
} 
} PS: I get the following when I try to ssh in from another box on the same
} network. The connection seems to be opening, but it's like nothing happens
} with it once it's open.
} 
} be /home/mason$ ssh -v satserv1
} SSH Version 1.2.25 [i386--netbsd], protocol version 1.5.
} Compiled with RSAREF.
} beastie.healthnet.org: Reading configuration data /etc/ssh_config
} beastie.healthnet.org: ssh_connect: getuid 1000 geteuid 1000 anon 1
} beastie.healthnet.org: Connecting to satserv1 [x.x.x.x] port 22.
} beastie.healthnet.org: Connection established.
} 
} -- 
} Mason Loring Bliss             ((  "In the drowsy dark cave of the mind dreams
} mason@acheron.middleboro.ma.us  ))  build  their nest  with fragments  dropped
} http://acheron.ne.mediaone.net ((   from day's caravan." - Rabindranath Tagore
} 
>-- End of excerpt from Mason Loring Bliss