Subject: Fw: X Windows hang with NetBSD 1.4.1 (i386)
To: None <tech-kern@netbsd.org>
From: Trevin Beattie <trevin@xmission.com>
List: tech-kern
Date: 10/21/1999 07:39:41
>Date: Mon, 27 Sep 1999 18:21:46 +0200
>From: Manuel Bouyer <bouyer@antioche.lip6.fr>
>To: Trevin Beattie <trevin@xmission.com>
>Subject: Re: X Windows hang with NetBSD 1.4.1 (i386)
>References: <3.0.3.32.19990920062540.006a73bc@192.42.172.103>
<3.0.3.32.19990919091055.006b7718@192.42.172.103>
<3.0.3.32.19990919091055.006b7718@192.42.172.103>
<19990920112939.A4523@antioche.lip6.fr>
<3.0.3.32.19990920062540.006a73bc@192.42.172.103>
<19990924180203.D5895@antioche.lip6.fr>
<3.0.3.32.19990925105402.006a7094@192.42.172.103>
>
...
>
>All this is interesting. Could you post this to tech-kern, with as many
>details as possible? I'm not familiar with the vfs, but people working
>on it will certainly be interested!
>
>--
>Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
>--

Here's my latest hangup:

I was logging in through xdm.  It removed the login window and started my
.xsession script, but all I saw was the background pattern and a cursor
change before the system stopped doing anything more.

So I shut down X with Ctrl-Alt-Backspace, and of course xdm couldn't restart
it because of the deadlock.  I was then able to enter the debugger with
Ctrl-Alt-Esc.

"ps /w" shows the following processes:
PID  PPID  COMMAND PRI WAIT-MSG   WAIT-CHANNEL
1864    1    xntpd  40 pause      0xf4bab198
1857 1833   xearth   8 inode      0xf4af2820
1855 1833      mwm   8 inode      0xf4af2538
1833  287       sh  32 wait       0xf4b08d6c
1811    1     pppd  24 select     _selwait
295     1     init  25 ttyin      0xf4963008
290     1     smbd  24 select     _selwait
288     1     nmbd  24 select     _selwait
287   277      xdm  32 wait       0xf4b5c140
283     1    sshd2  24 select     _selwait
277     1      xdm   8 inode      0xf4af2538
262     1    inetd  24 select     _selwait
259     1     apmd  24 select     _selwait
257     1      lpd  24 select     _selwait
255     1   routed  24 select     _selwait
253     1     cron  32 nanosleep  _nanowait.174
251     1   update  40 pause      0xf4b39198
211     1   nfsiod  32 nfsidl     _nfs_iodwant+0xc
210     1   nfsiod  32 nfsidl     _nfs_iodwant+0x8
208     1   nfsiod  32 nfsidl     _nfs_iodwant+0x4
207     1   nfsiod  32 nfsidl     _nfs_iodwant
199   195     nfsd  24 nfsd       0xf0550800
198   195     nfsd  24 nfsd       0xf0550a00
197   195     nfsd  24 nfsd       0xf0550c00
196   195     nfsd  24 nfsd       0xf0550e00
195     1     nfsd  24 netcon     0xf04f288a
187     1   mountd  24 select     _selwait
169     1  portmap  24 select     _selwait
167     1    named  24 select     _selwait
163     1  syslogd  24 select     _selwait
3       0   reaper   4 reaper     _deadproc
2       0 pagedaemon 4 daemon_slp _uvm+0x30
1       0     init  32 wait       0xf49bc000
0      -1  swapper   4 scheduler  _proc0


Three processes (xearth, mwm, and the parent xdm) are sleeping on "inode",
so from the wait channels it appears that the system is deadlocked somewhere
in the filesystem.  Here are partial backtraces of those three processes,
each followed by the filename found in the parameters to lookup():

xearth (pid 1857):
_lockmgr(f4af2820,30002,f4aeb914,f4b88ddc,f015ea9f) at _lockmgr+0x36f
_ufs_lock(f4b88d0) at _ufs_lock+0x22
_vn_lock(f4aeb888,20002,f0513c00,0,f4b88f18) at _vn_lock+0x5f
_union_root(f0513c00,f4b88e40,f49b81d4,f4b88f18,f4b88ef4) at _union_root+0x5c
_lookup(f4b88ef4,f4b88f88,f4b5cd70,f056cb80,f057a300) at _lookup+0x370
_namei(f4b88ef4,f4b88f88,f4b5cd70,f4b88f80,64) at _namei+0x317
_sys_access(f4b5cd70,f4b88f88,f4b88f80,0,0) at _sys_access+0x58
 (lib/X11/app-defaults/XEarth-color)

mwm (pid 1855):
_lockmgr(f4af2538,30002,f4aeb740,f4b7ed40,f015ea9f) at _lockmgr+0x36f
_ufs_lock(f4b7ed34) at _ufs_lock+0x22
_vn_lock(f4aeb6b4,20002,f4b7ef04,3,f4b7ef18) at _vn_lock+0x5f
_union_lookup(f4b7ee44,f49b81d4,f4b7ef18,f4b7eef4,0) at _union_lookup+0x208
_lookup(f4b7eef4,f4b7ef88,f4b5cb00,f056cb80,f056e300) at _lookup+0x248
_namei(f4b7eef4,f4b7ef88,f4b5cb00,f4b7ef80,64) at _namei+0x317
_sys_access(f4b5cb00,f4b7ef88,f4b7ef80,0,0) at _sys_access+0x58
 (bitmaps/_XmScrollBarUnavailableStipple)

xdm (pid 277):
_lockmgr(f4af2538,10002,f4aeb740,f4b4ecec,f015ea9f) at _lockmgr+0x36f
_ufs_lock(f4b4ece0) at _ufs_lock+0x22
_vn_lock(f4aeb6b4,10002,f4aeb6b4,ffffffff,f4b4edd0) at _vn_lock+0x5f
_vget(f4aeb6b4,2,f4b4eea4,3,f4b4eeb8) at _vget+0x73
_ufs_lookup(f4b4ee04,f49b81d4,f4b4eeb8,f4b4ee94,0) at _ufs_lookup+0x1f3
_lookup(f4b4ee94,f4b4ef88,f49bcd68,f4b4ef88,f055ac00) at _lookup+0x248
_namei(f4b4ee94,f4b4ef88,f49bcd68,f4b4ef80,20a00) at _namei+0x317
_sys___stat13(f49bcd68,f4b4ef88,f4b4ef80,0,11a) at _sys___stat13+0x44
 (lib/X11/xdm/xdm-config)
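
As a cross-check of the reasoning above, the following is a minimal Python
sketch (not part of the original debugging session) that groups the three
"inode" sleepers by wait channel and compares each channel with the first
argument to _lockmgr() in the matching backtrace.  The values are copied
from the listings above; the names SLEEPERS, LOCKMGR_ARG0,
group_by_channel and channel_matches_lock are hypothetical and exist only
for this illustration.

```python
from collections import defaultdict

# (pid, command, wait message, wait channel), copied from the "ps /w"
# listing for the three processes sleeping on "inode".
SLEEPERS = [
    (1857, "xearth", "inode", 0xF4AF2820),
    (1855, "mwm", "inode", 0xF4AF2538),
    (277, "xdm", "inode", 0xF4AF2538),
]

# First argument to _lockmgr() in each process's backtrace, keyed by pid.
LOCKMGR_ARG0 = {1857: 0xF4AF2820, 1855: 0xF4AF2538, 277: 0xF4AF2538}


def group_by_channel(sleepers):
    """Map each wait channel to the (pid, command) pairs sleeping on it."""
    groups = defaultdict(list)
    for pid, command, _msg, channel in sleepers:
        groups[channel].append((pid, command))
    return dict(groups)


def channel_matches_lock(sleepers, lock_args):
    """True if every wait channel equals that process's _lockmgr() argument."""
    return all(lock_args[pid] == channel for pid, _cmd, _msg, channel in sleepers)


if __name__ == "__main__":
    for channel, procs in sorted(group_by_channel(SLEEPERS).items()):
        names = ", ".join(f"{cmd} (pid {pid})" for pid, cmd in procs)
        print(f"{channel:#010x}: {names}")
    print("channels match _lockmgr args:",
          channel_matches_lock(SLEEPERS, LOCKMGR_ARG0))
```

Grouped this way, mwm and xdm are both waiting for the lock at f4af2538,
while xearth is waiting for a different lock at f4af2820.  The listing
alone does not show which process holds either lock.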

I am posting the kernel image and core dump to
http://www.xmission.com/~trevin/images/netbsd.14.gz and
http://www.xmission.com/~trevin/images/netbsd.14.core.gz

-----------------------
Trevin Beattie          "Do not meddle in the affairs of wizards,
trevin@xmission.com     for you are crunchy and good with ketchup."
      {:->                                     --unknown