Subject: Re: process wedged in vnlock
To: None <current-users@netbsd.org>
From: Tom Spindler <dogcow@babymeat.com>
List: current-users
Date: 02/27/2007 15:22:10
Here are the results. (I had to boot into multi-user mode, which was kind
of annoying; it didn't seem to work in single-user mode.)

db> ps/l
  PID         LID S     FLAGS       STRUCT LWP *            UAREA * WAIT        
 116           1 3 0x1000004         0xceff11c0         0xcd22ece0 vnlock
 1054          1 3      0x84         0xceff1340         0xcd04ece0 pause
 977           1 3      0x84         0xceff1ac0         0xcc3aece0 ttyin
 855           1 3      0x84         0xcb1b3020         0xcc65ece0 ttyin
 1039          1 3      0x84         0xcb1b34a0         0xcbe6ece0 wait
 946           1 3      0x84         0xcb1b37a0         0xcbca8ce0 ttyin
 1013          1 3      0x84         0xceff14c0         0xcce7ece0 nanoslp
 1050          1 3      0x84         0xceff17c0         0xccc1ece0 select
 1023          1 3      0x84         0xceff1640         0xccd0ece0 select
 952           1 3      0x84         0xceff1940         0xcc82ece0 select
 764           1 3      0x84         0xceff1dc0         0xcc74ece0 select
 726           1 3      0x84         0xcb1b31a0         0xcc59ece0 pause
 435           1 2       0x4         0xcb1b3320         0xcc4aece0 
 349           1 3      0x84         0xcb1b3620         0xcbcacce0 select
 91            1 3     0x204         0xceff1c40         0xcc98ece0 physiod
 14            1 3     0x204         0xcb1b3920         0xcbca4ce0 aiodoned
 13            1 3     0x204         0xcb1b3aa0         0xcbca1ce0 syncer
 12            1 3     0x204         0xcb1b3c20         0xcbc9ece0 pgdaemon
 11            1 3     0x204         0xcb1b3da0         0xcbc99ce0 sccomp
 10            1 3     0x204         0xcb1ac000         0xcbc93ce0 apmev
 9             1 3     0x284         0xcb1ac180         0xcbc90ce0 fwprobe
db> t/a 0xceff11c0
trace: pid 116 lid 1 at 0xcd22e9fc
sleepq_switch(0,0,cd1b7c74,c0363964,0) at netbsd:sleepq_switch+0x53
ltsleep(cd1b7c74,14,c0363964,0,cd1b7c74) at netbsd:ltsleep+0x13b
acquire(0,40500,c019de46,0,74) at netbsd:acquire+0x104
_lockmgr(cd1b7c74,10002,cd1b7bec,c0386930,937) at netbsd:_lockmgr+0x9bb
ufs_lock(cd22eb60,cd22eb94,cf0b2334,cd22ebb8,0) at netbsd:ufs_lock+0x3a
VOP_LOCK(cd1b7bec,10002,2b7,0,5) at netbsd:VOP_LOCK+0x23
vn_lock(cd1b7bec,20002,200,0,cf0b3a88) at netbsd:vn_lock+0x96
vn_close(cd1b7bec,5,cf0b2334,ceff11c0,ceff11c0) at netbsd:vn_close+0x21
vn_closefile(cf0b3a88,ceff11c0,5b3,0,0) at netbsd:vn_closefile+0x1a
closef(cf0b3a88,ceff11c0,cd22ec68,cd22ece0,8054000) at netbsd:closef+0x17c
syscall_plain() at netbsd:syscall_plain+0xb4
--- syscall (number 6) ---
0xbbb191fb:
db> show lock ufs_hashlock
lock address : 0x00000000c03d5cd4 type     :     sleep/adaptive
shared holds :                  0 exclusive:                  0
shares wanted:                  0 exclusive:                  0
current cpu  :                  0 last held:                  0
current lwp  : 000000000000000000 last held: 000000000000000000
last locked  : 0x00000000c01954da unlocked : 0x00000000c0195586

Turnstile chain at 0xc03dc200 with tc_mutex at 0xc03dc220.
=> No active turnstile for this lock.
db> x c01954da
netbsd:ffs_vget+0x96:	75ff026a    
db> x c0195586
netbsd:ffs_vget+0x142:	8bd85d8b    


On Tue, Feb 27, 2007 at 01:19:24PM -0800, Tom Spindler wrote:
> > Are you folks on uni-processor or multi-processor machines?  
> > I've been trying to cause a dual-core box to fall over, and have been 
> > unable to so far... 
> 
> Uniprocessor.
> 
> >  cvs rdiff -r1.4 -r1.5 src/sys/kern/kern_turnstile.c
> 
> Nope.
> 
> > If the trace includes ufs_ihash*, it would be good to see the output of
> > "ps/l", and note if anything is sleeping on "tstile" if this happens again.
> > The output of "show lock ufs_hashlock" would be useful here. If it shows
> > that the lock is currently held, you can get a backtrace from the owner
> > using "t/a <address of lwp>".
> 
> I'll try that when I get home.
> 
> As it happens, I can reliably trigger it within about ten seconds
> by running "find . -type d -print >/dev/null".
>