Subject: Re: process wedged in vnlock
To: Greg Oster <oster@cs.usask.ca>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 02/27/2007 14:57:58
Hi,

On Tue, Feb 27, 2007 at 08:39:46AM -0600, Greg Oster wrote:
> Tom Spindler writes:
> > Here's my backtrace for the same problem (thanks, t/t/l!)
> > 
> > trace: pid 22946  at 0xcb8ff6ec
> > sleepq_block(c03be130,14,cd56efa4,c034ad8e,0) at netbsd:sleepq_block+0x13e
> > ltsleep(cd56efa4,14,c034ad8e,0,cd56efa4) at netbsd:ltsleep+0xa0
> > acquire(0,600,c019bd13,cd56efa4,cac92028) at netbsd:acquire+0xd0
> > lockmgr(cd56efa4,10002,cd56ef34,c03baab4,cbc084a0) at netbsd:lockmgr+0x69f
> > ufs_lock(cb8ff850,0,0,c0225ffe,0) at netbsd:ufs_lock+0x2b
> > VOP_LOCK(cd56ef34,10002,cb8ff95c,c0199b7a,cd56ef34) at netbsd:VOP_LOCK+0x23
> > vn_lock(cd56ef34,10002,0,ffffffff,1691e) at netbsd:vn_lock+0x7c
> > vget(cd56ef34,10002,0,1000,ffffffff) at netbsd:vget+0x90
> > ufs_ihashget(4,1691e,0,2,c13dd204) at netbsd:ufs_ihashget+0x7e
> > ffs_vget(c13e2000,1691e,0,cb8ffa14,c13e2000) at netbsd:ffs_vget+0x2f
> > ufs_lookup(cb8ffa4c,10002,cbc084a0,cb8ffbec,c13e2000) at netbsd:ufs_lookup+0x7f6
> > VOP_LOOKUP(cbc084a0,cb8ffbd8,cb8ffbec,cbec3c00,20) at netbsd:VOP_LOOKUP+0x29
> > lookup(cb8ffbc8,20002,400,cb8ffbe0,c13e2000) at netbsd:lookup+0x1e7
> > namei(cb8ffbc8,805be58,64,0,3caef8) at netbsd:namei+0x131
> > sys__lstat30(other stuff I don't wanna type)
> > syscall_plain()
> > --- syscall (number 389) ---
>
> Are you folks on uni-processor or multi-processor machines?  
> I've been trying to cause a dual-core box to fall over, and have been 
> unable to so far... 

Likewise, but I will keep trying.

> If it's a uni-processor, perhaps this will help: 
>  
>  cvs rdiff -r1.4 -r1.5 src/sys/kern/kern_turnstile.c

That one shouldn't make a difference here, since it fixes a bug that was
introduced after the problem was initially reported. I'm wondering if this
is related to the ufs_ihash locking changes I made, but I don't see how 
just yet.

If this happens again and the trace includes ufs_ihash*, it would be good
to see the output of "ps/l", noting whether anything is sleeping on
"tstile". The output of "show lock ufs_hashlock" would also be useful. If
it shows that the lock is currently held, you can get a backtrace from the
owner using "t/a <address of lwp>".
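
As a rough sketch, the sequence at the ddb prompt would be something like
the following. The lwp address is a placeholder: substitute whatever
address "show lock" reports for the owner, assuming it reports one.

 ps/l
 show lock ufs_hashlock
 t/a <address of lwp>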

Cheers,
Andrew