Subject: Re: simple_lock: uninitialized lock
To: Patrick Welche <prlw1@newn.cam.ac.uk>
From: Antti Kantee <pooka@cs.hut.fi>
List: current-users
Date: 07/17/2007 16:45:09
On Tue Jul 17 2007 at 12:57:06 +0100, Patrick Welche wrote:
> On Mon, Jul 16, 2007 at 10:23:18PM +0300, Antti Kantee wrote:
> > On Mon Jul 16 2007 at 19:31:52 +0100, Patrick Welche wrote:
> > > Finally a panic while I'm not in X... Today's CVS on i386:
> > 
> > Sounds like this is not the pilot or even first season.  Got a pointer
> > to earlier episodes I've missed?
> 
> Not really - I never had anything to go on! If you would like the story :-)
> There was that post about Firefox being "parked", so I compiled a new
> kernel on a single-processor AMD, and thereafter it kept freezing while
> building the world, and I had some interesting file corruption, e.g.
> .depend files containing 0x0 or some binary. Then I came back to Cambridge
> and had similar freezing experiences on a Pentium 4 with hyperthreading
> enabled, once again while building the world and in X. Then I tried
> building the world from a console on a Pentium M, and that's when I finally
> had the panic and could get some information out!

Heh, you forgot the bit about the princess and the evil witch ;)

On a tangent off our main storyline, do you have ddb.onpanic set to 0?
With that setting a panic should dump core instead of dropping into ddb
and looking like a freeze.  Also, you should be able to type "sync" blind
at the ddb prompt even if you are in X.

> Pleasant surprise: bt/l actually did write to disk, so
> now with function arguments:
> 
> strlen(c0531a16,8,cbfb08c8,cbfb08ee,cbfb099c) at netbsd:strlen+0x8
> vsnprintf(cbfb08ee,96,c0531a16,cbfb0998,0) at netbsd:vsnprintf+0x42
> lock_printf(c0531a16,deadbeef,ffffbeef,b8,c) at netbsd:lock_printf+0x5f
> _simple_lock(ce707ef8,c0532945,b8,c05355dd,c059e534) at netbsd:_simple_lock+0x8c
> ltsleep(ce707ef8,14,c05355dd,0,ce707ef8) at netbsd:ltsleep+0x18f
> acquire(0,600,c02684ef,c02ce028,cacb3028) at netbsd:acquire+0xb0
> _lockmgr(ce707ef8,10002,ce707e68,c056b55c,8fd) at netbsd:_lockmgr+0x507
> ufs_lock(cbfb0ae0,0,afc,c04bcb60,ce707e68) at netbsd:ufs_lock+0x4f
> VOP_LOCK(ce707e68,10002,cbfb0b2c,c04bcb60,ce707e68) at netbsd:VOP_LOCK+0x25
> vn_lock(ce707e68,10002,50a,10012,10) at netbsd:vn_lock+0x9f
> vrele(ce707e68,10012,4ab,c032bf87,cc6acae8) at netbsd:vrele+0x130
> vget(ce707e68,10012,544,0,0) at netbsd:vget+0x90
> ffs_sync(c13e5000,3,cad40ee0,cad4c1c0,d) at netbsd:ffs_sync+0xde
> sync_fsync(cbfb0c08,10012,cbfb0c2c,c0336a7f,cc02b118) at netbsd:sync_fsync+0xaa
> VOP_FSYNC(cc02b118,cad40ee0,8,0,0) at netbsd:VOP_FSYNC+0x49
> sched_sync(cad4c1c0,0,c01002ac,fbff,c01002ac) at netbsd:sched_sync+0xf5
> 
> The kernel dump I got isn't much use because I got another panic while
> syncing.
> 
> Now trying again with your patch...

Nick had the same problem, and he managed to get a core dump.  The return
value of vn_lock() in vget() (which I was hunting for with the printf) was
EBUSY.  So it seems the problem is the following:

1) When we get to vget(), VXLOCK is not set in the vnode.  Otherwise
   vget() would have returned EBUSY right away.
2) By the time vget() calls vn_lock(), either the VXLOCK flag has been
   set (LK_NOWAIT + VXLOCK = EBUSY return value)
   OR
   someone is holding a shared lock on the vnode.
3) In VOP_LOCK() the vnode memory area is still alive.  Otherwise we
   could not make the call through the vnode operations vector to
   ufs_lock().
4) In ufs_lock() the vnode is already dead: note the deadbeef arguments
   on the lockmgr() path.
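
For anyone following along, here is a toy C model of steps 1, 2 and 4
(step 3 is not modelled).  It is only a sketch under loud assumptions: the
struct layouts, flag values, MODEL_* constants and model_* functions are
all hypothetical and merely echo NetBSD's VXLOCK, LK_NOWAIT and EBUSY; this
is not the kernel's vget()/vn_lock()/lockmgr() code.  The 0xdeadbeef fill
assumes the freed vnode was poisoned by a diagnostic free routine.

```c
/*
 * Hypothetical model of the failure analysis above.  Names echo NetBSD
 * (VXLOCK, LK_NOWAIT, EBUSY) but every struct, flag value and function
 * here is an illustrative assumption, not kernel code.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_VXLOCK 0x1u          /* assumed: vnode is being cleaned out */
#define MODEL_POISON 0xdeadbeefu   /* assumed fill pattern for freed memory */

struct model_lock {
    uint32_t sharecount;           /* number of shared holders */
    uint32_t exclusive;            /* 1 if held exclusively */
};

struct model_vnode {
    uint32_t flags;
    struct model_lock lock;
};

/* Step 1: vget() bails out early with EBUSY if VXLOCK is already set. */
static int model_vget_precheck(const struct model_vnode *vp)
{
    return (vp->flags & MODEL_VXLOCK) ? EBUSY : 0;
}

/*
 * Step 2: an exclusive lock request with LK_NOWAIT semantics.  Instead of
 * sleeping, it fails with EBUSY if VXLOCK is set or anyone holds the lock.
 */
static int model_vn_lock_nowait(struct model_vnode *vp)
{
    assert(vp != NULL);
    if (vp->flags & MODEL_VXLOCK)
        return EBUSY;
    if (vp->lock.sharecount > 0 || vp->lock.exclusive)
        return EBUSY;
    vp->lock.exclusive = 1;        /* lock acquired */
    return 0;
}

/* Mimic a diagnostic free(): overwrite the object with the poison word. */
static void model_poison(void *p, size_t len)
{
    const uint32_t pat = MODEL_POISON;
    unsigned char *c = p;

    for (size_t i = 0; i + sizeof pat <= len; i += sizeof pat)
        memcpy(c + i, &pat, sizeof pat);
}

/* Step 4: a lock whose fields read back as the poison word is freed memory. */
static int model_lock_is_poisoned(const struct model_lock *lk)
{
    return lk->sharecount == MODEL_POISON && lk->exclusive == MODEL_POISON;
}
```

Single-threaded code like this can only show each outcome in isolation.
The real puzzle is the window between the precheck and the lock attempt:
something has to set VXLOCK or take a shared lock inside it, and the vnode
then has to be freed between VOP_LOCK() and ufs_lock().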

I'm very, very puzzled about what could be causing this.  In a biglock
kernel it should not be possible at all, since I can't find anything
along the paths that would sleep.  And even without biglock, getting
two similar panics would require surgical timing.

-- 
Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
http://www.iki.fi/pooka/                          http://www.NetBSD.org/
    "the most indispensable quality of a cook is exactness"