Subject: Re: simple_lock: uninitialized lock
To: Antti Kantee <pooka@cs.hut.fi>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: current-users
Date: 07/17/2007 16:10:40
On Tue, Jul 17, 2007 at 04:45:09PM +0300, Antti Kantee wrote:
> Heh, you forgot the bit about the princess and the evil witch ;)

I have a shiny red apple here :-)

> On a tangent off our main storyline, do you have ddb.onpanic set to 0?

good idea...

> Nick had the same problem and he managed to get a coredump.

I managed to get a dump with reboot 0x104, but
...
#7  0xc0102dfd in calltrap ()
#8  0xc0492ff0 in strlen ()
Previous frame inner to this frame (corrupt stack?)

so can't see before the strlen... and didn't get the bt/l, but the
function calls were the same as before, and
db> show vnode/f ce7fe224
OBJECT 0xce7fe224: locked=239, pgops=0xdeadbeef, npages=-559038737, refs=-559038737
  PAGES <pg,offset>:
uvm_fault(0xc05c3e00, 0xdeadb000, 1) -> 0xe
kernel: supervisor trap page fault, code=0
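The numbers in this output can be decoded with simple arithmetic. The sketch below is illustrative only: the helper names (`decode_howto`, `as_int32`, `page_of`) are hypothetical, and the flag values RB_NOSYNC = 0x004 and RB_DUMP = 0x100 are assumed from NetBSD's `<sys/reboot.h>` (check the header before relying on them). Under that assumption `reboot 0x104` means "dump without syncing". The sketch also shows that npages=-559038737 is just the 0xdeadbeef poison pattern printed as a signed 32-bit int, and that the faulting address 0xdeadb000 is that pattern rounded down to a 4 KiB page (assuming i386 page size). This is consistent with ddb following a pointer read out of already-freed memory.

```python
import struct

# Assumed values from NetBSD's <sys/reboot.h>; verify against the real header.
RB_FLAGS = {
    0x001: "RB_ASKNAME",
    0x002: "RB_SINGLE",
    0x004: "RB_NOSYNC",
    0x008: "RB_HALT",
    0x100: "RB_DUMP",
}


def decode_howto(howto):
    """Return the names of the known flag bits set in a reboot 'howto' value."""
    return [name for bit, name in sorted(RB_FLAGS.items()) if howto & bit]


def as_int32(word):
    """Reinterpret a 32-bit pattern as a signed two's-complement integer."""
    return struct.unpack("<i", struct.pack("<I", word))[0]


def page_of(addr, page_size=4096):
    """Round an address down to the start of its page (4 KiB assumed)."""
    return addr & ~(page_size - 1)


print(decode_howto(0x104))       # ['RB_NOSYNC', 'RB_DUMP']
print(as_int32(0xdeadbeef))      # -559038737, the npages/refs value above
print(hex(page_of(0xdeadbeef)))  # 0xdeadb000, the uvm_fault address above
print(0xdeadbeef & 0xff)         # 239, matching locked=239 if that field is byte-sized (an assumption)
```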

>  The return
> value (which I was hunting for with the printf) was EBUSY.  So seems
> like the problem is the following:
> 
> 1) When we get to vget(), VXLOCK is not set in the vnode.  Otherwise
>    vget() would return EBUSY
> 2) When we call vn_lock() in vget(), the VXLOCK flag is set:
>    LK_NOWAIT + VXLOCK = EBUSY return value.
>  OR
>    someone is holding a shared lock on the vnode
> 3) in VOP_LOCK() the vnode memory area is still alive.  Otherwise we
>    could not make the call through the vnode operations vector to
>    ufs_lock()
> 4) in ufs_lock() the vnode is dead: deadbeef arguments to lockmgr.
> 
> I'm very very very puzzled at what could be causing this.  In a biglock
> kernel it should not be possible at all, since I can't find anything
> along the paths that would sleep.  And even without biglock, getting
> two similar panics would require surgical timing.
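The check-then-lock window in steps 1 and 2 can be illustrated with a toy model. Everything here (`ModelVnode`, `model_vget`, `model_vn_lock`, and the bit values) is hypothetical and far simpler than the real NetBSD vget()/vn_lock(). It only shows why a VXLOCK check that passes in vget() does not guarantee that a later LK_NOWAIT lock succeeds if something can run in between, and that a shared-lock holder gives the same EBUSY.

```python
import errno

# Hypothetical bit values for this toy model only; they are not NetBSD's.
VXLOCK = 0x1     # vnode is being cleaned out (e.g. reclaimed)
LK_NOWAIT = 0x2  # fail with EBUSY instead of sleeping


class ModelVnode:
    """Toy stand-in for a vnode: a flag word and a count of shared-lock holders."""

    def __init__(self):
        self.v_flag = 0
        self.shared_holders = 0


def model_vn_lock(vp, lkflags):
    """Step 2: with LK_NOWAIT, fail if VXLOCK is set or anyone holds a shared lock."""
    busy = bool(vp.v_flag & VXLOCK) or vp.shared_holders > 0
    if busy and (lkflags & LK_NOWAIT):
        return errno.EBUSY
    return 0  # without LK_NOWAIT the real code would wait; the model just grants the lock


def model_vget(vp, lkflags, interleave=None):
    """Step 1: refuse a vnode already marked VXLOCK, then try to lock it.

    `interleave`, if given, runs between the check and the lock attempt,
    standing in for whatever else might touch the vnode in that window.
    """
    if vp.v_flag & VXLOCK:
        return errno.EBUSY
    if interleave is not None:
        interleave(vp)
    return model_vn_lock(vp, lkflags)


def start_reclaim(vp):
    """Hypothetical interleaving: another thread begins cleaning the vnode."""
    vp.v_flag |= VXLOCK


quiet = ModelVnode()
print(model_vget(quiet, LK_NOWAIT) == 0)  # True: nothing in the window

racy = ModelVnode()
print(model_vget(racy, LK_NOWAIT, interleave=start_reclaim) == errno.EBUSY)  # True
```

In a biglock kernel where nothing on this path sleeps, the `interleave` window should be empty, which is why the observed EBUSY is so puzzling.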

From /var/log/messages:

Jul 17 13:16:59 tape2 ntpd[487]: kernel time sync status 2040
Jul 17 13:16:59 tape2 ntpd[487]: frequency initialized 16.814 PPM from /var/db/ntp.drift
Jul 17 13:16:59 tape2 ntpd[487]: running in unprivileged mode disables dynamic interface tracking
Jul 17 13:17:18 tape2 ntpd_initres[279]: parent died before we finished, exiting
Jul 17 13:17:37 tape2 /netbsd: vn_lock failed with 16
Jul 17 13:18:08 tape2 last message repeated 2 times
Jul 17 13:20:12 tape2 last message repeated 11 times
Jul 17 13:20:16 tape2 ntpd[487]: time reset +1.173950 s
Jul 17 13:20:16 tape2 ntpd[487]: kernel time sync status change 2001
Jul 17 15:49:20 tape2 syslogd: restart

So, lots of failed vn_lock calls (error 16 is EBUSY); maybe try again without running ntpd...
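syslogd folds identical consecutive messages into "last message repeated N times", so the log shows 1 + 2 + 11 = 14 vn_lock failures between 13:17:37 and 13:20:12. The sketch below counts them by expanding those summaries. The function name `count_messages` is hypothetical, and it assumes each summary line refers to the most recent non-summary line, which is how syslogd's repeat compression behaves.

```python
import errno
import re

REPEAT_RE = re.compile(r"last message repeated (\d+) times?$")

LOG = [
    "Jul 17 13:16:59 tape2 ntpd[487]: kernel time sync status 2040",
    "Jul 17 13:16:59 tape2 ntpd[487]: frequency initialized 16.814 PPM from /var/db/ntp.drift",
    "Jul 17 13:16:59 tape2 ntpd[487]: running in unprivileged mode disables dynamic interface tracking",
    "Jul 17 13:17:18 tape2 ntpd_initres[279]: parent died before we finished, exiting",
    "Jul 17 13:17:37 tape2 /netbsd: vn_lock failed with 16",
    "Jul 17 13:18:08 tape2 last message repeated 2 times",
    "Jul 17 13:20:12 tape2 last message repeated 11 times",
    "Jul 17 13:20:16 tape2 ntpd[487]: time reset +1.173950 s",
    "Jul 17 13:20:16 tape2 ntpd[487]: kernel time sync status change 2001",
    "Jul 17 15:49:20 tape2 syslogd: restart",
]


def count_messages(lines, needle):
    """Count lines containing `needle`, expanding 'last message repeated N times'."""
    total = 0
    last_matched = False
    for line in lines:
        m = REPEAT_RE.search(line)
        if m:
            if last_matched:
                total += int(m.group(1))
            continue  # a repeat summary does not change which message was "last"
        last_matched = needle in line
        if last_matched:
            total += 1
    return total


print(count_messages(LOG, "vn_lock failed"))  # 14
print(errno.errorcode[16])  # 'EBUSY' on NetBSD and Linux
```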

Cheers,

Patrick