Subject: Re: simple_lock: uninitialized lock
To: Antti Kantee <pooka@cs.hut.fi>
From: Patrick Welche <prlw1@newn.cam.ac.uk>
List: current-users
Date: 07/17/2007 16:10:40
On Tue, Jul 17, 2007 at 04:45:09PM +0300, Antti Kantee wrote:
> Heh, you forgot the bit about the princess and the evil witch ;)
I have a shiny red apple here :-)
> On a tangent off our main storyline, do you have ddb.onpanic set to 0?
good idea...
> Nick had the same problem and he managed to get a coredump.
I managed with reboot 0x104, but
...
#7 0xc0102dfd in calltrap ()
#8 0xc0492ff0 in strlen ()
Previous frame inner to this frame (corrupt stack?)
so can't see before the strlen... and didn't get the bt/l, but the
function calls where the same as before, and
db> show vnode/f ce7fe224
OBJECT 0xce7fe224: locked=239, pgops=0xdeadbeef, npages=-559038737, refs=-559038
737
PAGES <pg,offset>:
uvm_fault(0xc05c3e00, 0xdeadb000, 1) -> 0xe
kernel: supervisor trap page fault, code=0
> The return
> value (which I was hunting for with the printf) was EBUSY. So seems
> like the problem is the following:
>
> 1) When we get to vget(), VXLOCK is not set in the vnode. Otherwise
> vget() would return EBUSY
> 2) When we call vn_lock() in vget(), the VXLOCK flag is set:
> LK_NOWAIT + VXLOCK = EBUSY return value.
> OR
> someone is holding a shared lock on the vnode
> 3) in VOP_LOCK() the vnode memory area is still alive. Otherwise we
> could not make the call through the vnode operations vector to
> ufs_lock()
> 4) in ufs_lock() the vnode is dead: deadbeef arguments to lockmgr.
>
> I'm very very very puzzled at what could be causing this. In a biglock
> kernel it should not be possible at all, since I can't find anything
> along the paths that would sleep. And even without biglock, getting
> two similar panics would require surgical timing.
From /var/log/messages:
Jul 17 13:16:59 tape2 ntpd[487]: kernel time sync status 2040
Jul 17 13:16:59 tape2 ntpd[487]: frequency initialized 16.814 PPM from /var/db/ntp.drift
Jul 17 13:16:59 tape2 ntpd[487]: running in unprivileged mode disables dynamic interface tracking
Jul 17 13:17:18 tape2 ntpd_initres[279]: parent died before we finished, exiting
Jul 17 13:17:37 tape2 /netbsd: vn_lock failed with 16
Jul 17 13:18:08 tape2 last message repeated 2 times
Jul 17 13:20:12 tape2 last message repeated 11 times
Jul 17 13:20:16 tape2 ntpd[487]: time reset +1.173950 s
Jul 17 13:20:16 tape2 ntpd[487]: kernel time sync status change 2001
Jul 17 15:49:20 tape2 syslogd: restart
So, lots of failed vn_lock - maybe try again without running ntpd...
Cheers,
Patrick