Subject: Re: kernel stack traceback
To: Bill Studenmund <wrstuden@nas.nasa.gov>
From: Paul H. Anderson <pha@pdq.com>
List: tech-kern
Date: 03/19/1999 14:25:46
On Thu, 18 Mar 1999, Bill Studenmund wrote:

> On Thu, 18 Mar 1999, Paul H. Anderson wrote:
> 
> > This is in relation to my earlier message.  A process is probably failing
> > to free a vnode lock under certain circumstances.
> 
> The second panic (#2) is because the vnode was left locked after the first
> panic (#11). ;-)
> 
> > (gdb) where
> > #0  0xfffffc0000457638 in dumpsys ()
> >     at ../../../../arch/alpha/alpha/machdep.c:1509
> > #1  0xfffffc0000457280 in cpu_reboot (howto=260, bootstr=0x0)
> >     at ../../../../arch/alpha/alpha/machdep.c:1328
> > #2  0xfffffc00003374dc in panic (
> >     fmt=0xfffffc000046ee2c "lockmgr: locking against myself")
> >     at ../../../../kern/subr_prf.c:212
> > #3  0xfffffc0000326b54 in lockmgr (lkp=0xfffffc0022cd6ef8, flags=0, 
> >     interlkp=0x0) at ../../../../kern/kern_lock.c:320
> > #4  0xfffffc00003f116c in ufs_lock (v=0x0)
> >     at ../../../../ufs/ufs/ufs_vnops.c:1588
> > #5  0xfffffc0000364484 in vn_lock (vp=0xfffffc001a09d990, flags=65554)
> >     at ../../../../sys/vnode_if.h:724
> > #6  0xfffffc000035b790 in vget (vp=0xfffffc001a09d990, flags=0)
> >     at ../../../../kern/vfs_subr.c:870
> > #7  0xfffffc00003e80ac in ffs_sync (mp=0xfffffe00002b6e00, waitfor=2, 
> >     cred=0xfffffe0000261b00, p=0xfffffc00004e5d30)
> >     at ../../../../ufs/ffs/ffs_vfsops.c:812
> > #8  0xfffffc000035e9d0 in sys_sync (p=0xfffffc00004e5d30, v=0x0,
> > retval=0x0)
> >     at ../../../../kern/vfs_syscalls.c:528
> > #9  0xfffffc000035d7a0 in vfs_shutdown () at
> > #10 0xfffffc0000457228 in cpu_reboot (howto=256, bootstr=0x0)
> >     at ../../../../arch/alpha/alpha/machdep.c:1311
> > #11 0xfffffc00003374dc in panic (
> >     fmt=0xfffffc000046ee2c "lockmgr: locking against myself")
> >     at ../../../../kern/subr_prf.c:212
> > #12 0xfffffc0000326b54 in lockmgr (lkp=0xfffffc0022cd6ef8, flags=0, 
> >     interlkp=0x0) at ../../../../kern/kern_lock.c:320
> > #13 0xfffffc00003f116c in ufs_lock (v=0x0)
> >     at ../../../../ufs/ufs/ufs_vnops.c:1588
> > #14 0xfffffc0000364484 in vn_lock (vp=0xfffffc001a09d990, flags=65538)
> >     at ../../../../sys/vnode_if.h:724
> > #15 0xfffffc000035b790 in vget (vp=0xfffffc001a09d990, flags=0)
> >     at ../../../../kern/vfs_subr.c:870
> > #16 0xfffffc00003e9c94 in ufs_ihashget (dev=1794, inum=11270)
> >     at ../../../../ufs/ufs/ufs_ihash.c:111
> > #17 0xfffffc00003e8260 in ffs_vget (mp=0xfffffe00002b6e00, ino=11270, 
> >     vpp=0xfffffe00375f3bd0) at ../../../../ufs/ffs/ffs_vfsops.c:870
> > #18 0xfffffc00003e0e10 in ffs_valloc (v=0x0)
> >     at ../../../../ufs/ffs/ffs_alloc.c:581
> > #19 0xfffffc00003f1c3c in ufs_makeinode (mode=33204,
> > dvp=0xfffffc001a09d990, 
> >     vpp=0xfffffe00375f3d80, cnp=0xfffffe00375f3da8)
> >     at ../../../../sys/vnode_if.h:902
> > #20 0xfffffc00003ee1a0 in ufs_create (v=0x0)
> >     at ../../../../ufs/ufs/ufs_vnops.c:113
> > #21 0xfffffc000036336c in vn_open (ndp=0xfffffe00375f3d60, fmode=1550, 
> >     cmode=436) at ../../../../sys/vnode_if.h:96
> 
> So the fmode is O_RDWR | O_NONBLOCK | O_APPEND | O_CREAT | O_TRUNC.
>
> Why do you have O_APPEND & O_TRUNC set? If you have O_TRUNC set, wouldn't
> all writes naturally append?

My test script, wedge.sh, uses 8 simultaneous copies of tar extracting
sources.tar.gz into different directories.  I threw in a mmap test as
well.  I have no idea why tar would open a file that way.

> 
> > #22 0xfffffc000035f3f0 in sys_open (p=0xfffffc0022e18028, 
> >     v=0xfffffe00375f3e88, retval=0xfffffe00375f3ed8)
> >     at ../../../../kern/vfs_syscalls.c:858
> > #23 0xfffffc000046128c in syscall (code=5, framep=0xfffffe00375f3ef8)
> >     at ../../../../arch/alpha/alpha/trap.c:636
> > #24 0xfffffc0000300540 in XentSys ()
> >     at ../../../../arch/alpha/alpha/locore.s:495
> > warning: Hit heuristic-fence-post without finding
> > warning: enclosing function for address 0x12004532c
> > 
> > There is a chance the line numbers are incorrect, as I got this panic from
> > a stripped kernel, turned around and built a non-stripped kernel from the
> > same config file, and ran gdb on the resulting netbsd.gdb.  Again, this is
> > 19990122 sources with no changes.
> > 
> > I'll start looking at source, but also see if I can reproduce this crash.
> 
> Let me know if you can reproduce this crash. I'm not sure if there should
> be any problem about the stripped kernel as typically we run the stripped
> ones.

I can reproduce the crash fairly easily.  This morning, I did newfs on
my ccd filesystem (-b 32768 -f 8192) and ran a stress test that caused a
panic after about an hour.  The locks below come from this second panic
(not the one that gave the above traceback).  The second panic looks like
the first (same message), but the machine is still sitting at the ddb
prompt on the serial console, and the alpha doesn't do kernel stack
tracebacks yet, so I can't confirm that it is exactly the same - tell me
what you want to get, and I'll do it.

> I'm confused about the arguments to the ufs_create call. v is a pointer.
> Is v == 0x0 really valid?

Dunno - I'm not much of a ufs guy...

> Does ddb (the in-kernel debugger) work on alpha? If so, call
> printlockedvnodes() before trying to reboot. I think it should tell you
> where/how the node got locked.

Here are the locks:

charger# sh wedge.sh
... some minor output deleted
successful open, length=0, mapdata=0
panic: lockmgr: locking against myself
Stopped in tar at       Debugger+0x4:   ret     zero,(ra)
db> call printlockedvnodes
Locked vnodes
type VREG, usecount 1, writecount 1, refcount 0,
        tag VT_UFS, ino 486422, on dev 7, 2 lock type inode: EXCL (count 1) by pid 424
type VREG, usecount 1, writecount 0, refcount 1,
        tag VT_UFS, ino 479816, on dev 7, 2 lock type inode: EXCL (count 1) by pid 427
type VREG, usecount 1, writecount 0, refcount 1,
        tag VT_UFS, ino 478287, on dev 7, 2 lock type inode: EXCL (count 1) by pid 433
type VREG, usecount 1, writecount 1, refcount 0,
        tag VT_UFS, ino 488458, on dev 7, 2 lock type inode: EXCL (count 1) by pid 415
type VREG, usecount 1, writecount 0, refcount 1,
        tag VT_UFS, ino 483375, on dev 7, 2 lock type inode: EXCL (count 1) by pid 436
type VDIR, usecount 1, writecount 0, refcount 1,
        tag VT_UFS, ino 487437, on dev 7, 2 lock type inode: EXCL (count 1) by pid 430
type VDIR, usecount 1, writecount 0, refcount 1,
        tag VT_UFS, ino 486933, on dev 7, 2 lock type inode: EXCL (count 1) by pid 421
type VDIR, usecount 2, writecount 0, refcount 1,
        tag VT_UFS, ino 483851, on dev 7, 2 lock type inode: EXCL (count 1) by pid 418
type VREG, usecount 2, writecount 0, refcount 2584,
        tag VT_UFS, ino 1543, on dev 7, 2 lock type inode: EXCL (count 1) by pid 438
0
db>

It looks like I can panic the machine in about an hour or so, so we should
be able to get lots of information about this panic over time.

Paul


+------------------------------------------------------+
| Paul Anderson           Public Data Queries, Inc.    |
| pha@pdq.com             734-213-4964(W) 994-3734(H)  |
+------------------------------------------------------+