tech-kern archive


Re: Working on ufs_rename patches for NetBSD-5



        Hello David.  I'm still working on porting your ufs_rename patches to
NetBSD-5.x, and I've made even more progress.  I can now run without
filesystem corruption of any kind, either without logging or with WAPBL
logging enabled.
Softdep still isn't working, but I have a feeling I'm close to resolving
that issue.
However, I'm intermittently running into the same issue you reported in
July.  Namely, when running your dirconc test with WAPBL logging enabled, I
pretty reliably get:
panic: lockdebug_barrier: holding 1 shared locks (curlwp = 0xcbc6fcc0)

Note that I've patched the lockdebug code to show me the relevant curlwp at
the time of the panic.
        In looking at my crash dump with gdb, I find I have a question.
This panic comes from subr_lockdebug.c, line 664 or thereabouts, in
the NetBSD-5.x sources.  It says, in part:


        if (l->l_shlocks != 0) {
                panic("lockdebug_barrier: holding %d shared locks (curlwp = 0x%x)",
                    l->l_shlocks, (unsigned int)l);
        }

        This is after it's checked for an actual lock structure, and just
before it declares success.  The arguments to the function in this case are
all 0, meaning there's no actual lock to be checked, or, at least, I don't
think there is.  In fact, the call I'm losing things on is from:
sys/sys/userret.h, line 104.
That line reads:

        LOCKDEBUG_BARRIER(NULL, 0);

        I notice in earlier parts of subr_lockdebug.c that l_shlocks gets set
at the same time as ld->ld_shares.
Does anyone know why, on this particular check, we test for a
shared lock in the struct lwp without matching it up with an actual lock?
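
To make the bookkeeping I'm describing concrete, here is a rough userland
model in C.  It is only a sketch of my understanding, not the NetBSD code:
model_lwp, model_lock, and the model_* functions are hypothetical names, and
the real lockdebug_barrier does more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct lwp and struct lockdebug. */
struct model_lwp {
        int shlocks;    /* shared locks held by this thread (like l_shlocks) */
};

struct model_lock {
        int shares;     /* shared holders of this lock (like ld_shares) */
};

/* Taking a shared hold bumps both counters together. */
static void
model_shared_acquire(struct model_lwp *l, struct model_lock *ld)
{
        l->shlocks++;
        ld->shares++;
}

/* Dropping a shared hold decrements both counters together. */
static void
model_shared_release(struct model_lwp *l, struct model_lock *ld)
{
        assert(l->shlocks > 0 && ld->shares > 0);
        l->shlocks--;
        ld->shares--;
}

/*
 * Barrier check: returns the number of shared locks the thread still
 * holds, so 0 means the barrier passes.  The lock argument may be NULL,
 * as in LOCKDEBUG_BARRIER(NULL, 0); this model does not consult it.
 */
static int
model_barrier(const struct model_lwp *l, const struct model_lock *lock)
{
        (void)lock;
        return l->shlocks;
}
```

In this model, if a thread reaches the barrier after model_shared_acquire()
without the matching model_shared_release(), model_barrier(&l, NULL) returns
1 even though no lock was passed in, because the count lives in the thread
rather than in any one lock.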

In the backtrace below, you'll notice the call to lockdebug_barrier is
passed 0's for arguments across the board, meaning all the code above the
lines that panic is rendered moot, unless I'm not understanding something
here in a big way, which is highly probable.  Also, everywhere else we
manipulate l_shlocks, we do it while holding splhigh.  We do this check,
and the subsequent panic, without holding splhigh.  Is it possible something's
changing under us while we're still checking things out?  I don't get the
panic every time, and sometimes it takes a while to hit after I start the
tests, but when it panics, the traces always look the same.
Should this check really not fire if there isn't a matching ld structure to
go with the lwp in question?
        Any light anyone could shed on these questions would be greatly
appreciated.
-thanks
-Brian


(gdb) target kvm netbsd.22.core
#0  0xc050f8e2 in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c:924
924     /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c: No such file or directory.
        in /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c
(gdb) kvm proc 0xcbc6fcc0
#0  0xc0448bdf in mi_switch (l=0xcbc6fcc0)
    at /usr/local/netbsd/src/sys/kern/kern_synch.c:765
765     /usr/local/netbsd/src/sys/kern/kern_synch.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/kern_synch.c
(gdb) bt
#0  0xc0448bdf in mi_switch (l=0xcbc6fcc0)
    at /usr/local/netbsd/src/sys/kern/kern_synch.c:765
#1  0xc044594b in sleepq_block (timo=0, catch=false)
    at /usr/local/netbsd/src/sys/kern/kern_sleepq.c:269
#2  0xc0424f5c in cv_wait (cv=0xc122fcb0, mtx=0xca4fca10)
    at /usr/local/netbsd/src/sys/kern/kern_condvar.c:201
#3  0xc0492f52 in biowait (bp=0xc122fc18)
    at /usr/local/netbsd/src/sys/kern/vfs_bio.c:1515
#4  0xc04a665c in wapbl_doio (data=0xc135be00, len=512, devvp=0xca4fca10, 
    pbn=1301769, flags=0) at /usr/local/netbsd/src/sys/kern/vfs_wapbl.c:745
#5  0xc04a773f in wapbl_circ_write (wl=0xc1294700, data=0xc135be00, len=512, 
    offp=0xcbc838c8) at /usr/local/netbsd/src/sys/kern/vfs_wapbl.c:800
#6  0xc04a7e59 in wapbl_flush (wl=0xc1294700, waitfor=0)
    at /usr/local/netbsd/src/sys/kern/vfs_wapbl.c:2000
#7  0xc03aea08 in ffs_sync (mp=0xcb38c644, waitfor=2, cred=0xcb926900)
    at /usr/local/netbsd/src/sys/ufs/ffs/ffs_vfsops.c:1823
#8  0xc049b44c in VFS_SYNC (mp=0xcb38c644, a=2, b=0xcb926900)
    at /usr/local/netbsd/src/sys/kern/vfs_subr.c:3064
#9  0xc04a2d9c in sys_sync (l=0xcbc6fcc0, v=0x0, retval=0x0)
    at /usr/local/netbsd/src/sys/kern/vfs_syscalls.c:825
#10 0xc049c12e in vfs_shutdown ()
    at /usr/local/netbsd/src/sys/kern/vfs_subr.c:2383
#11 0xc050f95b in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/local/netbsd/src/sys/arch/i386/i386/machdep.c:910
#12 0xc015f1e9 in db_sync_cmd (addr=-876070228, have_addr=false, count=-1, 
    modif=0xcbc83a18 "\003Ú­Àc")
    at /usr/local/netbsd/src/sys/ddb/db_command.c:1304
#13 0xc015f9a8 in db_command (last_cmdp=0xc0a4917c)
    at /usr/local/netbsd/src/sys/ddb/db_command.c:926
#14 0xc015fc22 in db_command_loop ()
    at /usr/local/netbsd/src/sys/ddb/db_command.c:583
#15 0xc0162b30 in db_trap (type=1, code=0)
    at /usr/local/netbsd/src/sys/ddb/db_trap.c:101
#16 0xc050a89b in kdb_trap (type=1, code=0, regs=0xcbc83c3c)
    at /usr/local/netbsd/src/sys/arch/i386/i386/db_interface.c:229
#17 0xc05125c3 in trap (frame=0xcbc83c3c)
    at /usr/local/netbsd/src/sys/arch/i386/i386/trap.c:351
#18 0xc010cb60 in calltrap ()
#19 0xc0508f7c in breakpoint ()
#20 0xc0463ac0 in panic (
    fmt=0xc09db7cc "lockdebug_barrier: holding %d shared locks (curlwp = 0x%x)")
    at /usr/local/netbsd/src/sys/kern/subr_prf.c:250
#21 0xc045dba5 in lockdebug_barrier (spinlock=0x0, slplocks=0)
    at /usr/local/netbsd/src/sys/kern/subr_lockdebug.c:664
#22 0xc05120c7 in syscall (frame=0xcbc83d48)
    at /usr/local/netbsd/src/sys/sys/userret.h:104
#23 0xc0100505 in syscall1 ()
(gdb) 
(gdb) 
(gdb) up
#1  0xc044594b in sleepq_block (timo=0, catch=false)
    at /usr/local/netbsd/src/sys/kern/kern_sleepq.c:269
269     /usr/local/netbsd/src/sys/kern/kern_sleepq.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/kern_sleepq.c

[ . . .]

(gdb) 
#20 0xc0463ac0 in panic (
    fmt=0xc09db7cc "lockdebug_barrier: holding %d shared locks (curlwp = 0x%x)")
    at /usr/local/netbsd/src/sys/kern/subr_prf.c:250
250     /usr/local/netbsd/src/sys/kern/subr_prf.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/subr_prf.c
(gdb) 
#21 0xc045dba5 in lockdebug_barrier (spinlock=0x0, slplocks=0)
    at /usr/local/netbsd/src/sys/kern/subr_lockdebug.c:664
664     /usr/local/netbsd/src/sys/kern/subr_lockdebug.c: No such file or directory.
        in /usr/local/netbsd/src/sys/kern/subr_lockdebug.c
(gdb) print l
$1 = (struct lwp *) 0xcbc6fcc0
(gdb) print l->l_shlocks
$2 = 1
(gdb) print ld
$3 = (volatile struct lockdebug *) 0x0
(gdb) quit


