NetBSD-Bugs archive


kern/37881: repeatable crash in filesystem when running MP



>Number:         37881
>Category:       kern
>Synopsis:       repeatable crash in filesystem when running MP
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 27 12:40:00 +0000 2008
>Originator:     dieter roelants
>Release:        NetBSD 4.99.50
>Organization:
>Environment:
System: NetBSD simult.amelgem.be 4.99.50 NetBSD 4.99.50 (SIMULT) #38: Sun Jan 27 10:21:43 CET 2008 dieter%simult.amelgem.be@localhost:/build/obj.i386.current/sys/arch/i386/compile/SIMULT i386
Architecture: i386
Machine: i386
>Description:
        Since upgrading my system from 4.99.35 to 4.99.49 and then
        4.99.50, I can reproducibly crash it by copying some files
        with nautilus from a Windows share to my local disk. The
        kernel drops into ddb after a uvm_fault with a backtrace like
        this:

uvm_fault(0xcd8b1004, 0, 1) -> 0xe
knote(cefe5be4,6,0,14c90000,0) at netbsd:knote+0x27
ffs_write(cdccbc04,0,c0483fa0,cefe5b20,10002) at netbsd:ffs_write+0x53d
VOP_WRITE(cefe5b20,cdccbc7c,10,cdb31540,c0579a84) at netbsd:VOP_WRITE+0x80
vn_write(cea71414,cea71440,cdccbc7c,cdb31540,1) at netbsd:vn_write+0xcc
dofilewrite(16,cea71414,ba226000,10000,cea71440) at netbsd:dofilewrite+0x75
sys_write(cef6eba0,cdccbd00,cdccbd28,ba235000,c037cab5) at netbsd:sys_write+0x9c
syscall(cdccbd48,b3,ba2600ab,ba40001f,ba5f001f) at netbsd:syscall+0xb5

        This does not happen if I disable one of the 2 cores with
        cpuctl.

        I never got a core file (ddb was stuck at "syncing...")
        until this morning when I ran a kernel with options LOCKDEBUG.
        The backtrace from the core file follows. Note that LOCKDEBUG
        didn't catch the crash itself; it only panicked the system
        when I typed sync in ddb. The second sync started the core
        dumping.

#0  0xc038e622 in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:952
#1  0xc01a38ff in db_sync_cmd (addr=-1068090395, have_addr=false, 
    count=-1068090400, modif=0xcc838c70 "?\214\203??;V??;V?c")
    at /usr/src/sys/ddb/db_command.c:1364
#2  0xc01a3f08 in db_command (last_cmdp=0xc0551edc)
    at /usr/src/sys/ddb/db_command.c:927
#3  0xc01a424f in db_command_loop () at /usr/src/sys/ddb/db_command.c:572
#4  0xc01a7160 in db_trap (type=1, code=0) at /usr/src/sys/ddb/db_trap.c:101
#5  0xc0389710 in kdb_trap (type=1, code=0, regs=0xcc838e9c)
    at /usr/src/sys/arch/i386/i386/db_interface.c:232
#6  0xc0392688 in trap (frame=0xcc838e9c)
    at /usr/src/sys/arch/i386/i386/trap.c:346
#7  0xc010cd68 in calltrap ()
#8  0xc0387dec in breakpoint ()
#9  0xc02fbc75 in panic (fmt=0xc051027b "LOCKDEBUG")
    at /usr/src/sys/kern/subr_prf.c:227
#10 0xc02f51bb in lockdebug_abort1 (ld=0xc0591fe0, lk=0xc059ba40, 
    func=<value optimized out>, msg=0xc050db9f "spinout", dopanic=true)
    at /usr/src/sys/kern/subr_lockdebug.c:758
#11 0xc02f6034 in lockdebug_abort (lock=0xc058eec0, ops=0xc05564a0, 
    func=0xc04822e7 "_kernel_lock", msg=0xc050db9f "spinout")
    at /usr/src/sys/kern/subr_lockdebug.c:802
#12 0xc02d11a5 in _kernel_lock (nlocks=1, l=0xcef6eba0)
    at /usr/src/sys/kern/kern_lock.c:743
#13 0xc037a43c in intr_biglock_wrapper (vp=0xc2e0d9c0)
    at /usr/src/sys/arch/x86/x86/intr.c:640
#14 0xc01087ad in Xintr_ioapic_level10 ()
#15 0xc0100cff in Xspllower ()
#16 0xc032f9af in vfs_shutdown () at /usr/src/sys/kern/vfs_subr.c:1733
#17 0xc038e6ae in cpu_reboot (howto=256, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:938
#18 0xc01a38ff in db_sync_cmd (addr=-1068090395, have_addr=false, 
    count=-1068090400, modif=0xcdccb868 "\230????;V??;V?c")
    at /usr/src/sys/ddb/db_command.c:1364
#19 0xc01a3f08 in db_command (last_cmdp=0xc0551edc)
    at /usr/src/sys/ddb/db_command.c:927
#20 0xc01a424f in db_command_loop () at /usr/src/sys/ddb/db_command.c:572
#21 0xc01a7160 in db_trap (type=6, code=0) at /usr/src/sys/ddb/db_trap.c:101
#22 0xc0389710 in kdb_trap (type=6, code=0, regs=0xcdccba94)
    at /usr/src/sys/arch/i386/i386/db_interface.c:232
#23 0xc0392688 in trap (frame=0xcdccba94)
    at /usr/src/sys/arch/i386/i386/trap.c:346
#24 0xc010cd68 in calltrap ()
#25 0xc02c6727 in knote (list=0xcefe5be4, hint=6)
    at /usr/src/sys/kern/kern_event.c:1301
#26 0xc025b9ed in ffs_write (v=0xcdccbc04)
    at /usr/src/sys/ufs/ufs/ufs_readwrite.c:507
#27 0xc033cc50 in VOP_WRITE (vp=0xcefe5b20, uio=0xcdccbc7c, ioflag=16, 
    cred=0xcdb31540) at /usr/src/sys/kern/vnode_if.c:517
#28 0xc03395fc in vn_write (fp=0xcea71414, offset=0xcea71440, uio=0xcdccbc7c, 
    cred=0xcdb31540, flags=1) at /usr/src/sys/kern/vfs_vnops.c:478
#29 0xc0302cc5 in dofilewrite (fd=22, fp=0xcea71414, buf=0xba226000, 
    nbyte=65536, offset=0xcea71440, flags=1, retval=0xcdccbd28)
    at /usr/src/sys/kern/sys_generic.c:392
#30 0xc0302e5c in sys_write (l=0xcef6eba0, uap=0xcdccbd00, retval=0xcdccbd28)
    at /usr/src/sys/kern/sys_generic.c:357
#31 0xc0392045 in syscall (frame=0xcdccbd48)
    at /usr/src/sys/arch/i386/i386/syscall.c:114
#32 0xc010053d in syscall1 ()

        Let me know what more information you need. I can also put
        the core and netbsd.gdb somewhere online if necessary.
>How-To-Repeat:
        Copy some (large?) files around with nautilus, possibly
        from an SMB share, on a system with more than one CPU active?
>Fix:
        I don't know.



