NetBSD-Bugs archive


kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected



>Number:         46879
>Category:       kern
>Synopsis:       panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 30 16:10:00 +0000 2012
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 5.1.STABLE
>Organization:
Dr. Nagler & Company GmbH
>Environment:
        
        
System: NetBSD test-s0 4.0 NetBSD 4.0 (NSW-WS) #0: Tue Aug 17 17:28:09 CEST 2010 wgstuken@test-s0:/usr/src/sys/arch/amd64/compile/NSW-WS amd64
Architecture: x86_64
Machine: amd64
>Description:
        Affected system: amd64 (Xeon 5xxx CPU; the system detects 12 CPUs)

        The problem does not occur if the filesystem is unmounted by hand.
        So far it happens only if the filesystem is still mounted when the
        system reboots.
        The panic is reproducible.

        Filesystem setup (in every case a single partition covers the entire disk):
        (remark: dk0, dk1, dk2 and dk3 are other wedges on other disks)
        sd0 - BSD-label - sd0e (type RAID)
        sd1 - BSD-label - sd1e (type RAID)
        sd2 - BSD-label - sd2e (type RAID)
        sd3 - BSD-label - sd3e (type RAID)
        raid4 stripe (0) of sd0e and sd1e - gpt-label - dk4 (type raidframe)
        raid5 stripe (0) of sd2e and sd3e - gpt-label - dk5 (type raidframe)
        raid3 stripe (0) of dk4 and dk5 - gpt-label - dk6 (type ffs)

        If /dev/dk6 is still mounted when the system reboots, the panic below
        occurs.

        Remark: the panic does not occur with layered raidframe devices built
        with BSD-labels only. That configuration has run stably on our server
        since 4.0, currently with 5.1-release (plus the patch from PR39784 to
        support autoconfiguration).
        Remark: this is a test setup for experiments with gpt-labels, raidframe
        and large disks, so please disregard the not very useful raid0 on raid0
        layout. The main goal is to set up a stripe of mirrors for data
        security with raidframe.
        I am currently reintegrating my raidframe autoconfiguration code
        (see PR39784, still not integrated; see also PR45179 for the general
        topic) into 5_1_STABLE. It looks good now: the setup seems to work, and
        I will update PR39784 in the next few days.
        While doing that I stumbled over the problem reported here.


        Running the "reboot" command as root produces the following output:

System shutdown time has arrived

About to run shutdown hooks...
Stopping cron.
Stopping inetd.
Waiting for PIDS: 451.
Removing block-type swap devices
swapctl: removing /dev/raid0b as swap device
Thu Aug 30 17:28:35 CEST 2012

Done running shutdown hooks.
Aug 30 17:28:40 wst-test syslogd: Exiting on signal 15
syncing disks... done
unmounting file systems...Mutex error: mutex_vector_enter: locking against myself

lock address : 0xffff80002fd8a570
current cpu  :                  6
current lwp  : 0xffff8000881c0000
owner field  : 0xffff8000881c0000 wait/spin:                0/0

panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff804b1045 cs 8 rflags 246 cr2  7f7ffd985290 cpl 0 rsp ffff8000882ec280
Stopped in pid 563.1 (reboot) at        netbsd:breakpoint+0x5:  leave
db{6}>

        DDB shows the following traceback:
db{6}> trace
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x24d
lockdebug_abort() at netbsd:lockdebug_abort+0x42
mutex_vector_enter() at netbsd:mutex_vector_enter+0x208
dkwedge_del() at netbsd:dkwedge_del+0x181
dkwedge_delall() at netbsd:dkwedge_delall+0x65
raidclose() at netbsd:raidclose+0x133
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
vn_close() at netbsd:vn_close+0x51
dkclose() at netbsd:dkclose+0xcb
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
rf_close_component() at netbsd:rf_close_component+0x59
rf_UnconfigureVnodes() at netbsd:rf_UnconfigureVnodes+0x42
rf_Shutdown() at netbsd:rf_Shutdown+0x174
raidclose() at netbsd:raidclose+0xf7
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
vn_close() at netbsd:vn_close+0x51
dkclose() at netbsd:dkclose+0xcb
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
ffs_unmount() at netbsd:ffs_unmount+0x11e
VFS_UNMOUNT() at netbsd:VFS_UNMOUNT+0x2e
dounmount() at netbsd:dounmount+0xd5
vfs_unmountall() at netbsd:vfs_unmountall+0x55
cpu_reboot() at netbsd:cpu_reboot+0x100
sys_reboot() at netbsd:sys_reboot+0x5f
syscall() at netbsd:syscall+0xa0
db{6}> 

>How-To-Repeat:
        Set up a layered raidframe device with wedges as described, mount it,
        and reboot the system.
>Fix:
        Not known at the moment, sorry.
        It looks like a problem related to the recursive calls of dkclose(),
        raidclose() ...
        If anybody can give me some assistance with DDB debugging, I can
        deliver more information.
        Remark: the root filesystem is on a raid (1) device, and the system
        fails to dump to it ...

        I think it should be easy to reproduce this setup on any system with a
        free disk available for testing. I believe it will also happen if all
        "sd" partitions are on the same disk.
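
        To illustrate the suspected failure mode (a recursive close path that
        re-enters a non-recursive mutex from the same LWP), here is a minimal
        userland model in C. It is only a sketch: toy_mutex, toy_wedge_del()
        and toy_wedge_close() are hypothetical names, not kernel code, and the
        assumption that an outer close still holds the lock which the inner
        wedge deletion wants is a guess that the trace alone does not confirm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive mutex that records its owner, loosely modelled on the
 * "owner field" printed in the panic message. This is NOT the kernel's
 * kmutex_t; it is a single-threaded illustration only. */
struct toy_mutex {
    const void *owner;              /* NULL when unlocked */
};

/* Returns false (instead of panicking) when the caller already owns the
 * lock, i.e. the "locking against myself" case. */
static bool toy_mutex_enter(struct toy_mutex *m, const void *self)
{
    if (m->owner == self)
        return false;
    m->owner = self;                /* no contention handling in this model */
    return true;
}

static void toy_mutex_exit(struct toy_mutex *m)
{
    m->owner = NULL;
}

/* Hypothetical stand-in for the inner step: deleting a wedge takes the
 * parent's lock for the duration of the operation. */
static bool toy_wedge_del(struct toy_mutex *parent_lock, const void *self)
{
    if (!toy_mutex_enter(parent_lock, self))
        return false;               /* self-deadlock detected */
    toy_mutex_exit(parent_lock);
    return true;
}

/* Hypothetical stand-in for the outer step: closing a wedge holds the
 * parent's lock and, on last close, triggers wedge deletion on the same
 * parent while the lock is still held (assumed, see text). */
static bool toy_wedge_close(struct toy_mutex *parent_lock, const void *self,
                            bool last_close_deletes_wedges)
{
    bool ok = true;

    if (!toy_mutex_enter(parent_lock, self))
        return false;
    if (last_close_deletes_wedges)
        ok = toy_wedge_del(parent_lock, self);
    toy_mutex_exit(parent_lock);
    return ok;
}
```

        In this model, toy_wedge_close(&m, &lwp, false) succeeds, while
        toy_wedge_close(&m, &lwp, true) fails because the inner call finds the
        lock already owned by the caller. Calling toy_wedge_del() directly,
        with no outer holder, succeeds, which would be consistent with the
        observation that unmounting by hand avoids the panic.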

>Unformatted:
        
        

