NetBSD-Bugs archive


Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected



The following reply was made to PR kern/46879; it has been noted by GNATS.

From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
Date: Fri, 31 Aug 2012 12:36:21 +0200

 Hi again,
 
 
 I've had another look at it ...
 
 The problem seems to be in src/sys/dev/dkwedge/dk.c.
 
 I do not fully understand the semantics of the locks (dk_openlock and 
 dk_rawlock), but I've noticed the following:
 
 In dkclose(), dk_openlock is taken on the wedge, and if the parent also 
 has to be closed, the parent's dk_rawlock is acquired as well; the 
 parent's dk_openlock is not taken.
 The same is done in dkwedge_del().
 In both functions, vn_close() on dk_rawvp is called while the wedge's 
 dk_openlock is still held.
 
 Here, dkclose() on the mounted wedge calls raidclose() on the 
 underlying RAID device.
 If doing_shutdown is set, the RAID device is destroyed and 
 dkwedge_delall() is called for it.
 dkwedge_delall() copies the information of the first wedge (if any) 
 into a local buffer and calls dkwedge_del() to destroy it.
 dkwedge_del() then tries to take dk_openlock on that wedge again, but we 
 still hold it from the earlier dkclose().
 
 This would mean that the panic is not related to a layered RAID device, 
 as I expected before; it will happen for every RAID device with a wedge 
 on it.
 
 I've tested this with a BSD label on sd0 and sd1,
 a RAID device (a stripe in this case) built from sd0 and sd1,
 and a GPT label with one wedge on this RAID device.
 If the file system on the wedge is mounted when a reboot occurs, I get 
 the following panic (and trace) in DDB:
 
 syncing disks... done
 unmounting file systems...Mutex error: mutex_vector_enter: locking against myself
 
 lock address : 0xffff80002f9ffb70
 current cpu  :                 10
 current lwp  : 0xffff800087b7b000
 owner field  : 0xffff800087b7b000 wait/spin:                0/0
 
 panic: lock error
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip ffffffff804b1045 cs 8 rflags 246 cr2 7f7ffd620030 cpl 0 rsp ffff800088062530
 Stopped in pid 594.1 (reboot) at        netbsd:breakpoint+0x5:  leave
 db{10}> trace
 breakpoint() at netbsd:breakpoint+0x5
 panic() at netbsd:panic+0x24d
 lockdebug_abort() at netbsd:lockdebug_abort+0x42
 mutex_vector_enter() at netbsd:mutex_vector_enter+0x208
 dkwedge_del() at netbsd:dkwedge_del+0x181
 dkwedge_delall() at netbsd:dkwedge_delall+0x65
 raidclose() at netbsd:raidclose+0x133
 bdev_close() at netbsd:bdev_close+0x89
 spec_close() at netbsd:spec_close+0x231
 VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
 vn_close() at netbsd:vn_close+0x51
 dkclose() at netbsd:dkclose+0xcb
 bdev_close() at netbsd:bdev_close+0x89
 spec_close() at netbsd:spec_close+0x231
 VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
 ffs_unmount() at netbsd:ffs_unmount+0x11e
 VFS_UNMOUNT() at netbsd:VFS_UNMOUNT+0x2e
 dounmount() at netbsd:dounmount+0xd5
 vfs_unmountall() at netbsd:vfs_unmountall+0x55
 cpu_reboot() at netbsd:cpu_reboot+0x100
 sys_reboot() at netbsd:sys_reboot+0x5f
 syscall() at netbsd:syscall+0xa0
 db{10}>
 
 Yes, my analysis above seems to be correct.
 I would tend to raise the severity from serious to critical, because a 
 wedge on a RAIDframe device is not usable!
 
 I'm not sure whether the correct solution would be to release the 
 dk_openlock mutex prior to calling vn_close() in both functions mentioned 
 above. I'm also not sure whether it would be possible to set dk_rawvp to 
 NULL prior to calling vn_close() on it, keeping the old value in a 
 temporary variable.
 
 If the answer to these questions is yes, then that would be the solution.
 Could someone with more knowledge of the mutex ordering and the semantics 
 of dk_rawvp have a look at this?
 Thanks in advance
 
 gnats-admin%NetBSD.org@localhost wrote:
 
 > Thank you very much for your problem report.
 > It has the internal identification `kern/46879'.
 > The individual assigned to look at your
 > report is: kern-bug-people. 
 > 
 > 
 >>Category:       kern
 >>Responsible:    kern-bug-people
 >>Synopsis:       panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
 >>Arrival-Date:   Thu Aug 30 16:10:00 +0000 2012
 >>
 > 
 > 
 
 
 

