NetBSD-Bugs archive
Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
The following reply was made to PR kern/46879; it has been noted by GNATS.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock
detected
Date: Fri, 31 Aug 2012 12:36:21 +0200
Hi again,
I've had an additional look at it ...
The problem seems to be in src/sys/dev/dkwedge/dk.c.
I do not understand the whole semantics of the locks (dk_openlock and
dk_rawlock), but I've noticed the following:
in dkclose() the dk_openlock of the wedge is entered,
and if a close of the parent is needed, the dk_rawlock of the parent is
taken - no dk_openlock on the parent is acquired.
The same is done in dkwedge_del().
In both functions vn_close() is called on dk_rawvp with the
dk_openlock of the wedge held.
Here the dkclose() on the mounted wedge will call raidclose() of the
underlying raid-device.
If doing_shutdown is set, the raid-device gets destroyed and
dkwedge_delall() is called for the raid-device.
dkwedge_delall() now copies the information of the first wedge - if any -
into a local buffer and calls dkwedge_del() in order to destroy it.
This will enter the dk_openlock on that wedge again, but we still
hold it from the earlier dkclose().
This would mean that the panic is not related to a layered raid-device,
as I expected before - it will happen for every raid-device with a wedge
on it.
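To make the call chain above easier to follow, here is a minimal
user-space sketch of it in C. It is only a model under my own
assumptions: struct model_mutex, struct model_wedge and all model_*
functions are hypothetical names that do not exist in the kernel, the
lock is a toy that records its owner, and a failed re-entry returns
false where the real kernel panics with "locking against myself".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy owner-tracking lock; a stand-in for a kernel mutex, not the real one. */
struct model_mutex {
    const void *owner;              /* NULL when free, else the holder */
};

/* Returns false on "locking against myself" instead of panicking. */
static bool model_mutex_enter(struct model_mutex *m, const void *self)
{
    if (m->owner == self)
        return false;
    m->owner = self;                /* single-threaded model: never waits */
    return true;
}

static void model_mutex_exit(struct model_mutex *m)
{
    m->owner = NULL;
}

struct model_wedge {
    struct model_mutex openlock;    /* models dk_openlock of the wedge */
};

/* Models dkwedge_del(): needs the wedge's open lock. */
static bool model_wedge_del(struct model_wedge *w, const void *self)
{
    if (!model_mutex_enter(&w->openlock, self))
        return false;               /* the kernel panics at this point */
    model_mutex_exit(&w->openlock);
    return true;
}

/* Models raidclose() with doing_shutdown set, going through dkwedge_delall(). */
static bool model_raid_close(struct model_wedge *w, const void *self)
{
    return model_wedge_del(w, self);
}

/* Models dkclose(): closes the parent with the open lock still held. */
static bool model_wedge_close(struct model_wedge *w, const void *self)
{
    bool ok;

    if (!model_mutex_enter(&w->openlock, self))
        return false;
    ok = model_raid_close(w, self); /* stands in for vn_close() on dk_rawvp */
    model_mutex_exit(&w->openlock);
    return ok;
}
```

In this model, calling model_wedge_del() on an idle wedge succeeds,
while model_wedge_close() fails in the nested model_wedge_del() because
the same caller already owns openlock.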
I've tested this - BSD-label on sd0 and sd1,
a raiddevice (stripe in this case) of sd0 and sd1,
a GPT-label with one wedge on this raiddevice.
If the filesystem on the wedge is mounted when a reboot occurs I get
the following panic (and trace) in DDB:
syncing disks... done
unmounting file systems...Mutex error: mutex_vector_enter: locking against myself
lock address : 0xffff80002f9ffb70
current cpu : 10
current lwp : 0xffff800087b7b000
owner field : 0xffff800087b7b000 wait/spin: 0/0
panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff804b1045 cs 8 rflags 246 cr2 7f7ffd620030 cpl 0 rsp ffff800088062530
Stopped in pid 594.1 (reboot) at netbsd:breakpoint+0x5: leave
db{10}> trace
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x24d
lockdebug_abort() at netbsd:lockdebug_abort+0x42
mutex_vector_enter() at netbsd:mutex_vector_enter+0x208
dkwedge_del() at netbsd:dkwedge_del+0x181
dkwedge_delall() at netbsd:dkwedge_delall+0x65
raidclose() at netbsd:raidclose+0x133
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
vn_close() at netbsd:vn_close+0x51
dkclose() at netbsd:dkclose+0xcb
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
ffs_unmount() at netbsd:ffs_unmount+0x11e
VFS_UNMOUNT() at netbsd:VFS_UNMOUNT+0x2e
dounmount() at netbsd:dounmount+0xd5
vfs_unmountall() at netbsd:vfs_unmountall+0x55
cpu_reboot() at netbsd:cpu_reboot+0x100
sys_reboot() at netbsd:sys_reboot+0x5f
syscall() at netbsd:syscall+0xa0
db{10}>
Yes - my analysis above seems to be correct.
I would tend to increase the level from serious to critical, because a
wedge on a raidframe-device is not usable!
I'm not sure if it would be the correct solution to release the
dk_openlock mutex prior to calling vn_close() in both functions mentioned
above. I'm also not sure if it would be possible to set dk_rawvp to NULL
prior to calling vn_close() on it - it could be stored in a temp-variable.
If the answer to these questions is yes, then that would be the solution.
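As a rough illustration of that idea, here is a user-space sketch in C
of the "stash the pointer, clear it, drop the lock, then close"
ordering. It is not a patch and not tested against the kernel: all
toy_* names are hypothetical, the lock is a single-threaded toy, the
parent's dk_rawlock is not modelled at all, and whether this ordering is
safe with the real locking rules is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded toy lock; entering it twice fails instead of panicking. */
struct toy_lock {
    bool held;
};

static bool toy_enter(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_exit(struct toy_lock *l)
{
    l->held = false;
}

struct toy_parent {
    void *rawvp;                    /* models dk_rawvp of the parent */
};

struct toy_wedge {
    struct toy_lock openlock;       /* models dk_openlock of the wedge */
    struct toy_parent *parent;
    bool deleted;
};

/* Models dkwedge_del(): needs the wedge's open lock. */
static bool toy_wedge_del(struct toy_wedge *w)
{
    if (!toy_enter(&w->openlock))
        return false;
    w->deleted = true;
    toy_exit(&w->openlock);
    return true;
}

/* Models vn_close() ending up in raidclose() and dkwedge_delall(). */
static bool toy_vn_close(void *vp, struct toy_wedge *w)
{
    (void)vp;                       /* the vnode itself is not modelled */
    return toy_wedge_del(w);
}

/* Proposed ordering: take the pointer, clear it, unlock, then close. */
static bool toy_wedge_close_fixed(struct toy_wedge *w)
{
    void *vp;

    if (!toy_enter(&w->openlock))
        return false;
    vp = w->parent->rawvp;          /* the temp-variable */
    w->parent->rawvp = NULL;
    toy_exit(&w->openlock);         /* released before the close */

    if (vp == NULL)
        return true;                /* nothing left to close */
    return toy_vn_close(vp, w);
}
```

With this ordering the nested toy_wedge_del() can take openlock,
because the closing path no longer holds it when toy_vn_close() runs.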
Can someone with more knowledge of the mutex order and the semantics of
dk_rawvp have a look at this topic?
Thanks in advance
gnats-admin%NetBSD.org@localhost wrote:
> Thank you very much for your problem report.
> It has the internal identification `kern/46879'.
> The individual assigned to look at your
> report is: kern-bug-people.
>
>
>>Category: kern
>>Responsible: kern-bug-people
>>Synopsis: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
>>Arrival-Date: Thu Aug 30 16:10:00 +0000 2012
>>
>
>