NetBSD-Bugs archive
Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
Hi again,
the attached patch for /usr/src/sys/dev/dkwedge/dk.c seems to fix the
problem.
No panic occurs on reboot anymore.
But someone with detailed knowledge of the mutex order and the reference
management of the wedge code should have a look at it prior to
integration into the source tree.
best regards
W. Stukenbrock
Wolfgang Stukenbrock wrote:
The following reply was made to PR kern/46879; it has been noted by GNATS.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
Date: Fri, 31 Aug 2012 12:36:21 +0200
Hi again,
I've had an additional look at it ...
The problem seems to be in src/sys/dev/dkwedge/dk.c
I do not fully understand the semantics of the locks (dk_openlock and
dk_rawlock), but I have noticed the following:
in dkclose() the dk_openlock is entered on the wedge,
and if a close of the parent is needed, the dk_rawlock of the parent is
taken - no dk_openlock is acquired on the parent.
The same is done in dkwedge_del().
In both functions vn_close() is called for the dk_rawvp with the
dk_openlock held on the wedge.
Here the dkclose() on the mounted wedge will call raidclose() of the
underlying raid-device.
If doing_shutdown is set, the raid-device gets destroyed and
dkwedge_delall() is called for the raid-device.
dkwedge_delall() now copies the information of the first wedge - if any -
into a local buffer and calls dkwedge_del() in order to destroy it.
This will enter the dk_openlock on that wedge again, but we still
hold it from the earlier dkclose().
This would mean that the panic is not related to a layered raid-device,
as I expected before - it will happen for every raid-device with a wedge
on it.
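The re-entry described above can be sketched in userland C. This is only
an illustration under stated assumptions: a pthread error-checking mutex
stands in for the kernel mutex with LOCKDEBUG, and every demo_* name is
hypothetical and not part of dk.c.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a wedge; openlock plays the role of dk_openlock. */
struct demo_wedge {
	pthread_mutex_t openlock;
};

/* Create the lock as error-checking, so that relocking from the owning
 * thread is reported as EDEADLK instead of hanging. */
static int
demo_wedge_init(struct demo_wedge *w)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return error;
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&w->openlock, &attr);
	pthread_mutexattr_destroy(&attr);
	return error;
}

/* Plays the role of dkwedge_del(): it takes the wedge's open lock. */
static int
demo_del(struct demo_wedge *w)
{
	int error = pthread_mutex_lock(&w->openlock);

	if (error != 0)
		return error;	/* EDEADLK here is "locking against myself" */
	pthread_mutex_unlock(&w->openlock);
	return 0;
}

/* Plays the role of dkclose(): the open lock is held across the close of
 * the parent, which during shutdown comes back to demo_del() for the same
 * wedge on the same thread. */
static int
demo_close_holding_lock(struct demo_wedge *w)
{
	int error;

	pthread_mutex_lock(&w->openlock);
	error = demo_del(w);
	pthread_mutex_unlock(&w->openlock);
	return error;
}
```

In this sketch demo_close_holding_lock() reports EDEADLK, while a plain
demo_del() call with the lock not held succeeds.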
I've tested this - BSD-label on sd0 and sd1,
a raiddevice (stripe in this case) of sd0 and sd1,
a GPT-label with one wedge on this raiddevice.
If the filesystem on the wedge is mounted when a reboot occurs, I get
the following panic (and trace) in DDB:
syncing disks... done
unmounting file systems...Mutex error: mutex_vector_enter: locking
against myself
lock address : 0xffff80002f9ffb70
current cpu : 10
current lwp : 0xffff800087b7b000
owner field : 0xffff800087b7b000 wait/spin: 0/0
panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff804b1045 cs 8 rflags 246 cr2
7f7ffd620030 cpl 0 rsp ffff800088062530
Stopped in pid 594.1 (reboot) at netbsd:breakpoint+0x5: leave
db{10}> trace
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x24d
lockdebug_abort() at netbsd:lockdebug_abort+0x42
mutex_vector_enter() at netbsd:mutex_vector_enter+0x208
dkwedge_del() at netbsd:dkwedge_del+0x181
dkwedge_delall() at netbsd:dkwedge_delall+0x65
raidclose() at netbsd:raidclose+0x133
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
vn_close() at netbsd:vn_close+0x51
dkclose() at netbsd:dkclose+0xcb
bdev_close() at netbsd:bdev_close+0x89
spec_close() at netbsd:spec_close+0x231
VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
ffs_unmount() at netbsd:ffs_unmount+0x11e
VFS_UNMOUNT() at netbsd:VFS_UNMOUNT+0x2e
dounmount() at netbsd:dounmount+0xd5
vfs_unmountall() at netbsd:vfs_unmountall+0x55
cpu_reboot() at netbsd:cpu_reboot+0x100
sys_reboot() at netbsd:sys_reboot+0x5f
syscall() at netbsd:syscall+0xa0
db{10}>
Yes - my analysis above seems to be correct.
I would tend to raise the level from serious to critical, because a
wedge on a raidframe-device is not usable!
I'm not sure whether it would be the correct solution to release the
dk_openlock mutex prior to calling vn_close() in both functions mentioned
above. I'm also not sure whether it would be possible to set dk_rawvp to
NULL prior to calling vn_close() on it, with the vnode pointer kept in a
temporary variable.
If the answer to these questions is yes, then that would be the solution.
Could someone with more knowledge of the mutex order and the semantics of
dk_rawvp have a look at this topic?
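The idea raised above (detach dk_rawvp into a temporary while the locks
are held, then close it only after the locks are released) can be sketched
in userland C. This is a minimal illustration, not kernel code: pthread
error-checking mutexes stand in for the kernel mutexes, and all sketch_*
names are hypothetical. Whether the real lock order permits this is
exactly the open question.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct sketch_parent {
	pthread_mutex_t rawlock;	/* plays the role of dk_rawlock */
	int rawopens;			/* plays the role of dk_rawopens */
	void *rawvp;			/* plays the role of dk_rawvp */
};

struct sketch_wedge {
	pthread_mutex_t openlock;	/* plays the role of dk_openlock */
	struct sketch_parent *parent;
};

static int
sketch_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return error;
	/* Error-checking type: a self-relock returns EDEADLK. */
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return error;
}

/* Set up one wedge whose parent has exactly one raw open. */
static int
sketch_init(struct sketch_wedge *w, struct sketch_parent *p, void *vp)
{
	int error;

	p->rawopens = 1;
	p->rawvp = vp;
	w->parent = p;
	error = sketch_mutex_init(&p->rawlock);
	if (error == 0)
		error = sketch_mutex_init(&w->openlock);
	return error;
}

/* Stand-in for vn_close(): models the shutdown path re-entering the
 * wedge code and taking the wedge's open lock again. */
static int
sketch_vn_close(struct sketch_wedge *w, void *vp)
{
	int error;

	(void)vp;
	error = pthread_mutex_lock(&w->openlock);
	if (error != 0)
		return error;
	pthread_mutex_unlock(&w->openlock);
	return 0;
}

/* Close path with the close deferred until both locks are dropped. */
static int
sketch_close_deferred(struct sketch_wedge *w)
{
	void *tmp_vp = NULL;

	pthread_mutex_lock(&w->openlock);
	pthread_mutex_lock(&w->parent->rawlock);
	if (w->parent->rawopens-- == 1) {
		tmp_vp = w->parent->rawvp;	/* keep the pointer ... */
		w->parent->rawvp = NULL;	/* ... and detach it */
	}
	pthread_mutex_unlock(&w->parent->rawlock);
	pthread_mutex_unlock(&w->openlock);

	if (tmp_vp != NULL)
		return sketch_vn_close(w, tmp_vp);	/* no locks held here */
	return 0;
}
```

Because the re-entering close runs only after openlock has been released,
the nested lock attempt in this sketch succeeds instead of failing with
EDEADLK.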
Thanks in advance
gnats-admin%NetBSD.org@localhost wrote:
> Thank you very much for your problem report.
> It has the internal identification `kern/46879'.
> The individual assigned to look at your
> report is: kern-bug-people.
>
>
>>Category: kern
>>Responsible: kern-bug-people
>>Synopsis: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
>>Arrival-Date: Thu Aug 30 16:10:00 +0000 2012
>>
>
>
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock%nagler-company.com@localhost
http://www.nagler-company.com
Hauptsitz: Schnaittenbach
Handelregister: Amberg HRB
Gerichtsstand: Amberg
Steuernummer: 201/118/51825
USt.-ID-Nummer: DE 273143997
Geschäftsführer: Dr. Martin Nagler, Dr. Dr. Karl-Kuno Kunze
--- dk.c 2012/08/31 10:57:31 1.1
+++ dk.c 2012/08/31 11:02:46
@@ -432,6 +432,7 @@
dkwedge_del(struct dkwedge_info *dkw)
{
struct dkwedge_softc *sc = NULL;
+ struct vnode *tmp_vp = NULL;
u_int unit;
int bmaj, cmaj, s;
@@ -480,15 +481,15 @@
mutex_enter(&sc->sc_parent->dk_rawlock);
if (sc->sc_parent->dk_rawopens-- == 1) {
KASSERT(sc->sc_parent->dk_rawvp != NULL);
- mutex_exit(&sc->sc_parent->dk_rawlock);
- (void) vn_close(sc->sc_parent->dk_rawvp, FREAD | FWRITE,
- NOCRED);
+ tmp_vp = sc->sc_parent->dk_rawvp;
sc->sc_parent->dk_rawvp = NULL;
- } else
- mutex_exit(&sc->sc_parent->dk_rawlock);
+ }
+ mutex_exit(&sc->sc_parent->dk_rawlock);
sc->sc_dk.dk_openmask = 0;
}
mutex_exit(&sc->sc_dk.dk_openlock);
+ if (tmp_vp != NULL)
+ (void) vn_close(tmp_vp, FREAD | FWRITE, NOCRED);
/* Announce our departure. */
aprint_normal("%s at %s (%s) deleted\n", device_xname(sc->sc_dev),
@@ -964,7 +965,7 @@
dkclose(dev_t dev, int flags, int fmt, struct lwp *l)
{
struct dkwedge_softc *sc = dkwedge_lookup(dev);
- int error = 0;
+ struct vnode *tmp_vp = NULL;
KASSERT(sc->sc_dk.dk_openmask != 0);
@@ -981,17 +982,17 @@
mutex_enter(&sc->sc_parent->dk_rawlock);
if (sc->sc_parent->dk_rawopens-- == 1) {
KASSERT(sc->sc_parent->dk_rawvp != NULL);
- mutex_exit(&sc->sc_parent->dk_rawlock);
- error = vn_close(sc->sc_parent->dk_rawvp,
- FREAD | FWRITE, NOCRED);
+ tmp_vp = sc->sc_parent->dk_rawvp;
sc->sc_parent->dk_rawvp = NULL;
- } else
- mutex_exit(&sc->sc_parent->dk_rawlock);
+ }
+ mutex_exit(&sc->sc_parent->dk_rawlock);
}
mutex_exit(&sc->sc_dk.dk_openlock);
- return (error);
+ if (tmp_vp != NULL)
+ return vn_close(tmp_vp, FREAD | FWRITE, NOCRED);
+ return 0;
}
/*