NetBSD-Bugs archive
Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
The following reply was made to PR kern/46879; it has been noted by GNATS.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock%nagler-company.com@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
netbsd-bugs%NetBSD.org@localhost
Subject: Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock
detected
Date: Fri, 31 Aug 2012 15:04:16 +0200
This is a multi-part message in MIME format.
--------------060607080304090807080703
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Hi again,
the attached patch for /usr/src/sys/dev/dkwedge/dk.c seems to fix the
problem.
No panic occurs anymore during reboot.
But someone with detailed knowledge of the mutex order and the reference
management of the wedge code should have a look at it prior to
integration into the source tree.
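
To illustrate the idea behind the patch outside the kernel, here is a minimal
userland sketch using POSIX error-checking mutexes. It is not the dk.c code:
fake_wedge, fake_vnode, fake_vn_close() and both close functions are
hypothetical stand-ins, and the re-lock inside fake_vn_close() only imitates
the path vn_close() -> raidclose() -> dkwedge_delall() -> dkwedge_del().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real kernel structures. */
struct fake_vnode { int closed; };

struct fake_wedge {
	pthread_mutex_t openlock;	/* plays the role of dk_openlock */
	pthread_mutex_t rawlock;	/* plays the role of the parent's dk_rawlock */
	struct fake_vnode *rawvp;	/* plays the role of dk_rawvp */
	int rawopens;			/* plays the role of dk_rawopens */
	int relock_result;		/* what the nested lock attempt returned */
};

static void
wedge_init(struct fake_wedge *w, struct fake_vnode *vp)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* Error-checking mutexes report a self-relock instead of hanging. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&w->openlock, &attr);
	pthread_mutex_init(&w->rawlock, &attr);
	pthread_mutexattr_destroy(&attr);
	w->rawvp = vp;
	w->rawopens = 1;
	w->relock_result = -1;
}

/* Stand-in for vn_close(): the close path takes openlock again. */
static int
fake_vn_close(struct fake_wedge *w, struct fake_vnode *vp)
{
	w->relock_result = pthread_mutex_lock(&w->openlock);
	if (w->relock_result == 0)
		pthread_mutex_unlock(&w->openlock);
	vp->closed = 1;
	return 0;
}

/* Old shape: the close runs while openlock is still held. */
static int
wedge_close_held(struct fake_wedge *w)
{
	int error = 0;

	pthread_mutex_lock(&w->openlock);
	pthread_mutex_lock(&w->rawlock);
	if (w->rawopens-- == 1) {
		pthread_mutex_unlock(&w->rawlock);
		error = fake_vn_close(w, w->rawvp);	/* nested lock fails here */
		w->rawvp = NULL;
	} else
		pthread_mutex_unlock(&w->rawlock);
	pthread_mutex_unlock(&w->openlock);
	return error;
}

/* Patched shape: detach the pointer under the locks, close afterwards. */
static int
wedge_close_deferred(struct fake_wedge *w)
{
	struct fake_vnode *tmp_vp = NULL;

	pthread_mutex_lock(&w->openlock);
	pthread_mutex_lock(&w->rawlock);
	if (w->rawopens-- == 1) {
		tmp_vp = w->rawvp;
		w->rawvp = NULL;
	}
	pthread_mutex_unlock(&w->rawlock);
	pthread_mutex_unlock(&w->openlock);
	if (tmp_vp != NULL)
		return fake_vn_close(w, tmp_vp);	/* no lock held any more */
	return 0;
}
```

With wedge_close_held() the nested pthread_mutex_lock() returns EDEADLK, which
corresponds to the "locking against myself" panic; with wedge_close_deferred()
the nested lock succeeds. Whether dropping dk_openlock before vn_close() is
safe with respect to the wedge reference handling is exactly the open question.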
best regards
W. Stukenbrock
Wolfgang Stukenbrock wrote:
> The following reply was made to PR kern/46879; it has been noted by GNATS.
>
> From: Wolfgang Stukenbrock
> <wolfgang.stukenbrock%nagler-company.com@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc:
> Subject: Re: kern/46879: panic in reboot 5_1_STABLE in dkwedge - dead-lock
> detected
> Date: Fri, 31 Aug 2012 12:36:21 +0200
>
> Hi again,
>
>
> I've had an additional look at it ...
>
> The problem seems to be in src/sys/dev/dkwedge/dk.c
>
> I do not understand the whole semantics of the locks (dk_openlock and
> dk_rawlock), but I've noticed the following:
>
> In dkclose() the dk_openlock is entered on the wedge,
> and if a close of the parent is needed, the dk_rawlock is taken
> for the parent - no dk_openlock on the parent is acquired.
> The same is done in dkwedge_del().
> In both functions vn_close() for the dk_rawvp is called with the
> dk_openlock held on the wedge.
>
> Here the dkclose() on the mounted wedge will call raidclose() of the
> underlying raid-device.
> If doing_shutdown is set, the raid-device gets destroyed and
> dkwedge_delall() is called for the raid-device.
> dkwedge_delall() now copies the information of the first wedge - if any -
> into a local buffer and calls dkwedge_del() in order to destroy it.
> This will enter the dk_openlock on that wedge again, but we still
> hold it from dkclose() before.
>
> This would mean that the panic is not related to a layered raid-device,
> as I expected before - it will happen for every raid-device with a wedge
> on it.
>
> I've tested this - BSD-label on sd0 and sd1,
> a raid-device (stripe in this case) of sd0 and sd1,
> a GPT-label with one wedge on this raid-device.
> If the filesystem on the wedge is mounted when a reboot occurs, I get
> the following panic (and trace) in DDB:
>
> syncing disks... done
> unmounting file systems...Mutex error: mutex_vector_enter: locking
> against myself
>
> lock address : 0xffff80002f9ffb70
> current cpu : 10
> current lwp : 0xffff800087b7b000
> owner field : 0xffff800087b7b000 wait/spin: 0/0
>
> panic: lock error
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip ffffffff804b1045 cs 8 rflags 246 cr2
> 7f7ffd620030 cpl 0 rsp ffff800088062530
> Stopped in pid 594.1 (reboot) at netbsd:breakpoint+0x5: leave
> db{10}> trace
> breakpoint() at netbsd:breakpoint+0x5
> panic() at netbsd:panic+0x24d
> lockdebug_abort() at netbsd:lockdebug_abort+0x42
> mutex_vector_enter() at netbsd:mutex_vector_enter+0x208
> dkwedge_del() at netbsd:dkwedge_del+0x181
> dkwedge_delall() at netbsd:dkwedge_delall+0x65
> raidclose() at netbsd:raidclose+0x133
> bdev_close() at netbsd:bdev_close+0x89
> spec_close() at netbsd:spec_close+0x231
> VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
> vn_close() at netbsd:vn_close+0x51
> dkclose() at netbsd:dkclose+0xcb
> bdev_close() at netbsd:bdev_close+0x89
> spec_close() at netbsd:spec_close+0x231
> VOP_CLOSE() at netbsd:VOP_CLOSE+0x62
> ffs_unmount() at netbsd:ffs_unmount+0x11e
> VFS_UNMOUNT() at netbsd:VFS_UNMOUNT+0x2e
> dounmount() at netbsd:dounmount+0xd5
> vfs_unmountall() at netbsd:vfs_unmountall+0x55
> cpu_reboot() at netbsd:cpu_reboot+0x100
> sys_reboot() at netbsd:sys_reboot+0x5f
> syscall() at netbsd:syscall+0xa0
> db{10}>
>
> Yes - my analysis above seems to be correct.
> I would tend to raise the severity from serious to critical, because a
> wedge on a raidframe-device is not usable!
>
> I'm not sure if it would be the correct solution to release the
> dk_openlock mutex prior to calling vn_close() in both functions mentioned
> above. I'm also not sure if it would be possible to set dk_rawvp to NULL
> prior to calling vn_close() on it - it may be stored in a temp-variable.
>
> If the answer to these questions is yes, then that would be the solution.
> Can someone with more knowledge of the mutex order and the semantics of
> dk_rawvp have a look at this topic?
> Thanks in advance
>
> gnats-admin%NetBSD.org@localhost wrote:
>
> > Thank you very much for your problem report.
> > It has the internal identification `kern/46879'.
> > The individual assigned to look at your
> > report is: kern-bug-people.
> >
> >
> >>Category: kern
> >>Responsible: kern-bug-people
> >>Synopsis: panic in reboot 5_1_STABLE in dkwedge - dead-lock detected
> >>Arrival-Date: Thu Aug 30 16:10:00 +0000 2012
> >>
> >
> >
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock%nagler-company.com@localhost
http://www.nagler-company.com
Hauptsitz: Schnaittenbach
Handelregister: Amberg HRB
Gerichtsstand: Amberg
Steuernummer: 201/118/51825
USt.-ID-Nummer: DE 273143997
Geschäftsführer: Dr. Martin Nagler, Dr. Dr. Karl-Kuno Kunze
--------------060607080304090807080703
Content-Type: text/plain;
name="dk.c-patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="dk.c-patch"
--- dk.c 2012/08/31 10:57:31 1.1
+++ dk.c 2012/08/31 11:02:46
@@ -432,6 +432,7 @@
dkwedge_del(struct dkwedge_info *dkw)
{
struct dkwedge_softc *sc = NULL;
+ struct vnode *tmp_vp = NULL;
u_int unit;
int bmaj, cmaj, s;
@@ -480,15 +481,15 @@
mutex_enter(&sc->sc_parent->dk_rawlock);
if (sc->sc_parent->dk_rawopens-- == 1) {
KASSERT(sc->sc_parent->dk_rawvp != NULL);
- mutex_exit(&sc->sc_parent->dk_rawlock);
- (void) vn_close(sc->sc_parent->dk_rawvp, FREAD | FWRITE,
- NOCRED);
+ tmp_vp = sc->sc_parent->dk_rawvp;
sc->sc_parent->dk_rawvp = NULL;
- } else
- mutex_exit(&sc->sc_parent->dk_rawlock);
+ }
+ mutex_exit(&sc->sc_parent->dk_rawlock);
sc->sc_dk.dk_openmask = 0;
}
mutex_exit(&sc->sc_dk.dk_openlock);
+ if (tmp_vp != NULL)
+ (void) vn_close(tmp_vp, FREAD | FWRITE, NOCRED);
/* Announce our departure. */
aprint_normal("%s at %s (%s) deleted\n", device_xname(sc->sc_dev),
@@ -964,7 +965,7 @@
dkclose(dev_t dev, int flags, int fmt, struct lwp *l)
{
struct dkwedge_softc *sc = dkwedge_lookup(dev);
- int error = 0;
+ struct vnode *tmp_vp = NULL;
KASSERT(sc->sc_dk.dk_openmask != 0);
@@ -981,17 +982,17 @@
mutex_enter(&sc->sc_parent->dk_rawlock);
if (sc->sc_parent->dk_rawopens-- == 1) {
KASSERT(sc->sc_parent->dk_rawvp != NULL);
- mutex_exit(&sc->sc_parent->dk_rawlock);
- error = vn_close(sc->sc_parent->dk_rawvp,
- FREAD | FWRITE, NOCRED);
+ tmp_vp = sc->sc_parent->dk_rawvp;
sc->sc_parent->dk_rawvp = NULL;
- } else
- mutex_exit(&sc->sc_parent->dk_rawlock);
+ }
+ mutex_exit(&sc->sc_parent->dk_rawlock);
}
mutex_exit(&sc->sc_dk.dk_openlock);
- return (error);
+ if (tmp_vp != NULL)
+ return vn_close(tmp_vp, FREAD | FWRITE, NOCRED);
+ return 0;
}
/*
--------------060607080304090807080703--