NetBSD-Bugs archive
Re: kern/54541: kernel panic using "zfs diff"
The following reply was made to PR kern/54541; it has been noted by GNATS.
From: Brad Spencer <brad%anduin.eldar.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
joernc%posteo.de@localhost
Subject: Re: kern/54541: kernel panic using "zfs diff"
Date: Fri, 11 Oct 2019 19:47:33 -0400
Patrick Welche <prlw1%cam.ac.uk@localhost> writes:
> The following reply was made to PR kern/54541; it has been noted by GNATS.
>
> From: Patrick Welche <prlw1%cam.ac.uk@localhost>
> To: gnats-bugs%netbsd.org@localhost
> Cc:
> Subject: Re: kern/54541: kernel panic using "zfs diff"
> Date: Fri, 11 Oct 2019 17:01:34 +0100
>
> On Wed, Oct 09, 2019 at 05:05:01PM +0000, Christos Zoulas wrote:
> > Something seems to not understand that this is a hijacked fd which it seems
> > to be: 133 (128 + 5)...
>
> /dev/zfs is hijacked:
> export RUMPHIJACK=blanket=/dev/zfs:/dk:/storage,sysctl=yes,modctl=yes
>
> rumpns_fd_getfile receives the 133 rather than 5 so complains.
> So I seem to be seeing a rump issue rather than a zfs issue?
>
I would say that there is a pretty good chance that rump does not quite
handle ZFS correctly. I won't speculate as to why.
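To make the fd numbering above concrete, here is a minimal sketch in C of
the kind of translation a hijacking layer has to perform. It assumes a
fixed offset of 128, which is inferred only from the "133 (128 + 5)"
remark in this thread. The names HIJACK_FDOFF, fd_is_hijacked,
fd_host_to_rump and fd_rump_to_host are hypothetical and are not taken
from the rump sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: offset of 128, inferred from "133 (128 + 5)" in this thread. */
#define HIJACK_FDOFF 128

/* Hypothetical helper: does this host-visible fd belong to the rump kernel? */
static bool
fd_is_hijacked(int fd)
{
	return fd >= HIJACK_FDOFF;
}

/* Strip the offset before handing the fd to the rump kernel. */
static int
fd_host_to_rump(int fd)
{
	assert(fd_is_hijacked(fd));
	return fd - HIJACK_FDOFF;
}

/* Add the offset before returning a rump kernel fd to the application. */
static int
fd_rump_to_host(int fd)
{
	assert(fd >= 0);
	return fd + HIJACK_FDOFF;
}
```

Under these assumptions, fd_host_to_rump(133) yields 5. If some path
passes an fd along without this conversion, the receiving side would see
133 where it expects 5, which would match the symptom described above.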
Back to the original kernel dump... This is pretty simple for me to
reproduce as well. Here is a snippet of what I get:
[ 472375.2036056] panic: kernel diagnostic assertion "fdm != NULL" failed: file "/usr/src/sys/kern/vfs_trans.c", line 166 mount 0x0 invalid
.
.
.
[ 472375.2036056] vn_rdwr() at netbsd:vn_rdwr+0x136
[ 472375.2036056] write_record.part.1() at zfs:write_record.part.1+0x54
[ 472375.2036056] diff_cb() at zfs:diff_cb+0x236
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x1b6
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
[ 472375.2036056] traverse_dnode() at zfs:traverse_dnode+0xda
[ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x8ab
[ 472375.2036056] traverse_impl() at zfs:traverse_impl+0x16c
[ 472375.2036056] traverse_dataset_resume() at zfs:traverse_dataset_resume+0x44
[ 472375.2036056] dmu_diff() at zfs:dmu_diff+0x14c
The write_record call is in
src/external/cddl/osnet/dist/uts/common/fs/zfs/dmu_diff.c and it is
pretty small. It might be interesting to know what the arguments to the
single vn_rdwr call are.
I won't have time right now to find this out for myself, however....
Recursion is involved in all of this; that is what the repeated
traverse_visitbp frames in the panic messages are about, and I wonder
if there is a missing or mishandled termination condition. The panic
itself, in my case, is tripped by a DIAGNOSTIC assert check in a VOP
function. It is a little confusing, but diff_cb is a callback (of some
sort) that appears to be set up by a call to traverse_dataset, which
shows up in the panic as traverse_dataset_resume (I think).
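For readers unfamiliar with the pattern, the following is a generic
sketch in C of a recursive traversal that calls a callback on each node
and stops as soon as the callback returns an error. It is not the ZFS
code. The struct node, traverse, count_cb and count_state names are made
up for illustration, and the real traverse_visitbp is considerably more
involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical tree node, standing in for a block pointer hierarchy. */
struct node {
	int id;
	struct node *child[MAX_CHILDREN];
};

/* Callback type: return 0 to continue, nonzero to abort the traversal. */
typedef int (*visit_cb_t)(const struct node *, void *);

static int
traverse(const struct node *n, visit_cb_t cb, void *arg)
{
	int error;

	assert(cb != NULL);
	if (n == NULL)
		return 0;		/* termination condition */
	if ((error = cb(n, arg)) != 0)
		return error;		/* callback asked to stop */
	for (size_t i = 0; i < MAX_CHILDREN; i++) {
		if ((error = traverse(n->child[i], cb, arg)) != 0)
			return error;	/* propagate the error upwards */
	}
	return 0;
}

/* Example callback state: count visits, fail at a chosen node id. */
struct count_state {
	int visited;
	int stop_at;
};

static int
count_cb(const struct node *n, void *arg)
{
	struct count_state *st = arg;

	st->visited++;
	return (n->id == st->stop_at) ? -1 : 0;
}
```

In this sketch, an error from the callback unwinds every level of the
recursion at once. A callback that fails in a less controlled way, for
example by tripping an assertion as in the panic above, never gets the
chance to unwind like this.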
I can only run this on a DOMU, so no kernel dumps, but I suspect that if
one could get a clean kernel dump somewhere else, what is going on would
become clear.
--
Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org