NetBSD-Bugs archive


Re: kern/54541: kernel panic using "zfs diff"



The following reply was made to PR kern/54541; it has been noted by GNATS.

From: Brad Spencer <brad%anduin.eldar.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
        joernc%posteo.de@localhost
Subject: Re: kern/54541: kernel panic using "zfs diff"
Date: Fri, 11 Oct 2019 19:47:33 -0400

 Patrick Welche <prlw1%cam.ac.uk@localhost> writes:
 
 > The following reply was made to PR kern/54541; it has been noted by GNATS.
 >
 > From: Patrick Welche <prlw1%cam.ac.uk@localhost>
 > To: gnats-bugs%netbsd.org@localhost
 > Cc: 
 > Subject: Re: kern/54541: kernel panic using "zfs diff"
 > Date: Fri, 11 Oct 2019 17:01:34 +0100
 >
 >  On Wed, Oct 09, 2019 at 05:05:01PM +0000, Christos Zoulas wrote:
 >  >  Something seems to not understand that this is a hijacked fd which it seems
 >  >  to be: 133 (128 + 5)...
 >  
 >  /dev/zfs is hijacked:
 >  export RUMPHIJACK=blanket=/dev/zfs:/dk:/storage,sysctl=yes,modctl=yes
 >  
 >  rumpns_fd_getfile receives the 133 rather than 5 so complains.
 >  So I seem to be seeing a rump issue rather than a zfs issue?
 >  
 
 I would say that there is a pretty good chance that rump does not quite
 handle ZFS correctly.  I won't speculate as to why.
 
 Back to the original kernel dump... This is pretty simple for me to
 reproduce as well; here is a snippet of what I get:
 
 [ 472375.2036056] panic: kernel diagnostic assertion "fdm != NULL" failed: file "/usr/src/sys/kern/vfs_trans.c", line 166 mount 0x0 invalid
 .
 .
 .
 [ 472375.2036056] vn_rdwr() at netbsd:vn_rdwr+0x136
 [ 472375.2036056] write_record.part.1() at zfs:write_record.part.1+0x54
 [ 472375.2036056] diff_cb() at zfs:diff_cb+0x236
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x1b6
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x52b
 [ 472375.2036056] traverse_dnode() at zfs:traverse_dnode+0xda
 [ 472375.2036056] traverse_visitbp() at zfs:traverse_visitbp+0x8ab
 [ 472375.2036056] traverse_impl() at zfs:traverse_impl+0x16c
 [ 472375.2036056] traverse_dataset_resume() at zfs:traverse_dataset_resume+0x44
 [ 472375.2036056] dmu_diff() at zfs:dmu_diff+0x14c
 
 The write_record call is in
 src/external/cddl/osnet/dist/uts/common/fs/zfs/dmu_diff.c and it is
 pretty small.  It might be interesting to know what the arguments to the
 single vn_rdwr call are.
 
 I won't have time right now to find this out for myself, however....
 
 Recursion is involved in all of this; that is what the traverse_visitbp
 frames mentioned in the panic messages are about, and I wonder if there
 is a missing or mishandled terminator condition.  The panic itself, in
 my case, is tripped by a DIAGNOSTIC assert check in a VOP function.  It
 is a little confusing, but diff_cb is a callback (of some sort) that
 appears to be set up by a call to traverse_dataset, which shows up in
 the panic backtrace as traverse_dataset_resume (I think).
 
 I can only run this on a Xen DomU, so no kernel dumps, but I suspect
 that if one could get a clean kernel dump somewhere else, it would
 become clear what is going on.
 
 
 
 
 
 -- 
 Brad Spencer - brad%anduin.eldar.org@localhost - KC8VKS - http://anduin.eldar.org
 

