NetBSD-Bugs archive


kern/54724: ZFS/Zvol corrupts kernel memory when running with xen on dom0



>Number:         54724
>Category:       kern
>Synopsis:       ZFS/Zvol corrupts kernel memory when running with xen on dom0
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 27 23:50:00 +0000 2019
>Originator:     Brian Buhrow
>Release:        NetBSD 9.0_BETA
>Organization:
	NFB of California
>Environment:
	
	
System: NetBSD via.net 9.0_BETA NetBSD 9.0_BETA (VIANET_DOM0) #7: Fri Nov 22 13:36:22 PST 2019  buhrow%viadev64.via.net@localhost:/usr/local/netbsd/obj-64/sys/arch/amd64/compile/VIANET_DOM0 amd64
Architecture: amd64
Machine: amd64
>Description:
	
	When zfs zvols are used as domu disk storage, zfs corrupts kernel
memory on dom0, so that communications through the xbd drivers between the
dom0 and the domus are corrupted.  The domus panic with the error:
	panic: biodone2 already
Since this is kernel memory corruption, the dom0 can also crash, though
that's not guaranteed.  I created some patches to add more graceful error
recovery to the xbdback_xenbus driver, which helps with the domu panic, but
doesn't address the underlying problem.
	I've verified that this problem does not occur if a vnd(4) based file
is used as the backing store for the domu disk on the same system.
	It would be nice if zvols could be used as the backing store for domu
disks, since it would ease the management of multiple domus on a system,
not to mention reducing the time it takes to create a domu disk image on
the dom0 system.
>How-To-Repeat:
	
1.  Build a stock dom0 kernel with NetBSD 9.0_BETA.
Install it, using the xen-debug.gz kernel.  (The problem shows up with
either Xen 4.8 or 4.11.)

2.  Create a domu with some version of NetBSD on it that uses the xbd disk
for its data.  Create a zfs zvol  with the size set to the size of the
domu's disk you want to use.
I'll include a sample domu config file, below.

3.  Install the NetBSD-9 source tree on the domu.

4.  Run a build.sh release  on the domu, dumping the output into a log
file.

5.  Crash!
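
Step 2 can be sketched roughly as follows.  This is only an illustration:
the pool name xendisks and the volume name viadev64_h are taken from the
disk line of the sample config, while the 32g size and the helper names are
assumptions.  The script only builds and prints the command; it does not
run it.

```python
# Hypothetical helpers: they build (but do not run) the zvol creation
# command for step 2 and the device path the domu config points at.

def zvol_create_cmd(pool, volume, size):
    """Return the argv for `zfs create -V <size> <pool>/<volume>`."""
    return ["zfs", "create", "-V", size, f"{pool}/{volume}"]


def zvol_device(pool, volume):
    """Return the zvol block device path, as used in the sample config."""
    return f"/dev/zvol/dsk/{pool}/{volume}"


if __name__ == "__main__":
    # The 32g size is an assumption; use the size of the domu's disk.
    print(" ".join(zvol_create_cmd("xendisks", "viadev64_h", "32g")))
    print(zvol_device("xendisks", "viadev64_h"))
```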

Here's the output I see from the kernel, as well as from the xen hypervisor
(debug version):

(XEN) grant_table.c:591:d0v0 Bad flags (0) or dom (0). (expected dom 0)
[ 3133.1700793] xen_shm_map: op[0].status = -1 (2)

<The previous line indicates that gref[0] is corrupt and that there were 2
gref entries in the array at the time of the failure; see
sys/arch/xen/x86/xen_shm_machdep.c:180 in
/*      $NetBSD: xen_shm_machdep.c,v 1.13 2019/01/27 02:08:39 pgoyette Exp $      */>

[ 3133.1700793] xbdback_map_shm: xen_shm error -1 xbd IO domain 1: error -1
[ 3133.1700793] xbdback_io domain 1: end request 30 error=-1
[ 3133.1700793] xbdback_io domain 1: end request 1 error=-1

	At this point, the domu is nonfunctional and, very likely, so is the
dom0.
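
For anyone scanning dom0 logs for this failure, the xen_shm_map line can be
picked apart with a small script.  This is a minimal sketch: the pattern is
inferred from the single log line quoted above, not from the kernel's
format string, and the field names are my own labels for the reading given
in the note (failing gref index, status, number of gref entries).

```python
import re

# Pattern inferred from the one log line in this report; an assumption,
# not taken from the kernel's printf format string.
SHM_ERR = re.compile(r"xen_shm_map: op\[(\d+)\]\.status = (-?\d+) \((\d+)\)")


def parse_shm_error(line):
    """Return the gref index, status and gref count, or None if no match."""
    m = SHM_ERR.search(line)
    if m is None:
        return None
    idx, status, count = (int(g) for g in m.groups())
    return {"gref_index": idx, "status": status, "gref_count": count}


if __name__ == "__main__":
    line = "[ 3133.1700793] xen_shm_map: op[0].status = -1 (2)"
    print(parse_shm_error(line))
```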


<Here is the sample domu config file>

#  -*- mode: python; -*-
#============================================================================
# Python configuration setup for 'xl create'.
# This script sets the parameters used when a domain is created using 'xm create'.
# You use a separate script for each domain you want to create, or 
# you can set the parameters for the domain on the xm command line.
#============================================================================

#----------------------------------------------------------------------------
# Kernel image file.
kernel = "/var/xen/vianet/viadev64_h/netbsd"

# Initial memory allocation (in megabytes) for the new domain.
memory = 8192

# Number of Virtual CPUS to use, default is 1
vcpus = 1


# A name for your domain. All domains must have different names.
name = "viadev64_h_via_net"

#----------------------------------------------------------------------------
# network configuration.
# The mac address is optional, it will use a random one if not specified.
# By default we create a bridged configuration; when a vif is created
# the script /usr/pkg/etc/xen/scripts/vif-bridge is called to connect
# the vif to the designated bridge (the bridge should already be up)
vif = [  'bridge=bridge0' ]

#it's possible to use a different script when the vif is created;
# for example to use a routed setup instead of bridged:
# vif = [ 'mac=00:16:3e:00:00:11, ip=10.0.0.1 netmask 255.255.255.0, script=vif-ip' ]

#----------------------------------------------------------------------------
# Define the disk devices you want the domain to have access to, and
# what you want them accessible as.
# Each disk entry is of the form phy:UNAME,DEV,MODE
# where UNAME is the device, DEV is the device name the domain will see,
# and MODE is r for read-only, w for read-write.
# For NetBSD guest DEV doesn't matter, so we can just use increasing numbers
# here. For linux guests you have to use a linux device name (e.g. hda1)
# or the corresponding device number (e.g 0x301 for hda1)

disk = [ '/dev/zvol/dsk/xendisks/viadev64_h,raw,0x1,w' ]

#----------------------------------------------------------------------------
# Boot parameters (e.g. -s, -a, ...)
extra = ""

#============================================================================

#Reboot after shutdowns
on_poweroff = "restart"
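
Note that the disk line in this config uses xl's positional form
(target,format,vdev,access) rather than the phy:UNAME,DEV,MODE form
described in the comments.  A minimal sketch of how the entry splits up,
assuming that positional form with all four fields present;
parse_disk_spec is a hypothetical helper, not part of xl:

```python
# Hypothetical helper: splits an xl positional disk spec into its fields.
# Assumes exactly four comma-separated fields: target,format,vdev,access.

def parse_disk_spec(spec):
    """Return the four positional fields of a disk spec as a dict."""
    fields = [f.strip() for f in spec.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {spec!r}")
    target, fmt, vdev, access = fields
    return {"target": target, "format": fmt, "vdev": vdev, "access": access}


if __name__ == "__main__":
    spec = "/dev/zvol/dsk/xendisks/viadev64_h,raw,0x1,w"
    for key, value in parse_disk_spec(spec).items():
        print(f"{key}: {value}")
```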

>Fix:

	Don't know how to correct the problem at this time.  I tried compiling
the dom0 kernel with options KASAN, but that doesn't seem to be supported
on xen kernels, even if they're on amd64 hardware.
	
I was hoping to catch zfs in the act of committing its corruption.

>Unformatted:
 	
 	

