file corruption problems (!) in a xen domU backed by a zvol

To: netbsd-users%NetBSD.org@localhost
Subject: file corruption problems (!) in a xen domU backed by a zvol
From: Jeff Rizzo <riz%tastylime.net@localhost>
Date: Sun, 9 Nov 2025 18:10:50 -0800

I'm not sure which part of my "stack" is the culprit, but here's the setup where I noticed the problem just now:

The host has a NetBSD 10.1 kernel and a 9.0 userland that's in the process of being updated to 10.1 - unpacking `base.tar.xz` is where I first noticed problems. Here's a quick summary of the problem in example form:


ansible:riz  ~/sets> md5 xfont.tar.xz
MD5 (xfont.tar.xz) = f044efd355a3a8fbee8988200aa526d5
ansible:riz  ~/sets> md5 xfont.tar.xz
MD5 (xfont.tar.xz) = 07e28b41bb982b2b0a1e0b731599a246
ansible:riz  ~/sets> md5 xfont.tar.xz
MD5 (xfont.tar.xz) = b0d14f58745706db4dcf4a30d5b75175
ansible:riz  ~/sets>


...for a file which should most definitely NOT be changing.
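The same check can be scripted so it runs unattended. The following is only a minimal sketch, not something from the original report: the function names `file_md5`, `repeated_digests` and `is_stable` are made up for illustration, and it assumes each pass really goes back to the virtual disk (a warm page cache could hide or change the behaviour).

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, runs=5):
    """Hash the same file several times and return every digest."""
    return [file_md5(path) for _ in range(runs)]


def is_stable(digests):
    """True when every pass over the file produced the same digest."""
    return len(set(digests)) == 1
```

Calling `repeated_digests("xfont.tar.xz")` and passing the result to `is_stable` would return False for the behaviour shown above, since the three reads gave three different digests.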

The host ("ansible") is a xen domU running in PVH mode on a NetBSD-10.1 dom0; the virtual disk is backed by a ZFS zvol. I made a ZFS snapshot of the zvol just before starting the upgrade. One of the two disks in the zpool is reporting soft errors (I just noticed these!), but so far the zpool itself is not showing any errors (I started a scrub about 10m ago to see if that catches anything):


xenserver1:riz  ~> sudo zpool status
  pool: tank
 state: ONLINE
  scan: scrub in progress since Sun Nov  9 17:54:02 2025
        54.5G scanned out of 106G at 70.8M/s, 0h12m to go
        0 repaired, 51.23% done
config:

    NAME                  STATE     READ WRITE CKSUM
    tank                  ONLINE       0     0     0
      wedges/zfs-xs1-wd2  ONLINE       0     0     0
      wedges/zfs-xs1-wd3  ONLINE       0     0     0

errors: No known data errors

This has me really freaked out, because while I have a backup of this particular virtual host, having bits arbitrarily change under a VM is pretty freaky. The VM doesn't show anything unusual in dmesg, but the dom0 does show some errors which currently seem to be getting corrected:

Nov 9 18:06:50 xenserver1 /netbsd: [ 1250512.6259872] wd3d: channel reset reading fsbn 4084303096 of 4084303096-4084303223 (wd3 bn 4084303096; cn 4051887 tn 15 sn 55), xfer 220, retry 0
Nov 9 18:06:50 xenserver1 /netbsd: [ 1250512.6259872] wd3d: channel reset reading fsbn 4084303224 of 4084303224-4084303351 (wd3 bn 4084303224; cn 4051888 tn 1 sn 57), xfer 2b8, retry 0
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer 58
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer f0
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer 220
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer 2b8
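To see how often each disk is complaining, messages like these can be tallied per device. This is a rough sketch only: the regular expression is modelled on nothing more than the lines quoted here, so other wd(4) message formats would be missed, and `count_disk_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "wd3: soft error" or "wd3d: channel reset". The optional
# letter after the unit number is a partition suffix. The optional space in
# "channel ?reset" tolerates log text whose line wrapping was lost.
EVENT = re.compile(r"\b(wd\d+)[a-p]?: (soft error|channel ?reset)")


def count_disk_events(lines):
    """Count (disk, event kind) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for disk, kind in EVENT.findall(line):
            if kind.startswith("channel"):
                kind = "channel reset"
            counts[(disk, kind)] += 1
    return counts
```

Feeding it the lines of the dom0's kernel log (for instance from a saved copy of the dmesg output) gives a Counter keyed by disk and event kind, which makes it easier to tell whether the errors are confined to one drive.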

I would love to track this down - anyone have a next step suggestion for figuring it out?

+j

Follow-Ups:
- Re: file corruption problems (!) in a xen domU backed by a zvol
  - From: Greg Troxel
