NetBSD-Users archive
file corruption problems (!) in a xen domU backed by a zvol
I'm not sure which part of my "stack" is the culprit, but here's the
setup where I noticed the problem just now:
The host has a NetBSD 10.1 kernel and a 9.0 userland that's in the
process of being updated to 10.1 - unpacking `base.tar.xz` is where I
first noticed problems. Here's a quick summary of the problem in example
form:
ansible:riz ~/sets> md5 xfont.tar.xz
MD5 (xfont.tar.xz) = f044efd355a3a8fbee8988200aa526d5
ansible:riz ~/sets> md5 xfont.tar.xz
MD5 (xfont.tar.xz) = 07e28b41bb982b2b0a1e0b731599a246
ansible:riz ~/sets> md5 xfont.tar.xz
MD5 (xfont.tar.xz) = b0d14f58745706db4dcf4a30d5b75175
ansible:riz ~/sets>
...for a file which should most definitely NOT be changing.
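One way to narrow this down is to see whether the same regions of the file differ on every read or whether the bad spots move around. The following is only a rough sketch, not something from the original report: it reads the file several times, hashes it in fixed-size chunks, and lists the chunks whose digest differs from the first pass. The function names, the 1 MiB chunk size, and the use of SHA-256 instead of MD5 are all arbitrary choices made for illustration. Reads that are served from the page cache may hide or change the pattern, so treat the output as a hint rather than proof.

```python
import hashlib
import sys

CHUNK_SIZE = 1 << 20  # 1 MiB; an arbitrary choice for this sketch


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Return a list of SHA-256 hex digests, one per chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def differing_chunks(first, second):
    """Return the indexes of chunks whose digests differ between two passes."""
    longest = max(len(first), len(second))
    return [
        i for i in range(longest)
        if i >= len(first) or i >= len(second) or first[i] != second[i]
    ]


def compare_passes(path, passes=3, chunk_size=CHUNK_SIZE):
    """Read the file `passes` times; map pass number to differing chunk indexes."""
    baseline = chunk_digests(path, chunk_size)
    report = {}
    for n in range(2, passes + 1):
        diff = differing_chunks(baseline, chunk_digests(path, chunk_size))
        if diff:
            report[n] = diff
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    result = compare_passes(sys.argv[1])
    if not result:
        print("all passes matched")
    for n, chunks in result.items():
        offsets = ", ".join(str(i * CHUNK_SIZE) for i in chunks)
        print(f"pass {n}: chunks at byte offsets {offsets} differ from pass 1")
```

Saved under a hypothetical name such as `rehash.py`, it would be run as `python3 rehash.py xfont.tar.xz`. If the differing offsets stay the same from run to run, that points toward specific bad blocks; if they wander, a problem somewhere in the data path seems more likely.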
The host ("ansible") is a Xen domU running in PVH mode on a NetBSD-10.1
dom0; the virtual disk is backed by a ZFS zvol. I made a ZFS snapshot
of the zvol just before starting the upgrade. One of the two disks in
the zpool is reporting soft errors (I just noticed these!), but so far
the zpool itself is not showing any errors (I started a scrub about 10
minutes ago to see if that catches anything):
xenserver1:riz ~> sudo zpool status
  pool: tank
 state: ONLINE
  scan: scrub in progress since Sun Nov 9 17:54:02 2025
        54.5G scanned out of 106G at 70.8M/s, 0h12m to go
        0 repaired, 51.23% done
config:

        NAME                  STATE     READ WRITE CKSUM
        tank                  ONLINE       0     0     0
          wedges/zfs-xs1-wd2  ONLINE       0     0     0
          wedges/zfs-xs1-wd3  ONLINE       0     0     0

errors: No known data errors
This has me really freaked out, because while I have a backup of this
particular virtual host, having bits arbitrarily change under a VM is
pretty alarming. The VM doesn't show anything unusual in dmesg, but the
dom0 does show some errors which currently seem to be getting corrected:
Nov 9 18:06:50 xenserver1 /netbsd: [ 1250512.6259872] wd3d: channel reset reading fsbn 4084303096 of 4084303096-4084303223 (wd3 bn 4084303096; cn 4051887 tn 15 sn 55), xfer 220, retry 0
Nov 9 18:06:50 xenserver1 /netbsd: [ 1250512.6259872] wd3d: channel reset reading fsbn 4084303224 of 4084303224-4084303351 (wd3 bn 4084303224; cn 4051888 tn 1 sn 57), xfer 2b8, retry 0
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer 58
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer f0
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer 220
Nov 9 18:06:51 xenserver1 /netbsd: [ 1250513.1459781] wd3: soft error (corrected) xfer 2b8
I would love to track this down - anyone have a next step suggestion for
figuring it out?
+j