tech-kern archive


I think I've found why Xen domUs can't mount some file-backed disk images! (vnd(4) hides labels!)



I think I've found why Xen domUs can't mount some file-backed disk images!

The answer must have come to me in my sleep, since just as I awoke I
realised what must be happening.

The clue I needed came from the early March discussion with Michael van
Elst about "problems with GPT (and maybe dkctl wedges) on LVM volumes",
where he says "The LVM volume is not a disk".  That led to my
realization that a vnd(4) interprets the file AS A DISK, so exporting it
relies on the "whole drive" partition really being the whole drive, and
maybe it's not.

I don't know if this is just a "new" problem or not -- but it is
certainly a real problem.  So far I've only tested on machines running a
relatively recent -current kernel (from approx 2021-03-10 sources).

For example, when I try to export the FreeBSD mini-memstick.img file to
a domU (with the following "disk" spec) that boots a recent FreeBSD PVH
kernel I get:

type = "pvh"
name = "fbsd-test"
memory = 2000
maxmem = 8000
vcpus = 4
vif = [ 'bridge=bridge0' ]
kernel = "/images/freebsd-12.2-kernel"
cmdline = 'vfs.root.mountfrom=ufs:ufs/FreeBSD_Install,vfs.root.mountfrom.options=ro,boot_verbose=1'
disk = [
	'format=raw, vdev=hda, access=ro, target=/images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img',
]

xbd0: 386MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
xbd0: attaching as ada0
xbd0: features: flush
xbd0: synchronize cache commands enabled.
GEOM: new disk ada0
xn0: backend features: feature-sg
Trying to mount root from ufs:ufs/FreeBSD_Install [ro]...
GEOM_PART: partition 2 has end offset beyond last LBA: 791120 > 790527
GEOM_PART: integrity check failed (ada0, MBR)
mountroot: waiting for device ufs/FreeBSD_Install...
Mounting from ufs:ufs/FreeBSD_Install failed with error 19.

Loader variables:
  vfs.root.mountfrom=ufs:ufs/FreeBSD_Install
  vfs.root.mountfrom.options=ro

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    eg. ufs:/dev/da0s1a
        zfs:zroot/ROOT/default
        cd9660:/dev/cd0 ro
          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input

mountroot> random: unblocking device.
arc4random: no preloaded entropy cache
?

List of GEOM managed disk devices:
  ada0

mountroot>
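
As a sanity check on the numbers in that GEOM_PART complaint, here is a
minimal sketch of the arithmetic.  The partition start and size are
taken from the fdisk output of the image file shown further down; the
helper names (part_end, dev_sectors) are made up for illustration.  It
only shows that the domU sees a device 593 sectors shorter than the
partition table needs, and that the size it does see is exactly 386 MiB,
matching what xbd0 reports.  It says nothing about which layer
shortened it.

```shell
#!/bin/sh
# Numbers copied from the GEOM_PART message and from fdisk of the image.
last_lba=790527      # last LBA the domU sees on ada0
part_start=1601      # start of the FreeBSD partition, in sectors
part_size=789520     # size of the FreeBSD partition, in sectors
sector=512           # bytes per sector

# part_end START SIZE: last sector occupied by a partition
part_end() { echo $(( $1 + $2 - 1 )); }

# dev_sectors LAST_LBA: number of sectors on a device with that last LBA
dev_sectors() { echo $(( $1 + 1 )); }

end=$(part_end "$part_start" "$part_size")
nsec=$(dev_sectors "$last_lba")

echo "partition ends at sector $end"                        # 791120
echo "device has $nsec sectors, $(( nsec * sector )) bytes" # 790528, 404750336
echo "MiB seen by domU: $(( nsec * sector / 1048576 ))"     # 386
echo "shortfall: $(( end - last_lba )) sectors"             # 593
```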


However if I copy that exact same mini-memstick.img file to an LVM
volume and then export that (here as "sda" in the following), I get
success:

type = "pvh"
name = "fbsd-test"
memory = 2000
maxmem = 8000
vcpus = 4
vif = [ 'bridge=bridge0' ]
kernel = "/images/freebsd-12.2-kernel"
cmdline = 'vfs.root.mountfrom=ufs:ufs/FreeBSD_Install,vfs.root.mountfrom.options=ro,boot_verbose=1'
disk = [
	# vg1-fbsd--test.0 has mini-memstick.img copied to it
        'format=raw, vdev=sda, access=ro, target=/dev/mapper/vg1-fbsd--test.0',
	# this is a blank LVM LV
        'format=raw, vdev=sdb, access=rw, target=/dev/mapper/vg0-fbsd--test.1',
	# this is a 4gb file of zeros:
        'format=raw, vdev=sdc, access=rw, target=/images/fbsd-test.2',
]


xbd0: 40960MB <Virtual Block Device> at device/vbd/2048 on xenbusb_front0
xbd0: attaching as da0
xbd0: features: flush
arc4random: no preloaded entropy cache
xn0: bpf attached
xn0: Ethernet address: 00:16:3e:2d:b0:d2
xbd0: synchronize cache commands enabled.
GEOM: new disk da0
xenbusb_back0: <Xen Backend Devices> on xenstore0
xenballoon0: <Xen Balloon Device> on xenstore0
xbd1: 30720MB <Virtual Block Device> at device/vbd/2064 on xenbusb_front0
xbd1: attaching as da1
xbd1: features: flush
xbd1: synchronize cache commands enabled.
xbd2: 4096MB <Virtual Block Device> at device/vbd/2080 on xenbusb_front0
xbd2: attaching as da2
xbd2: features: flush
xbd2: synchronize cache commands enabled.
xn0: backend features: feature-sg
Trying to mount root from ufs:ufs/FreeBSD_Install [ro]...
GEOM: new disk da1
GEOM: new disk da2
xen_et0: providing initial system time
start_init: trying /sbin/init
arc4random: no preloaded entropy cache
Starting file system checks:
/dev/ufs/FreeBSD_Install: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/FreeBSD_Install: clean, 5744 free (128 frags, 702 blocks, 0.1% fragmentation)



So, the problem appears to be that the /dev/vndXd partition isn't
exposing the whole file transparently.

The way Xen(tools) makes a file available to the domU is limited by the
fact that xbdback(4) can only interface with block devices.  As such,
Xen(tools) uses a script to interpose a vnd(4) device over the file and
make it look like a block device.  This script tells xbdback(4) to open
the "d" partition, which in theory should present the whole file as a
raw block device.  However it is not doing so, most critically for the
first few blocks: /dev/vnd0d is all zeros for the first 8192 bytes, but
the original image is not:



# fdisk -F /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
Disk: /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
NetBSD disklabel disk geometry:
cylinders: 49, heads: 255, sectors/track: 63 (16065 sectors/cylinder)
total sectors: 791121, bytes/sector: 512

BIOS disk geometry:
cylinders: 49, heads: 255, sectors/track: 63 (16065 sectors/cylinder)
total sectors: 791121

Partitions aligned to 16065 sector boundaries, offset 63

Partition table:
0: EFI system partition (sysid 239)
    start 1, size 1600 (1 MB, Cyls 0/0/2-0/25/26)
1: FreeBSD or 386BSD or old NetBSD (sysid 165)
    start 1601, size 789520 (386 MB, Cyls 0/25/27-49/62/30), Active
2: <UNUSED>
3: <UNUSED>
First active partition: 1
Drive serial number: 2425393296 (0x90909090)

# fdisk vnd0
fdisk: primary partition table invalid, no magic in sector 0
fdisk: Cannot determine the number of heads
Disk: /dev/rvnd0d
NetBSD disklabel disk geometry:
cylinders: 4096, heads: 64, sectors/track: 32 (2048 sectors/cylinder)
total sectors: 8388608, bytes/sector: 512

BIOS disk geometry:
cylinders: 522, heads: 255, sectors/track: 63 (16065 sectors/cylinder)
total sectors: 8388608

Partitions aligned to 16065 sector boundaries, offset 63

Partition table:
0: <UNUSED>
1: <UNUSED>
2: <UNUSED>
3: <UNUSED>
Bootselector disabled.
No active partition.
Drive serial number: 0 (0x00000000)

# disklabel vnd0
# /dev/rvnd0d:
type: vnd
disk: vnd
label: fictitious
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 64
sectors/cylinder: 2048
cylinders: 4096
total sectors: 8388608
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   8388608         0     4.2BSD      0     0     0  # (Cyl.      0 -   4095)
 d:   8388608         0     unused      0     0        # (Cyl.      0 -   4095)
disklabel: boot block size 0
disklabel: super block size 0

# cmp /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img /dev/rvnd0d
/images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img /dev/rvnd0d differ: char 1, line 1
# cmp /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img /dev/mapper/rvg1-fbsd--test.0
cmp: EOF on /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img

(which just means the LVM LV is bigger than the IMG, but they both were
the same through the whole length of the IMG)
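
To avoid the "EOF" noise when the device is bigger than the image, the
comparison can be limited to the length of the image.  The following is
only a sketch: cmp_over_image is a hypothetical helper name, and it
assumes the image size is a multiple of 512 bytes (true for the image
here, which fdisk reports as 791121 sectors of 512 bytes).

```shell
#!/bin/sh
# cmp_over_image IMG DEV
# Compare DEV against IMG over the length of IMG only.
# Returns 0 if DEV starts with an exact copy of IMG, 1 if not,
# and 2 if IMG is not a whole number of 512-byte sectors.
cmp_over_image() {
    img=$1
    dev=$2
    bytes=$(( $(wc -c < "$img") ))
    if [ $(( bytes % 512 )) -ne 0 ]; then
        echo "cmp_over_image: $img is not a multiple of 512 bytes" >&2
        return 2
    fi
    # Read exactly as many sectors from DEV as IMG contains and let
    # cmp(1) read them on standard input ("-").
    dd if="$dev" bs=512 count=$(( bytes / 512 )) 2>/dev/null |
        cmp -s - "$img"
}
```

Called with the image file and either /dev/rvnd0d or the raw LVM device,
an exit status of 0 would mean the device carries the image unchanged
from its first sector to the end of the image.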

# dd if=/images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img count=2 msgfmt=quiet | od -c
0000000  374   1 300 216 300 216 330 216 320 274  \0   | 276 032   | 277
0000020  032 006 271 346 001 363 244 351  \0 212   1 366 273 276 007 261
0000040  004   8   /   t  \b 177   u 205 366   u   q 211 336 200 303 020
0000060  342 357 205 366   u 002 315 030 200 372 200   r 013 212   6   u
0000100  004 200 306 200   8 362   r 002 212 024 211 347 212   t 001 213
0000120    L 002 273  \0   | 366 006 275 007 200   t   -   Q   S 273 252
0000140    U 264   A 315 023   r     201 373   U 252   u 032 366 301 001
0000160    t 025   [   f   j  \0   f 377   t  \b 006   S   j 001   j 020
0000200  211 346 270  \0   B 353 005   [   Y 270 001 002 315 023 211 374
0000220    r 017 201 277 376 001   U 252   u  \f 377 343 276 271 006 353
0000240  021 276 321 006 353  \f 276 360 006 353 007 273 007  \0 264 016
0000260  315 020 254 204 300   u 364 353 376   I   n   v   a   l   i   d
0000300        p   a   r   t   i   t   i   o   n       t   a   b   l   e
0000320   \0   E   r   r   o   r       l   o   a   d   i   n   g       o
0000340    p   e   r   a   t   i   n   g       s   y   s   t   e   m  \0
0000360    M   i   s   s   i   n   g       o   p   e   r   a   t   i   n
0000400    g       s   y   s   t   e   m  \0 220 220 220 220 220 220 220
0000420  220 220 220 220 220 220 220 220 220 220 220 220 220 220 220 220
*
0000660  220 220 220 220 220 220 220 220 220 220 220 220 220 200  \0 377
0000700  377 377 357 377 377 377 001  \0  \0  \0   @ 006  \0  \0 200 377
0000720  377 377 245 377 377 377   A 006  \0  \0 020  \f  \f  \0  \0  \0
0000740   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000760   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   U 252
0001000  353   < 220   B   S   D   4   .   4          \0 002 001 001  \0
0001020  002  \0 002   @ 006 360 005  \0   ?  \0 001  \0  \0  \0  \0  \0
0001040   \0  \0  \0  \0  \0  \0   ) 356 021   A 275   E   F   I   S   Y
0001060    S                       F   A   T   1   2             372   1
0001100  300 216 320 274  \0   | 373 216 330 350  \0  \0   ^ 203 306 031
0001120  273 007  \0 374 254 204 300   t 006 264 016 315 020 353 365   0
0001140  344 315 026 315 031  \r  \n   N   o   n   -   s   y   s   t   e
0001160    m       d   i   s   k  \r  \n   P   r   e   s   s       a   n
0001200    y       k   e   y       t   o       r   e   b   o   o   t  \r
0001220   \n  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0001240   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0001760   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   U 252
0002000


# dd if=/dev/rvnd0d count=17 msgfmt=quiet| od -c
0000000   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0020000   \0  \0  \0  \0  \0  \0  \0  \0  \b  \0  \0  \0 020  \0  \0  \0
0020020  030  \0  \0  \0 230 005  \0  \0  \0  \0  \0  \0 377 377 377 377
0020040  367 360   p   `  \0  \0      \0 007 200 037  \0 027  \0  \0  \0
0020060   \0   @  \0  \0  \0  \b  \0  \0  \b  \0  \0  \0 005  \0  \0  \0
0020100   \0  \0  \0  \0   <  \0  \0  \0  \0 300 377 377  \0 370 377 377
0020120  016  \0  \0  \0 013  \0  \0  \0 004  \0  \0  \0  \0 020  \0  \0
0020140  003  \0  \0  \0 002  \0  \0  \0  \0  \b  \0  \0  \0  \0  \0  \0
0020160   \0  \0  \0  \0  \0 020  \0  \0 200  \0  \0  \0 004  \0  \0  \0
0020200   \0  \0  \0  \0 300 220 005  \0 001  \0  \0  \0  \0  \0  \0  \0
0020220  367 360   p   `   _   `   A   q 230 005  \0  \0  \0  \b  \0  \0
0020240   \0   @  \0  \0  \0  \0  \0  \0 300 220 005  \0 300 220 005  \0
0020260  027  \0  \0  \0 001  \0  \0  \0  \0   X  \0  \0   0   d 001  \0
0020300  001  \0  \0  \0 377 357 003  \0 375 347 007  \0 016  \0  \0  \0
0020320   \0 001  \0 200  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0020340   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0021000
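
The difference fdisk complains about ("no magic in sector 0") can be
checked directly: in the first dump above, the image has the bytes
"U 252" (0x55 0xaa) at offset 0776 octal, i.e. bytes 510 and 511, while
the vnd0d dump is zeros there.  A minimal sketch of that check, with
made-up helper names (mbr_magic, has_mbr_magic), using only dd(1),
od(1), and tr(1):

```shell
#!/bin/sh
# mbr_magic FILE: print bytes 510-511 of FILE as lowercase hex, no spaces
mbr_magic() {
    dd if="$1" bs=512 count=1 2>/dev/null |
        od -An -tx1 -j510 -N2 | tr -d ' \n'
}

# has_mbr_magic FILE: succeed if sector 0 of FILE ends in 55 aa
has_mbr_magic() {
    [ "$(mbr_magic "$1")" = "55aa" ]
}
```

Going by the dumps above, has_mbr_magic should succeed on the .img file
and fail on /dev/rvnd0d.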


In fact the vnd0d device seems to give garbage forever -- it seems to
have been completely confused by trying to access a real disk image!


As a side note, even though access to this LVM-backed copy of
mini-memstick.img now seems OK enough to get the installer booted and a
shell running, access to the other xbd(4) devices is unfortunately still
not working from FreeBSD.  A freshly newfs'ed FS appears corrupt to an
immediate fsck, without ever being mounted, and even fsck of the mounted
root in this IMG fails enormously:

# df
Filesystem               512-blocks   Used  Avail Capacity  Mounted on
/dev/ufs/FreeBSD_Install     782968 737016 -16680   102%    /
devfs                             2      2      0   100%    /dev
tmpfs                         65536    232  65304     0%    /var
tmpfs                         40960      8  40952     0%    /tmp
# fsck /dev/ufs/FreeBSD_Install
** /dev/ufs/FreeBSD_Install

SAVE DATA TO FIND ALTERNATE SUPERBLOCKS? [yn] n


ADD CYLINDER GROUP CHECK-HASH PROTECTION? [yn] n

** Last Mounted on
** Root file system
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=28
SALVAGE? [yn] n

PARTIALLY TRUNCATED INODE I=112
SALVAGE? [yn] ^Cda0: disk error cmd=write 8145-8152 status: fffffffe

#
***** FILE SYSTEM MARKED DIRTY *****

#


--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



