Re: install/44982: gpt partition on raid on gpt not bootable

To: martin%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, Hauke Fath <hf%spg.tu-darmstadt.de@localhost>
Subject: Re: install/44982: gpt partition on raid on gpt not bootable
From: Brian Buhrow <buhrow%nfbcal.org@localhost>
Date: Tue, 22 Oct 2019 08:35:02 +0000 (UTC)

The following reply was made to PR install/44982; it has been noted by GNATS.

From: Brian Buhrow <buhrow%nfbcal.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: buhrow%nfbcal.org@localhost
Subject: Re: install/44982: gpt partition on raid on gpt not bootable
Date: Tue, 22 Oct 2019 01:34:51 -0700

 	Hello.  While installing a new Xen server with NetBSD-9 as the
 dom0, using RAID-1, I ran into this bug.  However, the bug is more subtle
 than it first appears.  Specifically, I can get a system to boot from a
 RAID-1 partition if I do the following:

 1.  Create gpt partitions on wd0 and wd1.
 The partition tables on each disk contain two partitions, one for the
 root filesystem and one for swap.  Both are of type raidframe.

 2.  Create two RAID-1 sets, each using one partition from each disk from
 step 1.  In my case, raid0 consists of /dev/dk0 and /dev/dk2, which
 correspond to the first gpt partition on each disk from step 1.  The second
 set, raid1, consists of components /dev/dk1 and /dev/dk3, which correspond
 to the second gpt partition on each disk from step 1.

 3.  Set raid0 as autoconfiguring and as the root device:
 raidctl -A root raid0

 4.  Set raid1 as autoconfiguring:
 raidctl -A yes raid1

 5.  Put a gpt partition table on raid1:
 gpt create /dev/rraid1d

 6.  Add a swap partition to the raid1 device and give it a name:
 gpt add -t swap -l gptswap /dev/rraid1d

 7.  Add a disklabel to the raid0 device and make raid0a a UFS filesystem.

 8.  Use gpt biosboot to mark the first gpt partition on each disk as
 bootable:
 gpt biosboot -i 1 /dev/rwd0d
 gpt biosboot -i 1 /dev/rwd1d

 9.  Run installboot on the dk devices that comprise the partitions you just
 made bootable with gpt biosboot in step 8:
 cd /usr/mdec
 installboot -v -o console=<device> -o speed=whatever /dev/dk0 bootxx-ffs<version>
 installboot -v -o console=<device> -o speed=whatever /dev/dk2 bootxx-ffs<version>

 10.  Run newfs on /dev/rraid0a.

 11. Populate the filesystem on /dev/raid0a and copy /usr/mdec/boot to /boot
 on it.

 Then, add a /boot.cfg that looks like:

 menu=Boot normally:rndseed /var/db/entropy-file;boot hd0a:netbsd
 menu=Boot single user:rndseed /var/db/entropy-file;boot -s
 menu=Drop to boot prompt:prompt
 default=1
 timeout=5
 clear=1

 And an /etc/fstab that looks like:

 /dev/raid0a / ffs rw,log 1 1
 name=gptswap none swap sw,dp 0 0

 This configuration almost works, but there is still a bug.

 With the /boot from the NetBSD-9 distribution, this setup can be booted
 manually; i.e., at the boot prompt you can type:
 > boot hd0:netbsd
 and the system will come up fully multi-user, with root and swap mounted
 and functioning.
 However, the boot.cfg file is parsed incorrectly.  Specifically, it appears
 that boot2.c gratuitously prepends "NAME=" to the device in the boot
 command on line 1 of boot.cfg, which causes the boot loader to fail to
 find the kernel.  I think the problem is here:

 /*	$NetBSD: /sys/arch/i386/stand/i386/boot/boot2.c,v 1.70.8.1 2019/09/13 07:00:13 martin Exp $	*/

     *fsname = "ufs";
     if (default_part_name == NULL) {
         *devname = default_devname;
     } else {
         snprintf(savedevname, sizeof(savedevname),
             "NAME=%s", default_part_name);
         *devname = savedevname;
     }

 	Whenever default_part_name is set, "NAME=" is prepended to the device
 spec, and this seems to happen even when the boot command read from the
 config file names a device explicitly.  Or, at least, that is what appears
 to be happening.
 	So, this isn't exactly the same problem as first reported in this bug,
 but it's similar enough that I thought I'd put the report here rather than
 creating a new bug.  If I fix the boot2.c problem, I can then boot from
 RAID-1 partitions on arbitrarily large disks, as long as I use a disklabel
 atop a set of gpt partitions.  In my case, I'm using a 250 GB RAID-1 set
 on top of a pair of 4 TB disks.  The swap RAID set, which carries a gpt
 label, is 1 GB in size.
 	It's also worth noting that I don't think the problem is with the gpt
 boot code, but rather with the bootxx code.  The bootxx code, which is
 needed to set the console and find the secondary boot file, tries to find
 the filesystem in four places on the disk:

 1.  At sector 1 of the wedge/slice from which it was loaded.

 2.  64 sectors (RF_PROTECTED_SECTORS) past the location read in step 1,
 within the wedge/slice from which it was loaded.

 3.  From the beginning of the "a" partition of a disklabel if a disklabel
 exists in the wedge/slice from which it was loaded.

 4.  64 sectors (RF_PROTECTED_SECTORS) from the beginning of the "a" partition
 of a disklabel if a disklabel exists in the wedge/slice from which it was
 loaded.

 	So, if we teach boot1.c to also check for the beginning of a filesystem
 RF_PROTECTED_SECTORS after the beginning of the first gpt wedge in the
 slice/wedge from which boot1 was loaded, I think that will solve the
 original bug.  However, the second bug, the boot.cfg parsing bug, also
 needs to be fixed in order to fully address this issue.

 -thanks
 -Brian
