NetBSD-Bugs archive


bin/54591: lvm drops volumes on initial start



>Number:         54591
>Category:       bin
>Synopsis:       lvm drops volumes on initial start
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Oct 01 20:40:00 +0000 2019
>Originator:     Martin Neitzel
>Release:        NetBSD 9.99.12 2019-09-21
>Organization:
Gaertner Datensysteme, Marshlabs
>Environment:
System: NetBSD eddie.marshlabs.gaertner.de 9.99.12 NetBSD 9.99.12 (GENERIC) #0: Fri Sep 27 01:08:12 CEST 2019 neitzel%eddie.marshlabs.gaertner.de@localhost:/scratch/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

Upon boot, /etc/rc.d/lvm fails to set up all of the logical volumes that
have been created: random entries in /dev/mapper/ are missing.

As a consequence, the filesystems on the missing volumes cannot be
mounted, and depending on which filesystem is missing, the boot may
already abort into single-user mode.
Recovery can be... tricky.


>How-To-Repeat:

During NetBSD installation, I defined a disklabel(8) partition /dev/rwd0e to
hold the space for an LVM physical volume:

	neitzel 6 > disklabel wd0
	[...]
	total sectors: 234441648
	[...]
	5 partitions:
	#        size    offset     fstype [fsize bsize cpg/sgs]
	 a:   4194304      2048     4.2BSD      0     0     0  # /
	 b:   2097152   4196416       swap                     # swap
	 c:  41943040      2048     unused      0     0        # NetBSD part.
	 d: 234441648         0     unused      0     0        # whole disk
	 e:  35651520   6293568      vinum                     # LVM PV

I created a simple volume group out of a single physical volume,
and four logical volumes on it:

	# lvm pvcreate /dev/rwd0e
	# lvm vgcreate vg0 /dev/rwd0e

	# lvm lvcreate -L 4g -n src     vg0
	# lvm lvcreate -L 5g -n scratch vg0
	# lvm lvcreate -L 1g -n pkg     vg0
	# lvm lvcreate -L 2g -n local   vg0
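
(My note, as a sanity check on the numbers: each "lvcreate -L <N>g"
corresponds to N * 2097152 512-byte sectors, and the extents happen to
sit back to back starting at PV sector 384, which reproduces exactly the
lengths and offsets in the dmsetup tables shown later.  LVM's actual
allocation policy may of course differ in general.)

```shell
# Recompute the expected dmsetup table entries from the lvcreate sizes.
# 1 GiB = 1024 * 1024 * 2 = 2097152 sectors of 512 bytes each.
# The order and start offset (384) match the observed tables; both are
# assumptions based on this machine's output, not a documented guarantee.
offset=384
for entry in local:2 pkg:1 src:4 scratch:5; do
	lv=${entry%:*}; gib=${entry#*:}
	len=$((gib * 2097152))
	echo "vg0-$lv: 0 $len linear /dev/wd0e $offset"
	offset=$((offset + len))
done
```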

I newfs'ed the filesystems on the volumes, prepared mount points, and
made /etc/fstab entries as usual.  The "noauto" option and the no-fsck
pass number "0" are already part of the workaround:

	/dev/mapper/vg0-local   /usr/local      ffs     rw,noauto       0 0
	/dev/mapper/vg0-pkg     /usr/pkglv      ffs     rw,noauto       0 0
	/dev/mapper/vg0-scratch /scratch        ffs     rw,noauto       0 0
	/dev/mapper/vg0-src     /usr/src        ffs     rw,noauto       0 0

While you would typically boot with

	lvm=YES

in /etc/rc.conf, things get easier to repeat/debug/work around with lvm=NO
and running things manually.  Once in multi-user mode:

pre-flight check:

	/root 5 # modstat | grep -w dm
	/root 6 # dmsetup table
	No devices found

First lvm start, bringing up only three of the four volumes:

	/root 7 # /etc/rc.d/lvm onestart
	Configuring lvm devices.
	 Activated Volume Groups: vg0

	/root 8 # modstat | grep -w dm
	dm                  driver   filesys  a        0   18432 dk_subr

	/root 9 # dmsetup table
	vg0-local: 0 4194304 linear /dev/wd0e 384
	vg0-pkg: 0 2097152 linear /dev/wd0e 4194688
	vg0-src: 0 8388608 linear /dev/wd0e 6291840

	/root 10 # ls -l /dev/mapper
	total 0
	crw-rw----  1 root  operator  194, 0 Aug  7 00:12 control
	crw-r-----  1 root  operator  194, 1 Oct  1 21:34 rvg0-local
	crw-r-----  1 root  operator  194, 2 Oct  1 21:34 rvg0-pkg
	crw-r-----  1 root  operator  194, 3 Oct  1 21:34 rvg0-src
	brw-r-----  1 root  operator  169, 1 Oct  1 21:34 vg0-local
	brw-r-----  1 root  operator  169, 2 Oct  1 21:34 vg0-pkg
	brw-r-----  1 root  operator  169, 3 Oct  1 21:34 vg0-src

This time, the "scratch" volume was missing.  The "three out of four"
count appears to be consistent, but which LV goes missing is random.

Recover by restarting the LVM service, in separate steps:

	/root 11 # /etc/rc.d/lvm onestop
	Unconfiguring lvm devices.
	  Shutting Down logical volume: vg0/local
	  Command failed with status code 5.
	  Shutting Down logical volume: vg0/pkg

Obviously the "stop" runs into inconsistent information, and a bit
of debris is left behind; the "dm" kernel module stays loaded:

	/root 12 # modstat | grep -w dm
	dm                  driver   filesys  a        0   18432 dk_subr

	/root 13 # ls -l /dev/mapper
	total 0
	crw-rw----  1 root  operator  194, 0 Aug  7 00:12 control
	crw-r-----  1 root  operator  194, 1 Oct  1 21:34 rvg0-local
	brw-r-----  1 root  operator  169, 1 Oct  1 21:34 vg0-local

	/root 14 # dmsetup table
	vg0-local: 0 4194304 linear /dev/wd0e 384

A second start brings all four volumes online:

	/root 15 # /etc/rc.d/lvm onestart
	Configuring lvm devices.
	 Activated Volume Groups: vg0

	/root 16 # ls -l /dev/mapper
	total 0
	crw-rw----  1 root  operator  194, 0 Aug  7 00:12 control
	crw-r-----  1 root  operator  194, 1 Oct  1 21:34 rvg0-local
	crw-r-----  1 root  operator  194, 4 Oct  1 21:52 rvg0-pkg
	crw-r-----  1 root  operator  194, 6 Oct  1 21:52 rvg0-scratch
	crw-r-----  1 root  operator  194, 5 Oct  1 21:52 rvg0-src
	brw-r-----  1 root  operator  169, 1 Oct  1 21:34 vg0-local
	brw-r-----  1 root  operator  169, 4 Oct  1 21:52 vg0-pkg
	brw-r-----  1 root  operator  169, 6 Oct  1 21:52 vg0-scratch
	brw-r-----  1 root  operator  169, 5 Oct  1 21:52 vg0-src

	/root 17 # dmsetup table
	vg0-local: 0 4194304 linear /dev/wd0e 384
	vg0-pkg: 0 2097152 linear /dev/wd0e 4194688
	vg0-src: 0 8388608 linear /dev/wd0e 6291840
	vg0-scratch: 0 10485760 linear /dev/wd0e 14680448

"mount -a", and work continues (almost) as usual.
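
Until the underlying bug is fixed, the whole workaround can be scripted,
e.g. from /etc/rc.local.  This is only a sketch of what I do by hand
above; it assumes lvm=NO in /etc/rc.conf and the "noauto" fstab entries
shown earlier:

```shell
#!/bin/sh
# Workaround sketch: the first lvm start randomly drops one logical
# volume, so stop (which leaves some debris and the dm module loaded)
# and start again; the second start activates all volumes.  Then mount
# the noauto fstab entries.
/etc/rc.d/lvm onestart
/etc/rc.d/lvm onestop
/etc/rc.d/lvm onestart
mount -a
```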



>Fix:

None known yet.  This may well be a "kern" rather than a "bin" category bug.

Hey, I'm happy that I got far enough to actually load the
sources and still access them on the next boot ;-)

It took me three or four installation attempts to get a 9.99.x
-current running at all, with the workarounds as described here.  In
earlier attempts, I tried to install parts of the base system onto
LVs and went nuts because randomly different parts would be missing
upon reboot.  My first installation attempts were with GPT partitioning
and a GPT partition as LVM physical volume; then I reverted to MBR
partitioning, then I made sure nothing critical for a multi-user
login (such as /usr/pkg/bin/tcsh) resided on an LV.

Hence "Severity: serious" & "Priority: high".


