NetBSD-Bugs archive


bin/56917: raidctl -c configuration fails if NAME=wedge component is missing



>Number:         56917
>Category:       bin
>Synopsis:       raidctl -c configuration fails if NAME=wedge component is missing
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jul 08 11:25:00 +0000 2022
>Originator:     kre%munnari.OZ.AU@localhost
>Release:        NetBSD 9.99.97
>Organization:
>Environment:
System: NetBSD jacaranda.noi.kre.to 9.99.97 NetBSD 9.99.97 (GENERIC) #2: Wed Jun 8 01:46:15 +07 2022 kre%jacaranda.noi.kre.to@localhost:/usr/obj/current/amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	When not using raidframe autoconfiguration, but configuring via a
	raidctl config file (raidctl -c ...) with raidframe on wedges
	(and hence using NAME=wedge-name in the config file, rather than
	/dev/dkN, as the latter is more or less meaningless), if the wedge
	is not found, the config fails, and the raid set is not configured
	at all.

	On the other hand, if using autoconfiguration, the missing component
	is simply "failed" and the system works, with the raid in degraded
	mode, just fine.

	Since the whole idea of raid is to keep systems working when some of
	the storage has failed, it seems like it would be a good idea to
	continue with that when using -c and wedges.

	When not using NAME= type config (or perhaps ROOT.x, which is less
	likely, though not impossible, in the same position) the device name
	is simply passed through from the config file to the kernel raidframe,
	which fails to access it (if missing), and so degrades the raid.
	When NAME= is used, getfsspecname() fails, and there is no device
	name to send - the string that is sent to the kernel is meaningless
	to it, and the raidframe config fails entirely.
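
	A minimal sketch of the difference described above, not the actual
	raidctl code.   lookup_spec() is a hypothetical stand-in for
	getfsspecname(), the table of present wedges (raid-a on /dev/dk3)
	is made up, and component_devname() is an invented name.   It shows
	a fixed placeholder being substituted when a NAME= lookup fails;
	whether the kernel then marks that component as failed, rather than
	rejecting the whole config, is an assumption and is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the wedges currently present in the system. */
struct wedge {
	const char *name;	/* wedge label, as used after NAME= */
	const char *dev;	/* device node it resolves to */
};

const struct wedge present_wedges[] = {
	{ "raid-a", "/dev/dk3" },
};

/*
 * Hypothetical, simplified lookup in the spirit of getfsspecname():
 * "NAME=label" resolves to a device path when the wedge is present,
 * any other spec is copied through unchanged, and NULL is returned
 * when a named wedge cannot be found.
 */
const char *
lookup_spec(char *buf, size_t buflen, const char *spec)
{
	size_t i;

	assert(buflen > 0);
	if (strncmp(spec, "NAME=", 5) != 0) {
		/* Plain device name: pass it through as-is. */
		snprintf(buf, buflen, "%s", spec);
		return buf;
	}
	for (i = 0; i < sizeof(present_wedges) / sizeof(present_wedges[0]); i++) {
		if (strcmp(spec + 5, present_wedges[i].name) == 0) {
			snprintf(buf, buflen, "%s", present_wedges[i].dev);
			return buf;
		}
	}
	return NULL;	/* named wedge is not present */
}

/*
 * Pick the string to hand on for one component.  On lookup failure,
 * warn and substitute a placeholder device name instead of the raw
 * "NAME=..." text from the config file.
 */
const char *
component_devname(char *buf, size_t buflen, const char *spec)
{
	const char *b = lookup_spec(buf, buflen, spec);

	if (b == NULL) {
		fprintf(stderr, "warning: unable to get device file for %s\n",
		    spec);
		b = "absent";
	}
	return b;
}
```

	With this table, "NAME=raid-a" yields /dev/dk3, a plain /dev/wd0e
	is passed through untouched, and "NAME=raid-b" yields the
	placeholder rather than the unresolved NAME= string.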

	Or that's what looks to be happening to me.   I have reasons for not
	using raid autoconfig at the minute.   I was using it, and things
	worked when a drive vanished (that has a tendency to happen sometimes
	on my system due to BIOS/NetBSD "issues").   I wanted earlier
	notification that the drive had vanished, before raidframe started,
	using an rc.d script that checks for the drive(s) being missing and
	aborts the boot.   But with autoconfigured raid, that script ran only
	after the raidframe had already been configured with the missing
	component, requiring a reconstruction once I beat the BIOS into
	submission and the drive returned.   So I disabled raid autoconf, so
	raidframe would not start at all until after the rc.d script has
	verified that all the drives are at least present.

>How-To-Repeat:
	Build a raidframe out of wedges (a raid1 from 2 of them - on different
	drives - will do).   Make the config file use NAME=wedge-name to select
	the drives (partitions of the drives) to use.   Configure the raid,
	initialize it, partition it if you want, makefs ... (ie: use the thing).
	Do not turn on raid autoconfig for this raidset (so no root on this
	raid).   Remove one of the drives being used by the raidset (or if that
	is too extreme an action to test this, just relabel one of the relevant
	wedges, so the NAME=wedge-name for one of the wedges doesn't match a
	wedge that is present in the system).   Reboot (with raidframe=YES in
	rc.conf).   Observe that the raid set is not configured at all.
	(If you had raid autoconfig turned on, and did the same thing, the
	raid set would be configured, in degraded mode - the missing component
	marked as failed - but just altering wedge names is no use to test
	that case, the wedge really must be absent).

>Fix:
	I am not sure this will work (I'm yet to test it) but I think
	this patch might allow the raidframe to configure:

Index: rf_configure.c
===================================================================
RCS file: /cvsroot/src/sbin/raidctl/rf_configure.c,v
retrieving revision 1.36
diff -u -r1.36 rf_configure.c
--- rf_configure.c	14 Jun 2022 08:06:13 -0000	1.36
+++ rf_configure.c	8 Jul 2022 11:00:05 -0000
@@ -278,7 +278,7 @@
 			warnx("Config file error: warning: unable to "
 			    "get device file for disk at col %d: %s",
 			    c, b1);
-			b = buf;
+			b = "absent";
 		}
 
 		strlcpy(cfgPtr->devnames[0][c], b,


