tech-kern: some problems with "old" RAIDframe arrays on netbsd-1-6

Subject: some problems with "old" RAIDframe arrays on netbsd-1-6
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 10/19/2003 12:42:48
I'm having some problems with a pair of RAID-5 arrays that were created
when the system was running 1.5W (-current as of about 2001/06/24).
I've since upgraded the system to 1.6.1_STABLE (netbsd-1-6 as of about
2003/09/06).

The first (and most critical) problem is that I'm unable to add a spare
to one of the arrays (in order to replace a failed component):

	# raidctl -v -a /dev/sd6a raid0           
	raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: Invalid argument

As the command above runs, the kernel prints the following on the
console:

	Spare disk /dev/sd6a (512 blocks) is too small to serve as a spare (need 8890688 blocks)

I.e. RAIDframe isn't seeing the disk's new label properly.  In fact the
label is as follows:

	# disklabel sd6    
	# /dev/rsd6d:
	type: SCSI
	disk: VIKING 4.5 WSE
	label: raid0-spare
	flags:
	bytes/sector: 512
	sectors/track: 181
	tracks/cylinder: 8
	sectors/cylinder: 1448
	cylinders: 6144
	total sectors: 8896512
	rpm: 7200
	interleave: 1
	trackskew: 0
	cylinderskew: 0
	headswitch: 0           # microseconds
	track-to-track seek: 0  # microseconds
	drivedata: 0 
	
	8 partitions:
	#        size    offset     fstype  [fsize bsize cpg/sgs]
	 a:   8890697        63       RAID                      # (Cyl.    0*- 6140*)
	 c:   8890697        63     unused      0     0         # (Cyl.    0*- 6140*)
	 d:   8896512         0     unused      0     0         # (Cyl.    0 - 6143)

Note that I've made the `a' partition's size exactly match that of the
other volumes in the array (they're all slightly different kinds of disks).
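
For reference, the arithmetic behind the label above can be sketched in
C.  This is only an illustration of the two comparisons involved; the
struct and function names are hypothetical and are not taken from the
RAIDframe sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one disklabel partition entry. */
struct part {
	uint64_t size;		/* length in sectors */
	uint64_t offset;	/* start, in sectors from the start of the disk */
};

/* True if the partition lies wholly within the disk. */
static bool
part_fits_disk(const struct part *p, uint64_t total_sectors)
{
	return p->offset <= total_sectors &&
	    p->size <= total_sectors - p->offset;
}

/* True if the partition can hold the number of blocks the array needs. */
static bool
spare_big_enough(const struct part *p, uint64_t needed_blocks)
{
	assert(needed_blocks > 0);
	return p->size >= needed_blocks;
}
```

With the values quoted above (size 8890697, offset 63, total sectors
8896512, 8890688 blocks needed) both checks pass, whereas a size of 512
fails the second one, which is what the kernel message reports.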

(I don't have another disk big enough to try adding a spare to the
second RAID-5 array to see if it's something specific to this array....)

After fiddling with the label (trying a size of 8890688 before noticing
that the first value in the kernel message was a very unlikely and oddly
"even" number), I tried again only to be surprised by a new error:

	# raidctl -v -a /dev/sd6a raid0           
	raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: Device busy

I.e. the disk vnode continues to have a v_usecount of 1 even after
raidctl exits.  Somewhere a VOP_UNLOCK() or vput() call must be missing.
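
The suspected failure mode can be shown with a toy user-space model in
C.  Everything here is hypothetical (the struct, the function names and
the assumption that a held reference makes the next open fail with
EBUSY); it is not the kernel code, just a sketch of how an error path
that forgets to drop its reference would produce "Invalid argument"
followed by "Device busy".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a vnode: only the use count matters here. */
struct toy_vnode {
	int usecount;
};

/* Take a reference; refuse if somebody already holds one. */
static int
toy_open(struct toy_vnode *vp)
{
	if (vp->usecount > 0)
		return EBUSY;
	vp->usecount++;
	return 0;
}

/* Drop a reference taken by toy_open(). */
static void
toy_release(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	vp->usecount--;
}

/*
 * Hypothetical add-spare path.  When leak_on_error is true the error
 * branch returns without releasing the reference, which is the kind of
 * bug suspected above.
 */
static int
toy_add_spare(struct toy_vnode *vp, bool too_small, bool leak_on_error)
{
	int error = toy_open(vp);

	if (error)
		return error;
	if (too_small) {
		if (!leak_on_error)
			toy_release(vp);
		return EINVAL;
	}
	/* Success: the array keeps its reference to the spare. */
	return 0;
}
```

In this model a first call with too_small and leak_on_error both true
returns EINVAL and leaves usecount at 1, so a second call returns EBUSY
even though the size problem has been fixed.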



I also noticed that the "Autoconfig" value isn't copied to new disks
that have been added as spares in the past.  My original "raid0" wasn't
autoconfiguring properly and I found that one component didn't have
"Autoconfig: Yes" any more.  Rerunning "raidctl -A yes raid0" fixes it
but I'd suggest the addition of a new component as a spare should
inherit this value from the other components.  Should I send-pr this?
(I may try looking for a fix....)



Also, as an aside, I never made any of my component partitions have an
fstype of RAID before and yet autoconfig still worked in 1.5W.  It was
as if the check for (p_fstype != FS_RAID) wasn't happening before (even
though I see the code right there in rf_netbsdkintf.c in my old source
tree).  This may be another hint about the disk labels not being read
properly, though it still doesn't make any sense.
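
The kind of filter being described can be sketched in C as follows.
The enum values and names are stand-ins chosen for illustration (real
code would use the constants from <sys/disklabel.h>), and the loop is an
assumption about the shape of the check, not a copy of
rf_netbsdkintf.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values for illustration only. */
enum toy_fstype { TOY_FS_UNUSED = 0, TOY_FS_BSDFFS = 1, TOY_FS_RAID = 2 };

/* Hypothetical partition entry carrying only the filesystem type. */
struct toy_partition {
	enum toy_fstype p_fstype;
};

/* Count the partitions an autoconfig scan would consider. */
static size_t
count_raid_candidates(const struct toy_partition *parts, size_t n)
{
	size_t found = 0;

	assert(parts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (parts[i].p_fstype != TOY_FS_RAID)
			continue;	/* not marked as a RAID component */
		found++;
	}
	return found;
}
```

With such a check in place, a component whose partition is labelled
"unused" would be skipped, which is why autoconfig working without the
RAID fstype is surprising.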


I'm going to reboot again now to make sure raid0 is really auto-
configuring itself at boot and also to check that my new RAID-1 mirror
for the root disk comes up properly....

If anyone has any clues about the problem I'm having adding a spare to
the RAID-5, please let me know!

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>