Subject: some problems with "old" RAIDframe arrays on netbsd-1-6
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 10/19/2003 12:42:48
I'm having some problems with a pair of RAID-5 arrays that were created
when the system was running 1.5W (-current as of about 2001/06/24).
I've since upgraded the system to 1.6.1_STABLE (netbsd-1-6 as of about
2003/09/06).
The first (and most critical) problem is that I'm unable to add a spare
to one of the arrays (in order to replace a failed component):
# raidctl -v -a /dev/sd6a raid0
raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: Invalid argument
As the command above runs, the kernel prints the following on the
console:
Spare disk /dev/sd6a (512 blocks) is too small to serve as a spare (need 8890688 blocks)
I.e. RAIDframe isn't seeing the disk's new label properly. The label is
in fact as follows:
# disklabel sd6
# /dev/rsd6d:
type: SCSI
disk: VIKING 4.5 WSE
label: raid0-spare
flags:
bytes/sector: 512
sectors/track: 181
tracks/cylinder: 8
sectors/cylinder: 1448
cylinders: 6144
total sectors: 8896512
rpm: 7200
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
8 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   8890697        63       RAID                      # (Cyl.    0*- 6140*)
 c:   8890697        63     unused      0     0         # (Cyl.    0*- 6140*)
 d:   8896512         0     unused      0     0         # (Cyl.    0 - 6143)
Note I've made the `a' partition's size exactly match that of the other
volumes in the array (they're all slightly different kinds of disks).
(I don't have another disk big enough to try adding a spare to the
second RAID-5 array to see if it's something specific to this array....)
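The numbers in the label can be checked against the kernel message with a little arithmetic. The following is only a minimal sketch: the helper names are hypothetical, and the "size >= needed" comparison is an assumption inferred from the wording of the console message, not taken from the RAIDframe source. The constants are copied from the disklabel output and the "need ... blocks" message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the disklabel output and the kernel message. */
#define TOTAL_SECTORS  8896512ULL  /* "total sectors" of sd6 */
#define PART_A_OFFSET  63ULL       /* offset of the `a' partition */
#define PART_A_SIZE    8890697ULL  /* size of the `a' partition */
#define NEEDED_BLOCKS  8890688ULL  /* from "need 8890688 blocks" */

/* Hypothetical helper: does the partition lie entirely within the disk? */
static bool
partition_fits_on_disk(uint64_t offset, uint64_t size, uint64_t total)
{
	return offset <= total && size <= total - offset;
}

/* Hypothetical helper: assumed form of the "too small" test. */
static bool
large_enough_for_spare(uint64_t size, uint64_t needed)
{
	return size >= needed;
}
```

With these values, partition_fits_on_disk(PART_A_OFFSET, PART_A_SIZE, TOTAL_SECTORS) and large_enough_for_spare(PART_A_SIZE, NEEDED_BLOCKS) both hold, whereas the 512 blocks the kernel reported would fail the second test. That is consistent with the kernel reading something other than the label shown above.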
After fiddling with the label (trying a size of 8890688 before noticing
that the first value in the kernel message was a very unlikely and oddly
"even" number), I tried again only to be surprised by a new error:
# raidctl -v -a /dev/sd6a raid0
raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: Device busy
I.e. the disk vnode continues to have a v_usecount of 1 even after
raidctl exits. Somewhere a VOP_UNLOCK() or vput() call must be missing.
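The pattern suspected here can be modelled in user space. This is a toy sketch only, not RAIDframe code: struct toy_vnode and every function below are hypothetical stand-ins, and where the real reference is actually leaked is unknown. It shows how an error path that returns without dropping its reference would produce "Invalid argument" on the first attempt and "Device busy" on the next.

```c
#include <errno.h>

/* Toy stand-in for a vnode; only the reference count is modelled. */
struct toy_vnode {
	int usecount;
};

static void toy_open(struct toy_vnode *vp)  { vp->usecount++; }
static void toy_close(struct toy_vnode *vp) { vp->usecount--; }

/* Leaky variant: the error path returns while still holding the reference. */
static int
add_spare_leaky(struct toy_vnode *vp, unsigned long blocks, unsigned long needed)
{
	toy_open(vp);
	if (blocks < needed)
		return EINVAL;		/* reference is never released */
	return 0;			/* success: spare stays open on purpose */
}

/* Fixed variant: the error path releases the reference before returning. */
static int
add_spare_fixed(struct toy_vnode *vp, unsigned long blocks, unsigned long needed)
{
	toy_open(vp);
	if (blocks < needed) {
		toy_close(vp);
		return EINVAL;
	}
	return 0;
}

/* A later attempt sees the device as busy if a reference is still held. */
static int
try_again(const struct toy_vnode *vp)
{
	return vp->usecount > 0 ? EBUSY : 0;
}
```

In this model, calling add_spare_leaky() with 512 blocks against a requirement of 8890688 leaves usecount at 1, so try_again() returns EBUSY. The fixed variant leaves usecount at 0 and a retry is possible.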
I also noticed that the "Autoconfig" value isn't copied to new disks
that have been added as spares in the past. My original "raid0" wasn't
autoconfiguring properly and I found that one component didn't have
"Autoconfig: Yes" any more. Rerunning "raidctl -A yes raid0" fixes it
but I'd suggest the addition of a new component as a spare should
inherit this value from the other components. Should I send-pr this?
(I may try looking for a fix....)
Also, as an aside, I never made any of my component partitions have an
fstype of RAID before and yet autoconfig still worked in 1.5W. It was
as if the check for (p_fstype != FS_RAID) wasn't happening before (even
though I see the code right there in rf_netbsdkintf.c in my old source
tree). This may be another hint about the disk labels not being read
properly, though it still doesn't make any sense.
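To make the puzzle concrete, here is a toy model of what an fstype filter of that shape would do. It is a sketch under assumptions: the enum values and struct are hypothetical stand-ins (the real codes live in <sys/disklabel.h>), and the loop is not copied from rf_netbsdkintf.c.

```c
#include <stddef.h>

/* Toy filesystem-type codes; not the real values from <sys/disklabel.h>. */
enum toy_fstype { TOY_FS_UNUSED, TOY_FS_BSDFFS, TOY_FS_RAID };

/* Toy partition entry with only the fields the filter looks at. */
struct toy_partition {
	unsigned long size;
	enum toy_fstype fstype;
};

/* Count the partitions an fstype filter would pass on to autoconfig. */
static size_t
count_raid_candidates(const struct toy_partition *parts, size_t nparts)
{
	size_t i, n = 0;

	for (i = 0; i < nparts; i++) {
		if (parts[i].fstype != TOY_FS_RAID)
			continue;	/* same shape as (p_fstype != FS_RAID) */
		n++;
	}
	return n;
}
```

Under this model a component whose partition is not marked RAID is never even considered, so autoconfiguration should not have found such components if the check was being applied to the labels as written.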
I'm going to reboot again now to make sure raid0 is really
auto-configuring itself at boot and also to check that my new RAID-1
mirror for the root disk comes up properly....
If anyone has any clues about the problem I'm having adding a spare to
the RAID-5, please let me know!
--
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>