Subject: Two RAIDFrame Problems -- Reconfigure + Hot Spare
To: None <netbsd-users@netbsd.org>
From: Rob Ginn <rob@sun701.nawcad.navy.mil>
List: netbsd-users
Date: 05/21/2002 16:17:17
Hi,
I'm having two problems with the RAIDFrame system
under NetBSD. The first deals with changing the
configuration of a RAID device, the second with the
operation of the hot spare. I've spent a week on it
and any help would be much appreciated. I'm running
NetBSD 1.5.2 on an i386 platform.
Problem #1
----------
I first configured 2 RAID devices. One had
3 components and no spares and one had 3 components
and 1 hot spare. I set them to autoconfigure (1 root,
the other not). I installed to the root one (which
had no spare) and all was well. Then I decided to
reconfigure the root one to include a hot spare
and now I'm almost bald :) The following details
the approaches I've taken to "kill" the configuration,
some pretty severe. Nothing works, even completely
zeroing the raid components. (BTW, I've written these
steps after the fact, but I'm 99.9% sure they are what
I did and in the same order):
0. -- my initial setup --
vi raid0.conf # create the configuration file
# NB: NO hot spare, the other
# components are sd0f, sd1f,
# and sd2f
vi raid1.conf # create the configuration file
# NB: I've GOT a hot spare in
# this one (sd0e, sd1e, sd2e
# are active and sd3e is hot spare)
raidctl -C raid0.conf raid0 # configure raid0
raidctl -C raid1.conf raid1 # configure raid1
raidctl -s raid0 # check config .. looks good
raidctl -I 0 raid0 # initialize components in raid0
raidctl -I 0 raid1 # initialize components in raid1
raidctl -iv raid0 # Initialize parity on raid0
raidctl -iv raid1 # Initialize parity on raid1
# NB: here I partitioned and formatted w/in the
# raid device raid0 (but not raid1)
raidctl -A root raid0 # make raid0 device autoconfigure
raidctl -A yes raid1 # make raid1 device autoconfigure
reboot
raidctl -s raid0 # It comes back correctly
raidctl -s raid1 # It comes back correctly
== OK, now I want a hot spare on raid0 too ==
1. summary: I tried just to configure and reconfigure
on raid0 only
raidctl -u raid0 # unconfigure the raid device
vi raid0.conf # added the hot spare sd3f
raidctl -C raid0.conf raid0
raidctl -s raid0 # at this point the system shows
# 3 components and 1 hot spare
# I think I'm done, but ...
raidctl -I 0 raid0 # for completeness
raidctl -iv raid0 # for completeness
raidctl -A yes raid0 # make raid device autoconfigure
raidctl -s raid0 # still looks good
reboot
raidctl -s raid0 # I've lost the hot spare!
2. summary: As with #1, but first I disabled autoconfig
and zeroed the start of the raid device
dd if=/dev/zero of=/dev/rraid0 bs=100b count=10
raidctl -A no raid0 # disable autoconfig
raidctl -u raid0 # unconfigure the raid device
vi raid0.conf # for completeness (unchanged from
# last time. includes hot spare)
raidctl -C raid0.conf raid0
raidctl -s raid0 # at this point the system shows
# 3 components and 1 hot spare
# I think I'm done, but ...
raidctl -I 0 raid0 # for completeness
raidctl -iv raid0 # for completeness
raidctl -A yes raid0 # make raid device autoconfigure
reboot
raidctl -s raid0 # I've lost the hot spare again!
3. summary: I disabled autoconfig, zeroed the beginning
of the raid device, and zeroed the components
completely. It still loses the hot spare!
dd if=/dev/zero of=/dev/rraid0 bs=100b count=10
raidctl -A no raid0 # disable autoconfig
raidctl -u raid0 # unconfigure the raid device
# completely zero the components
dd if=/dev/zero of=/dev/rsd0f bs=100b
dd if=/dev/zero of=/dev/rsd1f bs=300b
dd if=/dev/zero of=/dev/rsd2f bs=300b
dd if=/dev/zero of=/dev/rsd3f bs=300b # even the spare!
vi raid0.conf # for completeness (unchanged from
# last time. includes hot spare)
raidctl -C raid0.conf raid0
raidctl -s raid0 # at this point the system shows
# 3 components and 1 hot spare
# I think I'm done, but ...
raidctl -I 0 raid0 # for completeness
raidctl -iv raid0 # for completeness
raidctl -A yes raid0 # make raid device autoconfigure
reboot
raidctl -s raid0 # I've lost the hot spare!
So, how can I reconfigure the thing? Just about the only
thing I have left to try is to completely zero all the
drives in the system. Where is the autoconfiguration info
being stored?
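One way to compare what each component carries before and
after the reboot is to dump its component label with
"raidctl -g". The following is only a sketch: it assumes
the autoconfigure setting is kept in the component labels
(which is what the -A option seems to imply), and the
dump_labels function and the RAIDCTL override are made-up
helper names, not part of raidctl.

```shell
#!/bin/sh
# Sketch only: print the component label of each listed component
# of a configured RAID device, using "raidctl -g component dev".
# RAIDCTL may be overridden (e.g. RAIDCTL=echo) for a dry run.
dump_labels() {
    dev=$1
    shift
    for comp in "$@"; do
        echo "== $comp =="
        ${RAIDCTL:-raidctl} -g "$comp" "$dev"
    done
}

# Example (hypothetical invocation for the setup described above):
#   dump_labels raid0 /dev/sd0f /dev/sd1f /dev/sd2f /dev/sd3f
```

Running it once right after "raidctl -A yes raid0" and once
after the reboot should show whether the labels themselves
changed or only the running configuration did.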
Problem #2
----------
I tried powering off a drive in an active configuration
(on raid1 if you've read the previous problem) which
had 3 active components and 1 hot spare. The RAIDframe
driver detected the problem, marked the component bad,
but did not start reconstruction to the hot spare. I can
manually fail the component (which is already marked as
failed in the status) and then it starts reconstruction.
I can't find any statement of automatic failover to a
hot spare in the docs, but every hardware RAID system
I have used in the past switched to the hot spare
automatically. There is a statement
in the raidctl man page under the "-F" option which reads
"This is one of the mechanisms used to start the
reconstruction process ...". Since the difference between
the "-f" and "-F" options is the use of the hot spare,
the capability obviously exists in the code (although
perhaps it is disabled for some reason). At any rate,
since the whole point of the hot spare is to allow the
RAID array to lose multiple disks before a human detects
the problem and replaces the bad drive(s), I can't imagine
that the system doesn't do it.
So, what am I missing? Do I need to somehow enable
the system to use the hot spare? Is there another
option (similar to -A) which sets an option in
the component labels?
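For reference, the manual workaround described above (run
"raidctl -F" on the component that is already marked failed)
could be scripted roughly as below. This is a sketch under
assumptions: it expects "raidctl -s" to list components as
indented lines like "/dev/sd1e: failed" under a
"Components:" heading, and the function names and the
RAIDCTL override are hypothetical helpers, not anything
RAIDframe provides.

```shell
#!/bin/sh
# Sketch only: read "raidctl -s" output on stdin and print the
# components reported as failed. Assumes the status format
#   Components:
#       /dev/sd1e: failed
# and ignores anything outside the Components section.
failed_components() {
    awk '
        /^Components:/ { in_comp = 1; next }
        /^[^ \t]/      { in_comp = 0 }   # an unindented line ends the section
        in_comp && $2 == "failed" { sub(/:$/, "", $1); print $1 }
    '
}

# For each failed component of the given RAID device, start
# reconstruction to a hot spare with "raidctl -F component dev".
# RAIDCTL may be overridden for a dry run.
rebuild_failed() {
    dev=$1
    ${RAIDCTL:-raidctl} -s "$dev" | failed_components |
    while read -r comp; do
        ${RAIDCTL:-raidctl} -F "$comp" "$dev"
    done
}

# Example (hypothetical): rebuild_failed raid1
```

Something like this would have to be run periodically (from
cron, say), so it is a stopgap rather than a replacement for
the driver starting reconstruction on its own.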
Thanks for any help,
Rob Ginn