Re: Question about raidframe use

To: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
Subject: Re: Question about raidframe use
From: Greg Oster <oster%cs.usask.ca@localhost>
Date: Sat, 17 Jan 2009 10:45:42 -0600

Brian Buhrow writes:
>       Hello.  I've been a raidframe user for a long time, but I've recently
> come across a situation that I'm not sure I know how to get out of
> gracefully without rebooting.  I'm wondering if anyone can tell me of
> another way to do what I want.
> 
>       The scenario looks like this:
> 
> 1.  I have a raidframe set up, the type of raidframe is not important for
> purposes of this problem.
> 
> 2.  One component goes bad, so I use
> raidframe -a /dev/newdisk
  ^^^^^^^^^
I suspect you mean 'raidctl' here...

> raidframe -F badcomponent
> to get the spare component online.
> 
> 3.  Once the spare component is reconstructed, it stays listed in the
> raidframe -s output as a used_spare.  It stays that way until the server is
> rebooted and the autoconfig pulls it in as a full-fledged component.

Right.

> 4.  Before I reboot, but after I've reconstructed to the spare, I find I
> need to delete the spare and reconfigure the underlying drive before
>  to it as a spare again.
> Doing:
> raidframe -f /dev/newdisk
> doesn't work because it says that  /dev/newdisk is not a component of the
> raid set.
> 
> raidframe -f /dev/badcomponent doesn't work either, for the same reason.

Hmm.. I'm not sure what the goal of this step is... I do know that it 
won't let you do it, because /dev/newdisk won't be (currently) marked 
as "optimal" (instead, it's a "used_spare").  The issue is that the 
code can't handle remapping a failed spare to another spare, and so 
hand-failing a spare is disallowed...
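
The rule described above can be pictured with a small toy model.  This is
only a sketch, not the RAIDframe code: the names Status, can_hand_fail and
hand_fail are hypothetical, and the "failed" state is added here just so
the example has somewhere to go.

```python
from enum import Enum


class Status(Enum):
    # "optimal" and "used_spare" are the states named in this thread;
    # "failed" is an assumption added for the sketch.
    OPTIMAL = "optimal"
    FAILED = "failed"
    USED_SPARE = "used_spare"


def can_hand_fail(status):
    """Hypothetical check: only an optimal component may be failed by hand."""
    return status is Status.OPTIMAL


def hand_fail(components, name):
    """Mark `name` as failed, or raise if the request has to be refused."""
    if name not in components:
        raise ValueError(f"{name} is not a component of the set")
    if not can_hand_fail(components[name]):
        raise ValueError(f"{name} is {components[name].value}, not optimal")
    components[name] = Status.FAILED


# After reconstruction the spare is still a used_spare, so failing it is refused.
components = {"/dev/disk0": Status.OPTIMAL, "/dev/newdisk": Status.USED_SPARE}
try:
    hand_fail(components, "/dev/newdisk")
    refused = False
except ValueError:
    refused = True
```

In this model the request for /dev/newdisk is refused and its state is left
untouched, while an optimal component can still be failed by hand.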

>       So, the questions are:
> 
> 1.  Can the raidframe code be changed to promote used_spares to full
> components once the reconstruction is complete?  (I realize this blurs the
> line between spares and full components, but right now, there is no
> auto-reconstruction mechanism, and there doesn't seem to be a way of
> failing a used_spare.)

It can be changed... it's just a not-so-simple matter of programming 
that I've not gotten to yet... (It basically requires re-writing a 
whole mess of code related to how the spares are handled...)

> 2.  Failing that, can the raiddlookup code be changed to permit the manual
> failing of used_spares?

I think I looked at doing that one time and discovered that things 
weren't set up to handle that case... (e.g. say you fail component '2'
and rebuild to spare '5'.  Then you fail spare '5' and rebuild to
spare '6'.  The issue is that RAIDframe knows how to map from '5' back
to '2', but it doesn't know how to map from '6' to '5', and then from '5'
to '2' (or any other sort of transitive steps))
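
The '2' -> '5' -> '6' example can be sketched as a toy lookup table.  This
is an illustration under assumptions, not how RAIDframe stores its spare
information: spare_map, resolve_one_level and resolve_transitive are
hypothetical names.  It contrasts a lookup that follows one spare link with
one that follows the whole chain.

```python
def resolve_one_level(spare_map, disk):
    """Follow at most one spare -> replaced-disk link."""
    return spare_map.get(disk, disk)


def resolve_transitive(spare_map, disk):
    """Follow spare -> replaced-disk links until an original component is reached."""
    seen = set()
    while disk in spare_map:
        if disk in seen:
            raise ValueError("cycle in spare map")
        seen.add(disk)
        disk = spare_map[disk]
    return disk


# Fail component 2 and rebuild to spare 5.
spare_map = {5: 2}
first = resolve_one_level(spare_map, 5)

# Fail spare 5 and rebuild to spare 6.
spare_map[6] = 5
one_level = resolve_one_level(spare_map, 6)
full = resolve_transitive(spare_map, 6)
```

With only a one-step lookup, spare 6 resolves to 5, which is itself a
spare; getting back to component 2 needs the chained lookup.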

>       I like option 1 better, since it implies that you could go through an
> endless cycle of sparing and failing disks without rebooting and always end
> up with the ability to manipulate components that are full components,
> rather than used_spares.
> 
>       Am I completely missing something here?

Nope :-}  Fixing this one has been on my TODO list for ages, but last 
I looked there were some fairly invasive changes required...  And I 
agree that being able to do an endless cycle of sparing and failing 
would be ideal...  (I'll have another look at what is involved in 
changing this... If I recall, some things got a lot simpler, at the 
expense of no longer being able to do a 'copyback' (which I don't 
think anyone uses anyway!!))

Later...

Greg Oster
