Subject: Re: raidframe: re-mirroring
To: Louis Guillaume <lguillaume@berklee.edu>
From: Greg Oster <oster@cs.usask.ca>
List: netbsd-users
Date: 07/24/2004 14:48:52
Louis Guillaume writes:
> Hello,
> 
> A while ago I had a raid-1 component fail on me. There was some kind of
> DMA error reported on the console and the machine was hung. I suspect
> this may have been a driver error, as the drive worked solidly until an
> upgrade of the kernel, after which this error came up almost
> immediately. The array (of 1) has been working flawlessly since failing
> the drive. The other drive continues to be there, spinning as a "failed
> component."
> 
> It's now a couple kernel builds later and I'd like to try bringing that
> disk back into the fold. The question is: how does raidframe know which
> drive to consider having the correct data?

There is a 'modification count' which gets updated at "appropriate 
times".  This count will be higher on components holding the most 
recent (most correct) data.  

> For instance - if there is a file on the failed drive that has since
> been deleted from the filesystem. Will that file be merged back in to
> the system? 

No.  RAIDframe doesn't know anything about individual bits of data on 
the components.

> Or will the data on that disk be completely condemned and
> the mirror rebuilt from the only known good disk?

Yes.

> Is there a way to "initialize" a failed component so there is no danger
> of merging in unwanted data?

You can use "dd" to push zeros to it, if you'd like.  The rebuild is 
guaranteed to take the longest possible amount of time if you do that 
though.
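
A minimal sketch of that "dd" step.  The device name /dev/rwd1e is 
only an assumption (substitute the raw partition of your failed 
component), and the "run" helper and DRY_RUN variable are hypothetical 
conveniences, so that nothing is written unless you explicitly ask:

```shell
#!/bin/sh
# Hypothetical raw partition of the failed component; adjust before use.
FAILED_COMPONENT="${FAILED_COMPONENT:-/dev/rwd1e}"

# Hypothetical helper: print the command by default, and only execute it
# when DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Overwrite the whole component with zeros.  This destroys everything
# currently stored on that partition.
run dd if=/dev/zero of="$FAILED_COMPONENT" bs=1m
```

Double-check the target device before running this with DRY_RUN=0; 
zeroing the wrong partition is not recoverable.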
 
> This brings up another point. Say I wanted to upgrade the system. I'd
> guess it's a good idea to "break the mirror", i.e. fail one component and
> upgrade to the other.
> 
> If the upgrade works then bring the failed component back in (provided
> its data will be condemned).
> 
> If it doesn't, revert to having the failed component as the prime disk.
> How can this be done if it's failed?

This gets quite tricky -- you'd have to create a one-sided mirror, 
and force the configuration manually.  It's even worse if you're 
using said RAID set for /.
 
> The confusion is: How does raidframe know which disk has the data you
> want to keep in a raid-1 situation?

Each component in a RAID set has a component label.  In that 
component label is enough information to figure out what other 
components belong with that given component to configure a RAID set. 
One part of the component label is the "mod counter".  When a component 
is detected as "failed", one of the first things that happens is this 
"mod counter" is incremented and written out to *only the good component*.
If the system should happen to be rebooted at that point, RAIDframe 
will detect that even the failed component belongs to the set, but will 
know that it's failed because of its lower "mod counter".  When you then
tell it to rebuild that component, it will synchronize the component with 
the good components.
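
If you want to look at the labels yourself, something along these 
lines should work.  The names raid0, /dev/wd0e and /dev/wd1e are 
assumptions for illustration, and the "run" helper and DRY_RUN variable 
are hypothetical (commands are only printed by default):

```shell
#!/bin/sh
# Assumed names; replace with your RAID device and its components.
RAID_DEV="${RAID_DEV:-raid0}"
COMPONENTS="${COMPONENTS:-/dev/wd0e /dev/wd1e}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Overall status of the set, including which components are failed.
run raidctl -s "$RAID_DEV"

# Print the component label of each component; comparing the mod
# counter values in the labels shows which side RAIDframe will trust.
for c in $COMPONENTS; do
    run raidctl -g "$c" "$RAID_DEV"
done
```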

> How can you tell it to rebuild from
> one drive and not the other?

If you have drive A ('primary') and drive B ('mirror'), and drive B fails, 
there are two cases here:
 1) You want the contents of A to be mirrored on B.  This is the 
regular case.  Just use "raidctl -R", and things are easy.

 2) You want the contents of B to be mirrored on A.  This is not the 
regular case.  If this is *really* what you want, there are a couple 
of ways to do this:
   a) you could do a new configuration ('raidctl -C') with drive B listed 
before drive A in the 'disks' section of the config file. 
   b) you could do a new configuration ('raidctl -C') with A being 
"absent" and drive B listed second.  (look for 'absent' in a 2.0 or 
newer man-page).
   c) You could remove drive A from the system, and boot with only 
drive B.  With 'autoconfig' in use, then B will be the "only known 
drive" for that RAID set, and will be the one considered "good".  
Assuming you can physically hot-add drive A, you can then use 
'raidctl -a' to hot-add it, and reconstruct from B onto A.
   d) There are probably some other tricks too, including booting 
with only drive B, and keep rebooting until the modification counter 
for B is higher than that for A.  Then put A back into the system, 
and it will then be marked as "failed".
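
For the regular case (1), a rough sketch of the rebuild looks like 
this.  The names raid0 and /dev/wd1e are assumptions standing in for 
your RAID device and for drive B, and the "run" helper and DRY_RUN 
variable are hypothetical (commands are only printed by default):

```shell
#!/bin/sh
# Assumed names; replace with your RAID device and the failed component.
RAID_DEV="${RAID_DEV:-raid0}"
FAILED_COMPONENT="${FAILED_COMPONENT:-/dev/wd1e}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Rebuild the failed component in place from the good one.
run raidctl -R "$FAILED_COMPONENT" "$RAID_DEV"

# Check on the progress of the reconstruction.
run raidctl -S "$RAID_DEV"

# Confirm that both components are reported as good afterwards.
run raidctl -s "$RAID_DEV"
```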

Right now, if you're serious about convincing RAIDframe that it's 
wrong, it can be done, but it's going to require overriding a whole 
bunch of safety checks (and, if one isn't careful, it's a good way of 
losing data).
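
To give an idea of what option (b) involves, here is a sketch of a 
config file with drive A "absent".  Everything specific in it is an 
assumption: the file path, the device name /dev/wd1e for drive B, the 
raid0 device, and the layout and queue values (which need to match how 
your set was originally configured).  The "run" helper and DRY_RUN 
variable are hypothetical, and the 'raidctl -C' command is only printed 
by default:

```shell
#!/bin/sh
# Assumed names; replace with your own.
RAID_DEV="${RAID_DEV:-raid0}"
CONF="${CONF:-/tmp/raid0-b-only.conf}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# One-sided mirror: drive A is "absent", drive B is listed second.
cat > "$CONF" <<'EOF'
START array
# numRow numCol numSpare
1 2 0

START disks
absent
/dev/wd1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1

START queue
fifo 100
EOF

# Force the configuration.  This overrides RAIDframe's safety checks.
run raidctl -C "$CONF" "$RAID_DEV"
```

This is only the first step; see raidctl(8) for what has to happen to 
the component labels afterwards, and have a backup before trying any 
of it.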

Later...

Greg Oster