Re: kern/54289 hosed my RAID. Recovery possible?

To: jdbaker%consolidated.net@localhost
Subject: Re: kern/54289 hosed my RAID. Recovery possible?
From: Greg Oster <oster%netbsd.org@localhost>
Date: Thu, 15 Aug 2019 09:30:04 -0600

On Thu, 15 Aug 2019 09:07:46 -0500
jdbaker%consolidated.net@localhost wrote:

> On 2019-08-15 00:03, jdbaker%consolidated.net@localhost wrote:
> > The SiI3214 SATALink card suffers from the identify problem in
> > netbsd-9 and -current (PR kern/54289).
> > 
> > Booting a netbsd-9 kernel, the drives failed to identify which
> > caused RAIDframe to mark the 4 drives on that card (of 8) in my
> > RAID as FAILED.
> > Rebooting netbsd-8, the drives identify properly, but are still
> > marked as FAILED.
> > 
> > Is there any way to unmark them so the raid will configure and
> > recover? Normally 'raidctl -C' is used during first time
> > configuration. Could it be used to force configuration, ignoring
> > the FAILED status?  Would the RAID be recoverable with parity
> > rebuild afterwards?
> 
> This seems to have worked.  The disks not being correctly
> identified/attached under netbsd-9 apparently had them recorded as
> failed on the components that did attach (on the machine's on-board
> intel ahcisata ports).  Rebooting netbsd-8, although the drives
> identified and attached properly, they were still considered failed
> components.
> 
> A multiple-disk failure is usually fatal to a RAID, but the
> components weren't actually failed.  Un-configuring with 'raidctl -u'
> then forcing a config with 'raidctl -C /path/to/config' did not show
> any fatal errors and subsequent 'raidctl -s' showed all component
> labels (w/serial number) intact.  Parity rewrite took a long time.
> 
> Afterwards, 'gpt show raid0d' and 'dkctl raid0d listwedges' showed
> things to be intact that far.  Rebooting the machine, the RAID
> properly autoconfigured.  'fsck' reported the filesystem as clean
> (since it never got mounted after the failed reboot into netbsd-9).
> An 'fsck -f' run is in progress.

In general, with this sort of 'larger' set of failed components, you
should be OK.  There are a couple of scenarios for the different RAID
sets you might have configured:
 1) The components that 'failed' were not sufficient to fail the RAID
 set (i.e. it was just 'degraded').  In this case, the surviving
 components still hold your data, and the set runs in degraded mode.
 Rebuild the 'failed' component, and you're good-to-go.

 2) The components that 'failed' were enough to completely fail the RAID
 set upon configuration.  In this case, the RAID set would not
 configure, and no data would be written to any of the components (save
 for the updating of the component labels).  Here you can use
 'raidctl -C' to force configuration of the RAID set and be comfortable
 that your data is still intact (given that there wasn't actually a real
 failure, and no data was written to the RAID set).  Yes, a parity
 rebuild will be needed, and it will be a NOP (but it doesn't know
 that :) ).
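
The forced-configuration path in scenario 2 can be sketched as a short
shell script.  This is only an illustration under stated assumptions,
not a tested recovery procedure: the device name raid0 and the config
path are placeholders, and the 'run' helper and DRY_RUN variable are
made up for this sketch.  By default it only prints the commands.  The
raidctl flags are the ones mentioned in this thread plus '-P', which
checks parity and rewrites it if it is not known to be up-to-date.

```sh
#!/bin/sh
# Sketch of scenario 2: force-configure a RAID set whose components
# were wrongly marked FAILED.  RAID and CONF are placeholders.
RAID=${RAID:-raid0}
CONF=${CONF:-/path/to/config}
DRY_RUN=${DRY_RUN:-1}     # hypothetical switch: 1 = print only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

force_reconfigure() {
    run raidctl -u "$RAID"          # unconfigure the set first
    run raidctl -C "$CONF" "$RAID"  # force configuration
    run raidctl -s "$RAID"          # inspect component status/labels
    run raidctl -P "$RAID"          # check parity, rewrite if needed
}
```

Only use this when nothing was written to the set after the bogus
failure; review the printed commands before setting DRY_RUN=0.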

The only place this gets tricky is if the RAID set does get configured
and mounted.  In that case you don't want to use 'raidctl -C', as data
on the surviving components will be out-of-sync with the 'failed'
components, and you're better off rebuilding in-place.
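
For the rebuild-in-place case, a minimal sketch might look like the
following.  Again this is an assumption-laden illustration: raid0 and
the component name passed in are placeholders, and 'run' and DRY_RUN
are helpers invented for the sketch.  It relies on 'raidctl -R', which
fails the named component if necessary and reconstructs back onto it,
and 'raidctl -S', which reports reconstruction progress.

```sh
#!/bin/sh
# Sketch: rebuild one 'failed' component in place on a degraded set.
RAID=${RAID:-raid0}
DRY_RUN=${DRY_RUN:-1}     # hypothetical switch: 1 = print only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: the component to rebuild, as named in 'raidctl -s' output.
rebuild_in_place() {
    run raidctl -R "$1" "$RAID"   # reconstruct onto the same component
    run raidctl -S "$RAID"        # report reconstruction progress
}
```

Usage, with a hypothetical component name:
rebuild_in_place /dev/wd4e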

Later...

Greg Oster

