Subject: Re: URGENT: raid 5 array failed due to power outage
To: Greg Oster <oster@cs.usask.ca>
From: Uwe Lienig <uwe.lienig@fif.mw.htw-dresden.de>
List: port-alpha
Date: 08/26/2004 18:12:11
Hello Greg,
------
snip
------
> > To bring the raid back to life I would do:
> > $ > # comment
> > $ > # first reconstruct the sd12b to the spare
> > $ > raidctl -F /dev/sd12b raid0
> > $ > # then, if necessary, replace sd12b and rebuild the raid
> > $ > raidctl -B raid0
> > $ > # the raid should be back in normal operation
> > Please verify if this procedure would work as expected.
>
> You'd be better off using:
>
>  raidctl -R /dev/sd12b raid0

So, would the correct procedure be the following?
$ > # comment
$ > # first reconstruct the sd12b to the spare
$ > raidctl -F /dev/sd12b raid0
$ > # then, if necessary, replace sd12b and rebuild the raid
$ > raidctl -R /dev/sd12b raid0
$ > # the raid should be back in normal operation
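
A minimal dry-run sketch of the sequence above. It only prints the commands
and runs nothing. The function name plan_recovery is hypothetical, and the
added raidctl -S and -s status checks are my own assumption about useful
intermediate steps, not something confirmed in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the proposed raidctl sequence, executes nothing.
# Each real step must have finished before the next one is issued.

plan_recovery() {
    comp=$1   # failed component, e.g. /dev/sd12b
    raid=$2   # RAID device, e.g. raid0
    echo "raidctl -F $comp $raid"   # fail component, reconstruct onto the spare
    echo "raidctl -S $raid"         # check reconstruction progress (assumed step)
    echo "raidctl -R $comp $raid"   # rebuild in place on top of the component
    echo "raidctl -s $raid"         # show component and spare status (assumed step)
}

plan_recovery /dev/sd12b raid0
```

Printing the plan first makes it easy to review the order of operations
before anything touches the array.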

>
> to rebuild back on top of sd12b.  Copyback works, but has some
> serious limitations (e.g. no IO to the RAID set while the copyback is
> happening!) and needs to be replaced.
>

------
snip
------

> However: your other email indicates:
------
snip
------

> > count
> >
> > : 225 228
> >
> > Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd32b is not clean!
> > Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME: Configure (RAID Level 5):
> > total number of sectors is 179207680 (87503 MB)
> > Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME(RAID Level 5): Using 20
> > floating recon bufs with head sep limit 10
>
> This RAID set should have *never* configured, and I'm not sure why it
> did.  [time passes]  Ok, the "old config" code has a bug, which is
> all the more reason for everyone to be using the autoconfig code.
> [I *really* need to nuke that old code...]
Can the RAID set be converted to autoconfiguration with

$ > raidctl -A yes raid0

even after the RAID has been in use and data has been copied to it?
If so, I would convert the RAID to autoconfiguration immediately. I assume
that /etc/rc.d/raidframe will handle autoconfigured RAID sets.
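
To check afterwards whether the flag took effect, something like the sketch
below might help. It assumes that the component labels printed by
raidctl -s contain a line of the form "Autoconfig: Yes" or "Autoconfig: No".
The helper name autoconfig_flags and the sample label fragment are
illustrative only.

```shell
#!/bin/sh
# Sketch: report the Autoconfig flag(s) found in `raidctl -s` style output.
# Assumption: each component label has a line such as "Autoconfig: Yes".

autoconfig_flags() {
    # Reads status text on stdin and prints the distinct flag values found.
    sed -n 's/^[[:space:]]*Autoconfig:[[:space:]]*\([A-Za-z]*\).*/\1/p' | sort -u
}

# Intended use on the live system (as root), not executed here:
#   raidctl -A yes raid0
#   raidctl -s raid0 | autoconfig_flags

# Self-contained demonstration with an illustrative label fragment:
printf '   Autoconfig: Yes\n   Last configured as: raid0\n' | autoconfig_flags
```

If more than one value is printed, the components disagree about the flag.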
>
> > Aug 23 09:27:35 lwfv-fs last message repeated 6 times
> > Aug 23 10:13:04 lwfv-fs syslogd: Exiting on signal 15
>
The system was rebooted normally every time. It never became unstable
or unresponsive.

> Was this a crash, or a reboot, or a hang, or???  (I'm just trying to
> figure out why the mod counters would be out by 3.  I can understand
> them being out by 1, but never by 3 for the scenario you present.)
>
> I'm not sure which way to suggest going right now...  I still need
> more info...
What information do you need? I'll try to get it to you.
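
Assuming the component labels (including the mod counters) are what is
needed, the sketch below prints the commands I could run to collect them.
The function name collect_cmds is hypothetical, and only the two components
named in this thread are passed in; the remaining components would have to
be added by hand.

```shell
#!/bin/sh
# Sketch: print the commands that would gather RAID state for a report.
# Nothing is executed; component names are supplied by the caller.

collect_cmds() {
    raid=$1; shift
    echo "raidctl -s $raid"             # overall component and spare status
    for comp in "$@"; do
        echo "raidctl -g $comp $raid"   # component label, incl. mod counter
    done
}

collect_cmds raid0 /dev/sd12b /dev/sd32b
```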

>
> Later...
>
> Greg Oster

Thanks a lot.

-- 


Uwe Lienig
----------
fon: (+49 351) 462 2780
fax: (+49 351) 462 3476
mailto:uwe.lienig@fif.mw.htw-dresden.de

Forschungsinstitut Fahrzeugtechnik
<http://www.fif.mw.htw-dresden.de>
parcels: Gutzkowstr. 22, 01069 Dresden 
letters: PF 12 07 01,    01008 Dresden

Hochschule für Technik und Wirtschaft Dresden (FH)
Friedrich-List-Platz 1, 01069 Dresden