Subject: Re: URGEND: raid 5 array failed due to power outage
To: Uwe Lienig <uwe.lienig@fif.mw.htw-dresden.de>
From: Greg Oster <oster@cs.usask.ca>
List: port-alpha
Date: 08/26/2004 09:49:47
Uwe Lienig writes:
> Hello Tobias,
> hello Greg,
> hi alpha gurus (for the hardware question)
>
> Thanks a lot for your answers. Since Greg added that it would be wise to
> re-build the raid in degraded mode I'd like to ask, how I have to
> reconstruct. My thoughts are as follows:
>
> $ > raidctl -C /etc/raid0.conf raid0
> $ > raidctl -I 20040825 raid0
> $ > raidctl -f /dev/sd12b raid0

This should be ok.

> After that I would check the disklabel.
> Mount everything readonly, backup the file systems and then bring the raid
> back to life.
>
> To bring the raid back to life I would do:
> $ > # comment
> $ > # first reconstruct the sd12b to the spare
> $ > raidctl -F /dev/sd12b raid0
> $ > # then, if necessary, replace sd12b and rebuild the raid
> $ > raidctl -B raid0
> $ > # the raid should be back in normal operation
> Please verify if this procedure would work as expected.

You'd be better off using:

 raidctl -R /dev/sd12b raid0

to rebuild back on top of sd12b.  Copyback works, but has some
serious limitations (e.g. no IO to the RAID set while the copyback is
happening!) and needs to be replaced.
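
For anyone following along, the sequence discussed so far could be sketched as a small script. This is only an illustration, assuming the device, config file, serial number and component named in this thread; the function names and the DRY_RUN switch are made up for the example, and by default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
RAID=raid0                # RAID device from the thread
CONF=/etc/raid0.conf      # config file from the thread
SERIAL=20040825           # serial number chosen in the thread
BAD=/dev/sd12b            # component to fail and later rebuild

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the set up in degraded mode.
degrade() {
    run raidctl -C "$CONF" "$RAID"      # force the configuration
    run raidctl -I "$SERIAL" "$RAID"    # write new component labels
    run raidctl -f "$BAD" "$RAID"       # mark the component as failed
    run raidctl -s "$RAID"              # show component status
}

# Rebuild in place once the data has been backed up.
rebuild() {
    run raidctl -R "$BAD" "$RAID"       # rebuild back onto the same component
    run raidctl -S "$RAID"              # report reconstruction progress
}
```

Calling `degrade`, then mounting read-only and taking the backup, then calling `rebuild` would follow the order described above. Whether this is the right procedure for this particular RAID set is a separate question.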

However: your other email indicates:
> Aug 23 11:35:49 lwfv-fs /netbsd: Kernelized RAIDframe activated
> Aug 23 11:35:49 lwfv-fs /netbsd: root on sd0a dumps on sd0b
> Aug 23 11:35:49 lwfv-fs /netbsd: root file system type: ffs
> Aug 23 11:35:49 lwfv-fs /netbsd: RAIDFRAME: protectedSectors is 64
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd10b being configured at row: 0 col: 0
> Aug 23 11:35:49 lwfv-fs /netbsd:          Row: 0 Column: 0 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd:          Version: 2 Serial Number: 2004072901 Mod Counter: 225
> Aug 23 11:35:49 lwfv-fs /netbsd:          Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd10b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd11b being configured at row: 0 col: 1
> Aug 23 11:35:49 lwfv-fs /netbsd:          Row: 0 Column: 1 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd:          Version: 2 Serial Number: 2004072901 Mod Counter: 225
> Aug 23 11:35:49 lwfv-fs /netbsd:          Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd11b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd12b being configured at row: 0 col: 2
> Aug 23 11:35:49 lwfv-fs /netbsd:          Row: 0 Column: 2 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd:          Version: 2 Serial Number: 2004072901 Mod Counter: 225
> Aug 23 11:35:49 lwfv-fs /netbsd:          Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd12b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd30b being configured at row: 0 col: 3
> Aug 23 11:35:49 lwfv-fs /netbsd:          Row: 0 Column: 3 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd:          Version: 2 Serial Number: 2004072901 Mod Counter: 228
> Aug 23 11:35:49 lwfv-fs /netbsd:          Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd30b has a different modfication count: 225 228
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd30b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd31b being configured at row: 0 col: 4
> Aug 23 11:35:49 lwfv-fs /netbsd:          Row: 0 Column: 4 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd:          Version: 2 Serial Number: 2004072901 Mod Counter: 228
> Aug 23 11:35:49 lwfv-fs /netbsd:          Clean: No Status: 0
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd31b has a different modfication count: 225 228
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd31b is not clean!
> Aug 23 11:35:50 lwfv-fs /netbsd: raid0: Component /dev/sd32b being configured at row: 0 col: 5
> Aug 23 11:35:50 lwfv-fs /netbsd:          Row: 0 Column: 5 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:50 lwfv-fs /netbsd:          Version: 2 Serial Number: 2004072901 Mod Counter: 228
> Aug 23 11:35:50 lwfv-fs /netbsd:          Clean: No Status: 0
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd32b has a different modfication count: 225 228
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd32b is not clean!
> Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME: Configure (RAID Level 5): total number of sectors is 179207680 (87503 MB)
> Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME(RAID Level 5): Using 20 floating recon bufs with head sep limit 10

This RAID set should *never* have configured, and I'm not sure why it
did.  [time passes]  Ok, the "old config" code has a bug, which is
all the more reason for everyone to be using the autoconfig code.
[I *really* need to nuke that old code...]

> Aug 23 09:27:35 lwfv-fs last message repeated 6 times
> Aug 23 10:13:04 lwfv-fs syslogd: Exiting on signal 15

Was this a crash, or a reboot, or a hang, or???  (I'm just trying to
figure out why the mod counters would be out by 3.  I can understand
them being out by 1, but never by 3 for the scenario you present.)
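
To see the mismatch at a glance, the component names and mod counters can be pulled out of the boot messages. A rough sketch with awk follows; the function name `mod_counters` is made up for the example, and it assumes the "Component" and "Mod Counter:" message formats shown in the log quoted above.

```shell
#!/bin/sh
# Print "component mod_counter" pairs from RAIDframe configuration
# messages read on stdin.
mod_counters() {
    awk '
        # Remember the component named on the most recent "Component" line.
        /Component \/dev\// {
            for (i = 1; i <= NF; i++)
                if ($i == "Component") comp = $(i + 1)
        }
        # The label line that follows carries the counter as its last field.
        /Mod Counter:/ { print comp, $NF }
    '
}
```

Feeding it the saved kernel messages (for example `mod_counters < /var/log/messages`, where the log path is an assumption) lists one line per component, so the components whose counters disagree stand out.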

I'm not sure which way to suggest going right now...  I still need
more info...

Later...

Greg Oster