Subject: Re: URGENT: raid 5 array failed due to power outage
To: Uwe Lienig <uwe.lienig@fif.mw.htw-dresden.de>
From: Greg Oster <oster@cs.usask.ca>
List: port-alpha
Date: 08/26/2004 10:56:07
Uwe Lienig writes:
> Hello Greg,
> ------
> snip
> ------
> > > To bring the raid back to life I would do:
> > > $ > # comment
> > > $ > # first reconstruct the sd12b to the spare
> > > $ > raidctl -F /dev/sd12b raid0
> > > $ > # then, if necessary, replace sd12b and rebuild the raid
> > > $ > raidctl -B raid0
> > > $ > # the raid should be back in normal operation
> > > Please verify if this procedure would work as expected.
> >
> > You'd be better off using:
> >
> >  raidctl -R /dev/sd12b raid0
>
> So, the correct way would be:
> $ > # comment
> $ > # first reconstruct the sd12b to the spare
> $ > raidctl -F /dev/sd12b raid0

This step isn't needed...  If sd12b needs to be replaced, just
replace it, and then do the following step.

> $ > # then, if necessary, replace sd12b and rebuild the raid
> $ > raidctl -R /dev/sd12b raid0
> $ > # the raid should be back in normal operation
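
(While the -R is running, you can keep an eye on the rebuild;

 raidctl -S raid0

should report the reconstruction progress.)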

[snip]
>
> > > count: 225 228
> > >
> > > Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd32b is not clean!
> > > Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME: Configure (RAID Level 5): total number of sectors is 179207680 (87503 MB)
> > > Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME(RAID Level 5): Using 20 floating recon bufs with head sep limit 10
> >
> > This RAID set should have *never* configured, and I'm not sure why it
> > did.  [time passes]  Ok, the "old config" code has a bug, which is
> > all the more reason for everyone to be using the autoconfig code.
> > [I *really* need to nuke that old code...]
> Can the raid be converted to autoconfiguration by giving
>
> $ > raidctl -A yes raid0
>
> even when the raid has been used and data has been copied to it?

Yes.
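
The autoconfig flag lives in the component labels, so setting it on the live,
configured set is all that's needed; from then on the kernel finds and
configures raid0 on its own at boot:

 raidctl -A yes raid0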

> So I would then immediately convert the raid to autoconfiguration. I assume
> that /etc/rc.d/raidframe will deal with autoconfigured raid sets.

Autoconfigured raid sets will get configured by the kernel.  If the
parity needs checking, /etc/rc.d/raidframeparity will do that.
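(If memory serves, that script boils down to the equivalent of

 raidctl -P raid0

for each configured set: check the parity status and rewrite it if it isn't
known to be clean.)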

> >
> > > Aug 23 09:27:35 lwfv-fs last message repeated 6 times
> > > Aug 23 10:13:04 lwfv-fs syslogd: Exiting on signal 15
> >
> Every time, the system was rebooted normally.  The system never became
> unstable or unresponsive.

Hmmmmmmmmmm....

> > Was this a crash, or a reboot, or a hang, or???  (I'm just trying to
> > figure out why the mod counters would be out by 3.  I can understand
> > them being out by 1, but never by 3 for the scenario you present.)
> >
> > I'm not sure which way to suggest going right now...  I still need
> > more info...
> Which information would you need? I'd try to get it to you.

Here's what I'm thinking has happened.  Please correct any part of it
that might be wrong:

1) At this point:

 Aug 22 20:00:00 lwfv-fs syslogd: restart

all components were fine, parity was up-to-date, and all mod_counters
were correct.

2) On this reboot:

 Aug 23 10:13:04 lwfv-fs syslogd: Exiting on signal 15
 Aug 23 11:35:46 lwfv-fs syslogd: restart

component labels didn't get written to sd10b, sd11b, and sd12b.  This is
based on the previous errors to those 3 disks:

 Aug 23 00:35:45 lwfv-fs /netbsd: sd10(asc1:0:0:0): unrecognized MESSAGE; sending REJECT
 Aug 23 00:35:45 lwfv-fs /netbsd: asc1: unexpected disconnect; sending REQUEST SENSE
 Aug 23 00:35:55 lwfv-fs /netbsd: sd12(asc1:0:2:0): unrecognized MESSAGE; sending REJECT
 Aug 23 00:35:55 lwfv-fs /netbsd: asc1: unexpected disconnect; sending REQUEST SENSE
 Aug 23 00:36:45 lwfv-fs /netbsd: sd11(asc1:0:1:0): asc1: timed out [ecb 0xfffffc000189e1b0 (flags 0x1, dleft 4000, stat 0)], <state 1, nexus 0x0, phase(l 10, c 100, p 3), resid 0, msg(q 0,o 0) >
 Aug 23 00:36:55 lwfv-fs /netbsd: sd11(asc1:0:1:0): asc1: timed out [ecb 0xfffffc000189e318 (flags 0x1, dleft 4000, stat 0)], <state 1, nexus 0x0, phase(l 10, c 100, p 3), resid 0, msg(q 0,o 0) >
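
(The mod counters live in those same component labels; with the set configured,
something like

 raidctl -g /dev/sd10b raid0
 raidctl -g /dev/sd12b raid0

should print a "Mod Counter:" line for each component, which is where you'd
see the counters being out by 3.)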

3) There are also indications that data was not flushed successfully to
these drives:

 Aug 23 11:35:50 lwfv-fs /netbsd: /dev/raid0b: file system not clean (fs_clean=4); please fsck(8)
 Aug 23 11:35:51 lwfv-fs /netbsd: /dev/raid0b: lost blocks 0 files 0
 Aug 23 11:35:51 lwfv-fs /netbsd: /dev/raid0d: file system not clean (fs_clean=4); please fsck(8)
 Aug 23 11:35:51 lwfv-fs /netbsd: /dev/raid0d: lost blocks 0 files 0

Had the system been shut down absolutely cleanly, those filesystems should have
been clean!

But at this point, the data and parity will be as close to "in sync" as you
can get, unless RAIDframe logged component failures to the console
that didn't make it into the logs.  (this is the sort of "missing
information" that I'm referring to..)

4) At this time:

Aug 23 11:36:00 lwfv-fs /netbsd: raid0: IO Error.  Marking /dev/sd12b as failed.
Aug 23 11:36:00 lwfv-fs /netbsd: raid0: node (Rod) returned fail, rolling backward

we do know that the RAID set thought sd12b was in trouble, and it's
possible that there is good data on the other disks that is not reflected
in the contents of sd12b.

5) However: sd10 and sd11 have grief at this same time.  What I don't
know is whether the machine panicked here, or whether it did manage to write
data to sd10b and sd11b.

6) There's actually lots of room in here for data loss, since we have:
 a) filesystems not getting flushed as noted in 3) above.
 b) filesystem data being partially flushed as noted in 3) above.
 c) filesystems perhaps being used when not fscked as noted in 3) above.
 d) IO happening in 4) where we don't know if sd10b or sd11b received
data (and where other components may have received data!)

Whether it did or didn't manage to write data to sd10b and sd11b,
considering sd12b as 'failed' will probably get you the closest to
recovering most of your data.


I'd go with:
 raidctl -C /etc/raid0.conf raid0   # force the configuration despite the label mismatch
 raidctl -I 20040825 raid0          # re-initialize the component labels with a new serial
 raidctl -f /dev/sd12b raid0        # mark sd12b as failed (no reconstruction yet)
 raidctl -R /dev/sd12b raid0        # reconstruct in place back onto sd12b
 and then do filesystem checks and stuff.
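
For the filesystem checks, something along the lines of

 fsck -fy /dev/rraid0b
 fsck -fy /dev/rraid0d

(i.e. the raw devices of whatever raid0 partitions you actually have) should do;
and only mount the filesystems once fsck is happy with them.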

I think we can move this to private mail now...

Later...

Greg Oster