Subject: Re: RAIDframe crash again
To: Kazushi Marukawa (Jam) <jam@pobox.com>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 07/12/2001 20:01:44
Kazushi Marukawa writes:
> Hi,
> 
> My system crashed, and the situation is similar to Chris
> Jones's.  FYI, the message-id of his mail is
> <20010508165041.C6074@mt.sri.com>.
> 
> The real cause is the failure of two hard drives in a 4-drive
> RAID5 system.  Then the system crashed.  Is there any way
> to stop this crash? 

No.  (I had a look the other day at trying to make it not panic on a
2-component failure, but didn't get very far :( )
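
For anyone wondering why a second failure is fatal to the data no matter
what the driver does: RAID 5 keeps one parity block per stripe, and that
is only enough to rebuild one missing block.  A minimal sketch of the
arithmetic follows.  It is an illustration only, not RAIDframe code; the
helper name xor_blocks and the 4-byte "blocks" are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-component RAID 5 set:
# three data blocks plus one parity block (all values are made up).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One component lost (data[1]): XOR of the survivors and parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])

# Two components lost (data[0] and data[1]): the survivors only give the
# XOR of the two missing blocks, which does not determine either of them.
mixed = xor_blocks([data[2], parity])
```

Here rebuilt equals the lost block, while mixed is only data[0] XOR
data[1], so neither missing block can be recovered from it.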

> A copy of the messages is below.  This is
> not all of them; I just grepped for the "raid" keyword.
[snip]
The more interesting bits will be from *before* the first "raid0: IO Error."
In particular, you need to find out *why* it said wd3e and wd1e failed.
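
Since a grep for "raid" drops exactly those earlier lines, here is a
rough sketch of pulling the context back out of the full log.  The
function name, the context size of 50 lines, and the sample lines are
all made up for illustration; only the "raid0: IO Error" marker comes
from the messages above.

```python
def lines_before_first(lines, marker="raid0: IO Error", context=50):
    """Return up to `context` lines before the first line containing
    `marker`, plus that line itself.  Returns [] if marker never appears."""
    for i, line in enumerate(lines):
        if marker in line:
            return lines[max(0, i - context):i + 1]
    return []

# Made-up placeholder log, NOT real kernel output.
sample = [
    "placeholder line 1",
    "placeholder line 2 (a disk error would show up around here)",
    "raid0: IO Error.",
    "placeholder line 4",
    "raid0: IO Error.",
]
window = lines_before_first(sample, context=1)
```

To run it on a real log, read the file into a list first (for example
with open(path).readlines(), where path is wherever your syslog keeps
kernel messages) and pass that list in.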

> Here is a trace after the crash.  I hope this helps some
> developer to fix this.

Thanks for the trace, but the panic on a 2-drive failure is intentional
(or at least it is/was as far as the original RAIDframe writers were
concerned).

> 
> Both hard drives that raid marked as failed pass the
> manufacturer's test program.  Maybe they are going bad,
> but they work for now.

Could be cabling/heat/power issues too.  How long have you been running this 
RAID set?  

>  So, I connected only 3 of the 4
> drives and started using them to make a backup.  I configured
> raid5 with -C and ran fsck.  fsck asked me to remove some
> files to fix the file system.  I copied those files, hoping
> that only the inodes were corrupted and the data was intact.
> After fsck, I copied those files back to their original place.
> The system crashed again.  Sigh.  However, after
> restarting the system and running fsck -p, 

Just use 'fsck' (without the -p).  And do it a few times until you get
no more changes/errors.

> I could copy those files
> into the original place.  Here is a trace after this crash.

Hmmm... Did you get a copy of the panic message?  Hard to tell exactly 
why it died here...  

Could you ship me (privately) a copy of your raid config files and of
/var/run/dmesg.boot?  Thanks.

Later...

Greg Oster