Subject: Re: Dead RAID
To: Paul Mather <paul@gromit.dlib.vt.edu>
From: Greg Oster <oster@cs.usask.ca>
List: port-pmax
Date: 08/21/2000 13:29:26

Paul Mather writes:
> I have a RAID 5 on a 5000/240 running NetBSD/pmax -current 1.4U built
> as of March 10th.  The RAID 5 consisted of three 406 MB RZ25s, sd1, sd2,
> and sd3.  The sd2 drive died (I wasn't there, so I don't know what
> caused the failure.)
> 
> I have two of the three RAID 5 drives still working, but whenever I do
> anything that accesses the RAID set, e.g., raidctl -s raid0, the system
> panics with a TLB miss and the system reboots.

Ick.  Are you using the auto-configuration stuff, or just 'raidctl -c'
at boot?  Is there any way you can get a 'trace' to see where it is when 
it's dying?  (If it just reboots right away, probably not :( )

> Do I need to put another sd2 drive in before trying to see the state of
> the RAID? 

No, you shouldn't need to.

> It was my impression that it could at least limp along in
> degraded mode before the entire RAID became unusable.

Yes, that is how it is supposed to work.
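
As a rough illustration of why a single dead drive should be survivable
(a sketch of the general RAID 5 idea only, not RAIDframe's actual code;
the function name and the sample blocks are made up for the example):
parity is the XOR of the data blocks in a stripe, so any one missing
block can be rebuilt from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across three drives: two data blocks plus one parity block.
# (Hypothetical contents; real RAID 5 rotates parity across the drives.)
sd1 = b"hello wo"
sd2 = b"rld, RAI"
sd3 = xor_blocks([sd1, sd2])  # parity block

# sd2 dies: in degraded mode its block is recomputed from the survivors.
rebuilt = xor_blocks([sd1, sd3])
assert rebuilt == sd2
```

Every read that touches the failed component costs extra I/O like this,
which is why degraded mode is slower but should still work.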

> AFAIK, the parity
> was clean up until the drive death, as I put a "raidctl -P raid0"
> into the /etc/daily script, and always remember it reporting clean.
> 
> Anyone have any advice?  I can likely get another drive (a 634 MB
> RZ56) to reconstruct onto, or maybe even dredge up another RZ25.
> 
> Are the panics due to not having an sd2, or is it a kernel problem?

Sounds like a kernel bug to me... we just need to figure out if it's already 
been fixed for 1.5, or if it's still there...

Later...

Greg Oster