current-users: Re: Horrible RAIDFrame Crash

Subject: Re: Horrible RAIDFrame Crash
To: Greg Oster <oster@cs.usask.ca>
From: Caffeinate The World <mochaexpress@yahoo.com>
List: current-users
Date: 04/15/2003 12:40:58
--- Caffeinate The World <mochaexpress@yahoo.com> wrote:
> 
> --- Caffeinate The World <mochaexpress@yahoo.com> wrote:
> > 
> > --- Caffeinate The World <mochaexpress@yahoo.com> wrote:
> > I unplugged the SCSI connector from sd0 and booted the system up again.
> > It booted up fine with the failed component errors. So sd1 is fine.
> 
> > 
> > What can I do to further narrow down the problem? Apparently it's sd0,
> > and it could be the write process that caused the "Multiple disks"
> > error. I get the feeling that if I repeat building sd0 as the spare,
> > I'll get the same errors.
> 
> I unplugged the SCSI cable from sd0 and booted the system. It booted up
> fine. I shut down to single-user mode, plugged the SCSI cable back into
> sd0, and ran "scsictl scsibus0 scan any any". It found sd0 fine.
> 
> Tried to add sd0a to raid0 as a hot spare again.
> 
> raidctl -a /dev/sd0a raid0
> warning: truncating spare disk /dev/sd0a to 1023872 blocks
> 
> NOTE: sd0a has the same layout and size as sd1a used by raid0, so that
> truncation warning doesn't make sense.

I checked the disklabel of sd0c and it shows partition a with 1024000
sectors at offset 0. sd1a is exactly the same. raid0 (a RAID 1 set
composed of sd1a and sd0a) has 1023872 sectors at offset 0. Note that
1023872 came from the earlier "disklabel raid0 > disklabel.raid0".
1024000 - 1023872 is 128, which happens to be raid0's sectors per
track.

Why is it using 1023872? Are the 128 sectors reserved for the RAID disk
label?
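
One way to account for the missing 128 sectors, as a rough sketch rather
than a confirmed answer: assume RAIDframe sets aside 64 sectors on each
component for its own label and then rounds what is left down to a whole
number of stripe units. The 64-sector reserve and the 128-sector stripe
unit below are assumptions (the 128 is borrowed from raid0's sectors per
track); only the two sizes come from the disklabel output above.

```sh
#!/bin/sh
# Sizes taken from the disklabel output quoted in this message.
part_sectors=1024000    # size of sd0a and sd1a
raid_sectors=1023872    # size reported by "disklabel raid0"

# Assumptions, not confirmed anywhere in this thread:
reserved=64             # sectors kept back for the RAID component label
stripe_unit=128         # sectors per stripe unit (raid0's sectors per track)

# Plain difference between the partition and the RAID device.
diff=$((part_sectors - raid_sectors))

# Subtract the reserve, then round down to a multiple of the stripe unit
# (shell integer division truncates).
usable=$(( (part_sectors - reserved) / stripe_unit * stripe_unit ))

echo "difference: $diff sectors"
echo "usable after reserve and round-down: $usable sectors"
```

Under those assumptions the result matches the size disklabel reports for
raid0, so the truncation warning would be expected even though sd0a and
sd1a are identical.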

> raidctl -vF component0 raid0
> started doing the reconstruction and was at 2% when
> ...fast scrolling errors... then
> 
> recon read failed
> panic: raidframe error at line 1314 file
> /usr/src/sys/dev/raidframe/rf_reconstruct.c
> syncing disks... Multiple disks failed in a single group! Aborting I/O
> operation
> 
> Multiple disks failed...operation [repeated 17 times]
> 
> panic raidframe error at line 471 file
> /usr/src/sys/dev/raidframe/rf_states.c

I was curious whether sd0a had read/write problems, so I changed sd0a
from type RAID to 4.2BSD. newfs on sd0a went fine, and fsck on sd0a
found no errors. After that successful test, I changed sd0a's type back
to RAID in the disklabel (everything else remains the same).

Trying to get sd0a to be spare for raid0 again:

raidctl -a /dev/sd0a raid0
...truncation warning...
raidctl -vF component0 raid0
...reconstruct at around 7%...
...two quick semi-loud ZZZzzz ZZZzzz sounds from the HD...
...crash...reboot...

On a positive note, I finally got a chance to take the alpha down and
replace the CMOS battery, so it now keeps proper time and CMOS settings
across long periods (like 2 min) of being shut off. It also means I no
longer have to get into AlphaBIOS for NT each time the system power
cycles. Before, I'd have to go into the CMOS setup and set OpenVMS
again, then reboot, just to get it to boot. A CR2032 lithium battery
costs $2.99 at Kmart and RadioShack.

Thomas
