Subject: Re: RAID kills the machine
To: Greg Oster <oster@cs.usask.ca>
From: Robert Elz <kre@munnari.OZ.AU>
List: current-users
Date: 08/26/2002 23:57:11
    Date:        Fri, 23 Aug 2002 08:39:27 -0600
    From:        Greg Oster <oster@cs.usask.ca>
    Message-ID:  <20020823143927.6C37255C02@cs.usask.ca>

[Tobias Schuepp <netbsd@schuepp.net>]
  | > I can reproduce that. Does it belong to my disks or is it a bug in
  | > raidframe?

Neither of those, it is the ahc driver.

I sent in a PR (kern/11180) about this in October 2000, and Greg Woods
moans about it whenever he gets the chance (this mail is probably an
invitation for more), which might be why no-one can be bothered to
fix it...

There are probably other PRs as well.

Whenever multiple drives are active on the controller at the same time,
there's a chance the controller will hang (get into some obscure state,
or something).   Raid is a good way (probably the best way) to really
keep the controller busy on multiple drives simultaneously.

The PR also reported a raidframe problem that you (Greg) fixed a day or
two later, so that part of the PR is irrelevant now.

  | It's a problem with your disks, cables, SCSI termination,
  | or something related to the SCSI bus.

Well, I guess the last of those counts, in that the driver is
"something related to the SCSI bus"...

  | The disks are having serious enough problems that RAIDframe 
  | thinks both have failed, and (currently) it just stops the kernel.

Yes, it isn't a raidframe problem (though at some point in the future
it would be better for raid to act just like a failed drive when all its
components fail, and simply return an i/o error rather than stopping the
kernel).   In many cases that will still cause the system to panic pretty
soon after, but it should be possible to build a raidframe set where one
of the components is itself a raidframe set, and in that case all that
should happen is that the higher-level set sees a component failure.

kre