Subject: Re: Bad sectors vs RAIDframe
To: Stephen Borrill <netbsd@precedence.co.uk>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: tech-kern
Date: 06/08/2005 13:00:51
On Wed, Jun 08, 2005 at 05:40:57PM +0100, Stephen Borrill wrote:
> On Mon, 6 Jun 2005, Thor Lancelot Simon wrote:
> >We got a bad run of Samsung Spinpoint drives that we unfortunately
> >installed in NetBSD Foundation servers about a year ago.  I have had
> >to recover several of them (all in 2-way RAIDframe mirrors) by using
> >dd to copy the data from the corresponding sectors on one drive over
> >the bad sectors of the other, often doing this in both directions to
> >recover from multi-drive failures within a set.  Since then, RAIDframe
> >has been changed so that it retries on disk error before failing a
> >component, and never fails components from non-redundant sets -- so a
> >newer kernel may let you get somewhere with data recovery, too.
> 
> I'm guessing these changes (or at least the second half) are:
> 
> http://mail-index.netbsd.org/source-changes/2004/01/02/0069.html

I don't think that's all of it -- it looks too early.  Greg?

I think the changes are in the 2.0 branch _now_ but I don't think
they were in it when 2.0 was built and released.

> With 1.6.2, a read error causes component failure. As the read is not 
> retried a successfully ECC corrected sector will not be spotted. If you 
> spot this in time, initiating a rewrite will generally be OK as upon a 
> write failure it'll map in a new sector. It will happily fail all 
> components in an array and then panic. In this respect, having a 
> RAIDframe RAID 1 mirrored set is actually significantly worse than 
> having a single disc (if you fail to spot failures quickly).

That's true for 1.6.2, at least.  With the change to never fail a
component of a non-redundant set due to disk error, simply telling
the set to rebuild will issue the sector writes necessary to fix the
problem.  The exception is when the rebuild itself fails: it can only
move the data one way, so it breaks if it can't read some of the data
from the "from" component (the one whose read errored last).  Working
by hand you can do the right thing, which is, immediately upon a read
error, to dd the data from the _other_ half of the mirror back to the
half that errored.
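
To make the by-hand step concrete, here is a minimal sketch of what the
dd invocation amounts to: read a run of sectors from the good half and
write it back at the same offset on the half that errored.  It is an
illustration only, not a tested recovery tool.  The function name
copy_sectors, the 512-byte sector size, and the scratch files used in
demo() are all assumptions of mine; it also assumes both components
share the same layout, so equal offsets hold the same data.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def copy_sectors(good_path, bad_path, first_sector, count,
                 sector_size=SECTOR_SIZE):
    """Copy `count` sectors from the good half over the same sectors
    of the half that errored.  Returns the number of bytes written."""
    offset = first_sector * sector_size
    length = count * sector_size
    # "r+b" opens for update without truncating the destination.
    with open(good_path, "rb") as src, open(bad_path, "r+b") as dst:
        src.seek(offset)
        data = src.read(length)
        if len(data) != length:
            raise IOError("short read from the good component")
        dst.seek(offset)
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())  # push the write down to the device
    return length


def demo():
    """Exercise copy_sectors on two scratch files standing in for
    the two halves of a mirror.  Returns True if the repair worked."""
    payload = bytes(range(256)) * 8  # 2048 bytes, i.e. 4 sectors
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.img")
        bad = os.path.join(tmp, "bad.img")
        with open(good, "wb") as f:
            f.write(payload)
        damaged = bytearray(payload)
        # Zero out sector 1 to stand in for the unreadable sector.
        damaged[SECTOR_SIZE:2 * SECTOR_SIZE] = b"\0" * SECTOR_SIZE
        with open(bad, "wb") as f:
            f.write(damaged)
        copy_sectors(good, bad, first_sector=1, count=1)
        with open(bad, "rb") as f:
            return f.read() == payload
```

On real hardware the two paths would be the raw component partitions,
and the sector number reported in the kernel's error message would
probably need translating to an offset within the partition first; I
have left that out because it depends on the disklabel.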

RAIDframe could clearly automatically DTRT in almost every case
like this -- "regenerate the data from parity and write-back" is
the same as "read from other half of mirror and write-back" but
it's hard to see exactly how to make it do so.  The internals of
RAIDframe scare me.
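
For what it's worth, the equivalence is easy to show in a toy model.
The sketch below is not RAIDframe code and makes no claim about how
RAIDframe is structured; Component, xor_blocks and read_repair are
hypothetical names, and the "drive remaps the sector on write"
behaviour is simply modelled by clearing the bad flag.  The point is
only that XORing the surviving components degenerates to a plain copy
when there is exactly one survivor, as in a 2-way mirror.

```python
from functools import reduce


class ReadError(Exception):
    pass


class Component:
    """Toy stand-in for one disk in a set (hypothetical, not RAIDframe)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bad = set()  # block numbers that fail on read

    def read(self, n):
        if n in self.bad:
            raise ReadError(n)
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        # Model the drive mapping in a spare sector on write.
        self.bad.discard(n)


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_repair(components, which, n):
    """Read block n from components[which]; on a read error, rebuild
    it from the other components and write it back in place."""
    try:
        return components[which].read(n)
    except ReadError:
        others = [c.read(n) for i, c in enumerate(components) if i != which]
        data = xor_blocks(others)  # with one survivor this is a copy
        components[which].write(n, data)
        return data
```

With two identical components this is "read from other half of mirror
and write-back"; with two data components plus their XOR parity it is
"regenerate the data from parity and write-back".  A read error on one
of the surviving components is not handled here and would simply
propagate.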

Thor