netbsd-users: Re: Bad sectors vs RAIDframe

Subject: Re: Bad sectors vs RAIDframe
To: J Chapman Flack <flack@cs.purdue.edu>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: netbsd-users
Date: 06/06/2005 13:06:19

On Mon, Jun 06, 2005 at 11:44:51AM -0500, J Chapman Flack wrote:
> Thor Lancelot Simon wrote:
> > Most IDE drives only spare out sectors on *write* (one must ask: what,
> > exactly, could they do to avoid presenting a read error on read -- and
> 
> If the question wasn't rhetorical, I think the answer is, it's a "read error"
> if the drive had to apply ECC to recover the correct data; then it reassigns
> the block, writes the recovered data to the new block, and returns the
> recovered data to the host.

Right, so, there are two problems here.

First, even if some errors are correctable with ECC, some aren't.  Is it
correct for the drive to automatically spare out on an _uncorrectable_
error?  If it does so, and the host retries the read, it will get back
a block full of zeroes -- which will cause a particularly ugly kind of
data corruption in a parity RAID setup.
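
To make that failure mode concrete, here is a minimal sketch in Python.
It assumes a plain XOR-parity stripe across three data disks; the block
contents and the names (xor_blocks, d0, d1, d2) are made up for
illustration and are not RAIDframe code.  It shows that once a data
block silently turns into zeroes, the stripe no longer matches its
parity, and a later reconstruction of a different disk yields garbage.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR any number of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus their XOR parity.
d0 = b"AAAA"
d1 = b"BBBB"
d2 = b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Case 1: the drive reports the read error on d1 and does NOT spare.
# The host can rebuild d1 from the surviving blocks and the parity.
recovered_d1 = xor_blocks(d0, d2, parity)

# Case 2 (assumed behaviour): the drive spares the sector by itself,
# so a retried read "succeeds" and returns a block full of zeroes.
d1_after_spare = bytes(len(d1))

# The stripe is now silently inconsistent with its parity.
stripe_consistent = xor_blocks(d0, d1_after_spare, d2) == parity

# If disk 0 fails later, rebuilding it uses the zeroed block and
# produces d0 XOR d1 instead of d0.
rebuilt_d0 = xor_blocks(d1_after_spare, d2, parity)
```

In the first case the host gets the original d1 back; in the second,
nothing reports an error, yet rebuilt_d0 differs from d0.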

Given the limited error-reporting semantics available to IDE or SATA
disks, it's probably actually correct for them to report an error and
not spare the sector in this case.  If the host has the right data, let
it write the data back.

The other problem is that some IDE drive firmware is so cheap that it
knows only two states for sectors: okay or error.  So on such drives
it seems that a sector that goes bad even once will stay bad until
you force sparing by writing it back yourself.
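
A rough sketch of that write-back, again in Python and against a toy
model rather than a real disk: ToyDrive and read_with_repair are
hypothetical names, and the model simply assumes that writing to a bad
sector makes the firmware spare it.  The "reconstruct" callback stands
in for whatever redundancy the host has (a mirror, parity, a backup).

```python
class ToyDrive:
    """Toy drive that tracks only two states per sector: okay or error."""

    def __init__(self, sectors):
        self.sectors = list(sectors)
        self.bad = set()          # LBAs that currently fail on read

    def read(self, lba):
        if lba in self.bad:
            raise OSError(f"read error at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        # Assumption: a write to a bad sector triggers sparing,
        # after which the sector reads back normally.
        self.bad.discard(lba)
        self.sectors[lba] = data


def read_with_repair(drive, lba, reconstruct):
    """Read a sector; on error, rewrite it with known-good data and retry."""
    try:
        return drive.read(lba)
    except OSError:
        data = reconstruct(lba)   # good copy from the host's redundancy
        drive.write(lba, data)    # forces the (modelled) sparing
        return drive.read(lba)


# Usage: sector 1 has gone bad; a mirror copy holds the right data.
mirror = [b"aaaa", b"bbbb"]
drive = ToyDrive([b"aaaa", b"bbbb"])
drive.bad.add(1)
result = read_with_repair(drive, 1, lambda lba: mirror[lba])
```

Without the write, every read of sector 1 in this model keeps failing;
after it, the sector reads back the mirrored data.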

-- 
 Thor Lancelot Simon	                                      tls@rek.tjls.com

"The inconsistency is startling, though admittedly, if consistency is to be
 abandoned or transcended, there is no problem."		- Noam Chomsky