Subject: "disk_badsecinfo" doesn't do it (was: problem with wd error handling?)
To: None <tech-kern@NetBSD.org>
From: Matthias Drochner <M.Drochner@fz-juelich.de>
List: tech-kern
Date: 08/16/2004 20:05:01
Having just experienced a single bad sector for the first time
(instead of getting a complete drive failure within hours),
I had occasion to look at the various mechanisms for
bad sector exclusion and remapping in NetBSD.
I found that the software bad sector list implemented in
the "wd" driver isn't helpful at all. On the contrary, it thwarts
the internal remapping capabilities of the drive and thus
makes things worse.

What happened to me was that a sector with valid data became
unreadable. (This is very likely the only case where one
will notice sector failures at all -- on writes the drive will
remap internally, and if there are no resources for remapping
left and one gets a hard write error, we are lost anyway. (I'm
leaving out bad144 here because it needs special preparation.))
The "wd" driver -if the error handling does as it is supposed
to- adds the sector to a software list. All further read and
write attempts are short-circuited and return EIO immediately.
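
A toy user-space model of that behaviour, as a sketch only. It is
not the actual "wd" code: all names (sim_disk, sim_io, SIM_MAXBAD)
are made up for illustration, and the "drive" is reduced to a single
flag that a write would clear, standing in for internal remapping.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MAXBAD 8			/* hypothetical list capacity */

struct sim_disk {
	uint64_t bad[SIM_MAXBAD];	/* software bad sector list */
	size_t	 nbad;
	uint64_t hw_unreadable;		/* sector the drive cannot read */
	int	 hw_has_unreadable;	/* nonzero while that sector is bad */
};

static int
sim_in_badlist(const struct sim_disk *d, uint64_t blk)
{
	size_t i;

	for (i = 0; i < d->nbad; i++)
		if (d->bad[i] == blk)
			return 1;
	return 0;
}

/* Returns 0 on success or EIO on failure. */
static int
sim_io(struct sim_disk *d, uint64_t blk, int is_write)
{
	/* Short-circuit: the request never reaches the drive. */
	if (sim_in_badlist(d, blk))
		return EIO;

	if (is_write) {
		/* Assumed drive behaviour: a write remaps the sector. */
		if (d->hw_has_unreadable && d->hw_unreadable == blk)
			d->hw_has_unreadable = 0;
		return 0;
	}

	if (d->hw_has_unreadable && d->hw_unreadable == blk) {
		/* Read error: remember the sector in the software list. */
		if (d->nbad < SIM_MAXBAD)
			d->bad[d->nbad++] = blk;
		return EIO;
	}
	return 0;
}
```

In this model, once a read of the failing block has returned EIO, a
later sim_io(&d, blk, 1) returns EIO as well and hw_has_unreadable
stays set, so the write that would have triggered the remap never
reaches the drive.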

So the data in that sector are lost. Now I could be lucky:
the data were temporary anyway and the file gets overwritten
eventually. Or someone who knows what he is doing (tm)
deliberately overwrites the sector in question to incite the
drive to remap it internally.
Since that sector is in the "bad" list, I can't do that. Without
that special handling, it would just work.
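
For reference, the deliberate overwrite could look roughly like the
following user-space sketch. The helper name overwrite_sector and its
parameters are hypothetical; the device path and sector size have to
be supplied by the caller, and whatever was in the sector is replaced
by zeros. Whether the drive actually remaps on this write is up to
its firmware. With the sector on the driver's "bad" list, the
pwrite() here would be expected to fail with EIO.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite one sector of "path" with zeros.
 * Returns 0 on success or an errno value on failure.
 */
static int
overwrite_sector(const char *path, off_t sector, size_t secsize)
{
	char *buf;
	ssize_t n;
	int fd, error = 0;

	buf = calloc(1, secsize);	/* zero-filled sector buffer */
	if (buf == NULL)
		return ENOMEM;

	fd = open(path, O_WRONLY);
	if (fd == -1) {
		error = errno;
		free(buf);
		return error;
	}

	/* Write exactly one sector at its byte offset. */
	n = pwrite(fd, buf, secsize, sector * (off_t)secsize);
	if (n == -1)
		error = errno;
	else if ((size_t)n != secsize)
		error = EIO;		/* short write */

	/* Push the data out so an error shows up here, not later. */
	if (error == 0 && fsync(fd) == -1)
		error = errno;
	if (close(fd) == -1 && error == 0)
		error = errno;

	free(buf);
	return error;
}
```

It would be called with the raw device node and the sector number
reported in the error message, e.g. overwrite_sector(dev, blkno, 512)
where dev and blkno are placeholders.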

best regards
Matthias