Subject: Detecting bad blocks
To: None <netbsd-help@netbsd.org>
From: Timothy A. Musson <timothy.musson@zin-tech.com>
List: netbsd-help
Date: 03/01/2004 16:02:32
We're having a stat() call fail, and because our code doesn't check return
codes thoroughly, we have 4 or 5 candidate causes for the failure. We will
be re-formatting the slice with newfs, but I'm trying to get a handle on
how bad blocks are dealt with; partially to see if that might be part of
the problem and partially because I'd just like to know more about it, but
mostly so we can tell management whether or not we need to get a
replacement drive ready.
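
As a way to narrow down the candidates, a minimal sketch of checking errno right after a failed stat() might look like the following. The helper name checked_stat is hypothetical (it is not from our code), and the sketch only assumes the standard stat(2) behaviour of returning -1 and setting errno. An EIO result would point toward an I/O error underneath the file system, while something like ENOENT or EACCES would point elsewhere.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: call stat() and, on failure, report why it failed.
 * Returns 0 on success, or the errno value left by the failed call.
 */
int checked_stat(const char *path, struct stat *sb)
{
    if (stat(path, sb) == 0)
        return 0;

    /* Save errno first; later library calls may overwrite it. */
    int err = errno;

    fprintf(stderr, "stat(%s) failed: %s (errno %d)\n",
            path, strerror(err), err);

    /* EIO means an I/O error occurred while reading from the file system. */
    if (err == EIO)
        fprintf(stderr, "  EIO suggests a problem below the file system\n");

    return err;
}
```

Calling checked_stat(path, &sb) in place of the bare stat() call would at least record which of the candidate failures actually occurred.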

From the links I've found, it seems that you're not supposed to see any
indication of bad blocks until the hard drive is really having problems and
probably needs to be replaced (because the drive hardware will have been
automatically remapping bad blocks for a while). I've also found that there
is the bad144 command, which can read and write some bad sector info, but
it's unclear to me whether the information it inspects is the same info
used by the hardware or whether bad144 is only for the OS drivers. In
any case, I'm at the same point as the poster of an email that was answered
by Manuel Bouyer:

from http://mail-index.netbsd.org/netbsd-users/2001/11/14/0007.html
>> On NetBSD (i386) I see the bad144 and badsect tools; however, neither of
>> these appears to do an actual _scan_ of a device for bad blocks.  It
>
>No, you need to know the address of the block you want to remap.

And I'm left with the question, "How do I know when there is a bad block
(that hasn't been handled by the hardware) and how do I find the address of
it?" If there is a bad block problem, will there be an explicit error
message dumped into /var/log/messages saying "bad block" and the block
number? Or, would the errors not be so obvious and require some digging
around to find out that the cause is a bad block?
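
For reference, the kind of scan I have in mind could be sketched as a plain sequential read that records where reads fail. This is only an illustration under stated assumptions: the function name scan_for_read_errors, the 512-byte block size, and the max_errors cutoff are all hypothetical, and I don't know whether a failing block would actually surface as EIO on a given drive and driver.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512  /* assumed sector size; adjust for the device */

/*
 * Hypothetical helper: read `path` one block at a time and print the byte
 * offset and block number of every read that fails with EIO.
 * Stops early once `max_errors` failures have been seen.
 * Returns the number of failed blocks, or -1 if the file cannot be
 * opened or a read fails for a reason other than EIO.
 */
long scan_for_read_errors(const char *path, long max_errors)
{
    char buf[BLOCK_SIZE];
    long bad = 0;
    off_t offset = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    while (bad < max_errors) {
        ssize_t n = pread(fd, buf, sizeof buf, offset);

        if (n == 0)                 /* end of file or device */
            break;

        if (n < 0) {
            if (errno == EINTR)     /* interrupted: retry the same block */
                continue;
            if (errno != EIO) {     /* unexpected error: give up */
                perror("pread");
                close(fd);
                return -1;
            }
            fprintf(stderr, "read error at offset %lld (block %lld)\n",
                    (long long)offset, (long long)(offset / BLOCK_SIZE));
            bad++;
            offset += BLOCK_SIZE;   /* skip the failing block and go on */
            continue;
        }

        offset += n;
    }

    close(fd);
    return bad;
}
```

The block numbers printed here are relative to the start of whatever was opened, so they would still need translating into whatever addressing bad144 or badsect expects.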

TIA,
Tim