Subject: Re: Detecting bad blocks
To: Timothy A. Musson <timothy.musson@zin-tech.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: netbsd-help
Date: 03/06/2004 16:25:44
On Mon, Mar 01, 2004 at 04:02:32PM -0500, Timothy A. Musson wrote:
> We're having a stat() call fail, and due to less than thorough handling of
> return codes we have 4 or 5 candidates as to what the failure is. We will
> be re-formatting the slice with newfs, but I'm trying to get a handle on
> how bad blocks are dealt with; partially to see if that might be part of
> the problem and partially because I'd just like to know more about it, but
> mostly so we can tell management whether or not we need to get a
> replacement drive ready.
> 
> From the links I've found, it seems that you're not supposed to see any
> indication of bad blocks until the hard drive is really having problems and
> probably needs to be replaced (because the HD hardware would have been
> automatically remapping bad blocks for a while).

"It depends"
IDE drives will automatically remap on write, but not on read. So if you're
trying to read a bad block you'll get an error. You have to write to this
block to have the hardware remap it (but then the data will be lost).
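
Since a read of a bad block comes back as an error, one way to locate such
blocks is a read-only pass over the raw device. Below is a minimal sketch in
Python, not a tested tool: the function name scan_for_read_errors, the
512-byte sector size and the bail-out after 64 consecutive failures are all
assumptions made for illustration, and the sector numbers it reports are
relative to the start of whatever device node you open.

```python
import os


def scan_for_read_errors(path, sector_size=512, max_sectors=None,
                         max_consecutive_errors=64):
    """Read `path` sector by sector.

    Returns (sectors_examined, bad_sectors), where bad_sectors lists the
    sector numbers (relative to the start of `path`) whose read failed.
    """
    bad_sectors = []
    consecutive = 0
    sector = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while max_sectors is None or sector < max_sectors:
            try:
                data = os.pread(fd, sector_size, sector * sector_size)
            except OSError:
                # The read failed: remember the sector and move on.
                bad_sectors.append(sector)
                consecutive += 1
                sector += 1
                if consecutive >= max_consecutive_errors:
                    break  # assume something other than a bad block is wrong
                continue
            if not data:
                break  # end of the device or file
            consecutive = 0
            sector += 1
    finally:
        os.close(fd)
    return sector, bad_sectors
```

You would run this as root against the raw device node that covers the area
you care about. It only reads, so it does not trigger any remapping by itself.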

On SCSI it's different. Automatic remapping can be turned off (in which case
you'll see an error when trying to read or write a bad block) or on (but I think
in this case it'll remap on write only; on a read the drive has no good data
to write to the new block). You can explicitly remap a block with
the scsictl reassign command.
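
For the explicit case, here is a small Python sketch that only builds the
scsictl command line for a list of block numbers, so you can look at it before
running anything. The helper name reassign_command and the device name in the
usage line are hypothetical; the argument order (device, then "reassign", then
one or more block numbers) is how I understand the scsictl(8) synopsis, so
check the man page on your release.

```python
def reassign_command(device, blocks):
    """Return the argv list for `scsictl <device> reassign <blkno> ...`."""
    if not blocks:
        raise ValueError("need at least one block number")
    argv = ["scsictl", device, "reassign"]
    for blkno in blocks:
        if int(blkno) < 0:
            raise ValueError("block numbers cannot be negative")
        argv.append(str(int(blkno)))
    return argv


# Print the command for review instead of running it.
# "sd0" and 123456 are placeholder values.
print(" ".join(reassign_command("sd0", [123456])))
```

Printing rather than executing is deliberate: reassigning a block is not
something you want a script to do without a human looking at the numbers first.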


> I've also found that there
> is the bad144 command which can read and write some bad sector info, but
> it's unclear to me whether the information it inspects is the same info
> used by the hardware or if bad144 is just for the drivers for the OS. In
> any case, I'm at the same point as the poster of an email that was answered
> by Manuel Bouyer:
> 
> from http://mail-index.netbsd.org/netbsd-users/2001/11/14/0007.html
> >> On NetBSD (i386) I see the bad144 and badsect tools; however, neither of
> >> these appears to do an actual _scan_ of a device for bad blocks.  It
> >
> >No, you need to know the address of the block you want to remap.
> 
> And I'm left with the question, "How do I know when there is a bad block
> (that hasn't been handled by the hardware) and how do I find the address of
> it?" If there is a bad block problem, will there be an explicit error
> message dumped into /var/log/messages saying "bad block" and the block
> number? Or, would the errors not be so obvious and require some digging
> around to find out that the cause is a bad block?

Yes, the kernel will complain about bad blocks. You'll find the messages in
/var/log/messages (or, most of the time, in the dmesg output). The error message
contains the block number (counted from the start of the disk).
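
To dig those messages out of a large log, a simple filter is usually enough.
A rough Python sketch follows; the helper name disk_error_lines is made up,
and because the exact wording of the kernel message depends on the driver, it
only looks for lines that mention the drive name together with the word
"error" rather than trying to parse the block number.

```python
def disk_error_lines(lines, drive, keyword="error"):
    """Yield log lines that mention `drive` and `keyword` (case-insensitive)."""
    drive = drive.lower()
    keyword = keyword.lower()
    for line in lines:
        lowered = line.lower()
        if drive in lowered and keyword in lowered:
            yield line.rstrip("\n")


# Example use (the drive name "wd0" is a placeholder):
#   with open("/var/log/messages") as log:
#       for hit in disk_error_lines(log, "wd0"):
#           print(hit)
```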

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--