Subject: Re: Funny -> ATA drive read error
To: None <kilbi@rad.rwth-aachen.de>
From: Charles M. Hannum <abuse@spamalicious.com>
List: netbsd-users
Date: 06/04/2004 15:24:59
On Friday 04 June 2004 08:08, Markus W Kilbinger wrote:
> >>>>> "Manuel" == Manuel Bouyer <bouyer@antioche.eu.org> writes:
>     >> On Thu, Jun 03, 2004 at 09:00:22PM +0800, Todd Gruhn wrote:
>     >> Jun  3 08:17:01 gandalf /netbsd: wd0: (uncorrectable data error)
>     >> Jun  3 08:17:01 gandalf /netbsd: wd0: transfer error, downgrading to Ultra-DMA mode 1
>     >> Jun  3 08:17:01 gandalf /netbsd: wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
>     >> Jun  3 08:17:01 gandalf /netbsd: wd0e: error reading fsbn 3034368 of 3034368-3034383 (wd0 bn 4193568; cn 4160 tn 4 sn 36), retrying
>
>     Manuel> This is specific to ATA drives. "uncorrectable data error"
>     Manuel> is one of the few error conditions an ATA drive can
>     Manuel> report. This error means that your drive has a bad block.
>
> I had the same problem with one of our ATA drives in the last few days...
>
>     Manuel> You can try to write to this block to try to remap it.
>
> ... and could really fix it with such a write attempt! :-)
>
> Strange thing: after the read error occurred once, no further access
> (read or write) to the erroneous block (and afterwards) is possible
> on the running system; I only get Input/output errors for these
> blocks.
>
> So I had to reboot the machine to make the erroneous block accessible
> again, and then I had to do the write attempt first to fix the
> problem.
>
> Questions:
>
> Is it normal that blocks, once detected as erroneous, are permanently
> marked/remembered as failed? (Is this a feature of the disk or of the
> driver? Is there an {ata,dk}ctl command to change this behavior?)

This is a relatively new "feature," which I am likely to remove soon, because 
it causes exactly the problem you mentioned.  (It also had another serious 
bug that I fixed a few days ago -- it caused I/O to *other* blocks to return 
EIO.)
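Manuel's suggestion of writing to the bad block so the drive can remap it can be sketched roughly as follows. This is a minimal sketch, not a tested tool: it assumes 512-byte sectors, assumes the whole-disk raw device is something like /dev/rwd0d (the raw partition letter differs between ports), and the helper names overwrite_sector and sector_readable are hypothetical. Writing zeros destroys whatever was stored in that sector, so only do this for a block whose contents are already lost. The block number to pass is the absolute "wd0 bn" value from the log, not the partition-relative fsbn. On a kernel with the remembered-bad-block behaviour described above, the write itself may fail with EIO until after a reboot.

```python
import os

# Assumption: 512-byte logical sectors, typical for ATA drives of this era.
SECTOR_SIZE = 512


def overwrite_sector(dev_path: str, lba: int, sector_size: int = SECTOR_SIZE) -> None:
    """Overwrite one sector with zeros so the drive gets a chance to reallocate it.

    WARNING: destroys whatever data was stored in that sector.
    dev_path is assumed to be a whole-disk raw device (hypothetical, e.g. /dev/rwd0d).
    """
    fd = os.open(dev_path, os.O_RDWR)
    try:
        buf = bytes(sector_size)  # all-zero replacement data
        written = os.pwrite(fd, buf, lba * sector_size)
        if written != sector_size:
            raise IOError(f"short write: {written} of {sector_size} bytes")
    finally:
        os.close(fd)


def sector_readable(dev_path: str, lba: int, sector_size: int = SECTOR_SIZE) -> bool:
    """Return True if the whole sector can be read back without an I/O error."""
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        return len(os.pread(fd, sector_size, lba * sector_size)) == sector_size
    except OSError:  # e.g. EIO from the driver for an unreadable sector
        return False
    finally:
        os.close(fd)
```

For the block in the quoted log, the call would be overwrite_sector("/dev/rwd0d", 4193568), followed by sector_readable("/dev/rwd0d", 4193568) to check that reads succeed again.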

Also, I recently (a few days ago) eliminated the downgrading of transfer modes 
on most errors -- it's pointless, and there's also no way to recover from 
that without rebooting.
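For anyone matching the numbers in the quoted log: fsbn is relative to the partition (wd0e), while "wd0 bn" is the absolute block on the disk, so their difference is the partition's starting offset; cn/tn/sn are that absolute block split by the disklabel geometry, with sn counted from 0. The sketch below reproduces the logged values under an assumed geometry of 16 tracks per cylinder and 63 sectors per track, which is a guess that happens to match this log, not something stated in the thread. The function name decode_block is hypothetical.

```python
def decode_block(fsbn: int, bn: int, secpercyl: int, nsectors: int) -> dict:
    """Split an absolute block number the way the 'error reading fsbn ...' line does.

    fsbn: partition-relative block; bn: absolute block on the whole disk.
    secpercyl / nsectors: disklabel geometry (assumed values when called below).
    """
    cn, rem = divmod(bn, secpercyl)  # cylinder number
    tn, sn = divmod(rem, nsectors)   # track (head) and 0-based sector
    return {"partition_offset": bn - fsbn, "cn": cn, "tn": tn, "sn": sn}


# Assumption: 16 tracks/cylinder * 63 sectors/track = 1008 sectors per cylinder.
print(decode_block(3034368, 4193568, secpercyl=16 * 63, nsectors=63))
# {'partition_offset': 1159200, 'cn': 4160, 'tn': 4, 'sn': 36}
```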