Subject: Re: Funny -> ATA drive read error
To: None <kilbi@rad.rwth-aachen.de>
From: Charles M. Hannum <abuse@spamalicious.com>
List: netbsd-users
Date: 06/04/2004 15:24:59
On Friday 04 June 2004 08:08, Markus W Kilbinger wrote:
> >>>>> "Manuel" == Manuel Bouyer <bouyer@antioche.eu.org> writes:
> >> On Thu, Jun 03, 2004 at 09:00:22PM +0800, Todd Gruhn wrote:
> >> Jun 3 08:17:01 gandalf /netbsd: wd0: (uncorrectable data error)
> >> Jun 3 08:17:01 gandalf /netbsd: wd0: transfer error, downgrading to Ultra-DMA mode 1
> >> Jun 3 08:17:01 gandalf /netbsd: wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA data transfers)
> >> Jun 3 08:17:01 gandalf /netbsd: wd0e: error reading fsbn 3034368 of 3034368-3034383 (wd0 bn 4193568; cn 4160 tn 4 sn 36), retrying
>
> Manuel> This is specific to ATA drives. "uncorrectable data error"
> Manuel> is one of the few error conditions an ATA drive can
> Manuel> report. This error means that your drive has a bad block.
>
> I had the same problem with one of our ata drives last days...
>
> Manuel> You can try to write to this block to try to remap it.
>
> ... and could really fix it with such a write attempt! :-)
>
> Strange thing: after the read error has occurred once, no further access
> (read or write) to the erroneous block (or the blocks after it) is
> possible on the running system; I only see Input/Output errors for
> these blocks.
>
> So I had to reboot the machine to make the erroneous block accessible
> again, and then do the write attempt first to fix the problem.
>
> Questions:
>
> Is it normal that blocks, once detected as erroneous, are permanently
> marked/remembered as failed? (Is this a feature of the disk or of the
> driver? Is there an {ata,dk}ctl command to change this behavior?)
This is a relatively new "feature," which I am likely to remove soon, because
it causes exactly the problem you mentioned. (It also had another serious
bug that I fixed a few days ago -- it caused I/O to *other* blocks to return
EIO.)
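The toy model below shows why a driver-side "remember failed blocks" list forces a reboot before a remapping write can work. It is not the NetBSD wd(4) code. The class and method names (ToyDisk, read, write, reboot) are hypothetical. The model assumes the behavior described in this thread: the first failed read is recorded in memory, and later reads *and* writes to that block fail with EIO until the list is cleared.

```python
import errno


class ToyDisk:
    """Hypothetical model of a driver that remembers failed blocks.

    Not the real NetBSD wd(4) driver; it only mimics the behavior
    described in this thread.
    """

    SECTOR = 512

    def __init__(self, bad_on_media):
        self.data = {}                          # block number -> last data written
        self.bad_on_media = set(bad_on_media)   # sectors the drive cannot read
        self.remembered_bad = set()             # in-memory list, lost on reboot

    def read(self, bn):
        if bn in self.remembered_bad:
            # Fail fast without asking the drive at all.
            raise OSError(errno.EIO, "remembered bad block")
        if bn in self.bad_on_media:
            self.remembered_bad.add(bn)         # the problematic "feature"
            raise OSError(errno.EIO, "uncorrectable data error")
        return self.data.get(bn, bytes(self.SECTOR))

    def write(self, bn, buf):
        if bn in self.remembered_bad:
            # The write that would let the drive remap the sector never reaches it.
            raise OSError(errno.EIO, "remembered bad block")
        # Model assumption: a write lets the drive reallocate the bad sector.
        self.bad_on_media.discard(bn)
        self.data[bn] = buf

    def reboot(self):
        self.remembered_bad.clear()
```

In this model, the sequence Markus describes is: read fails, the write fails too, then reboot. After that, write(bn, ...) succeeds and subsequent reads return the new data. Dropping the remembered set (or clearing an entry on a successful write) removes the need for the reboot.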
Also, I recently (a few days ago) eliminated the downgrading of transfer modes
on most errors -- it's pointless, and there's also no way to recover from
that without rebooting.
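Manuel's suggestion, quoted earlier, was to write to the bad block so the drive can remap it. The sketch below shows that step, assuming 512-byte sectors. It also assumes that the "wd0 bn" value in the log is the absolute sector number on the whole disk. The "fsbn" value is relative to wd0e, so wd0e would presumably start at 4193568 - 3034368 = 1159200. The geometry of 16 heads and 63 sectors per track is inferred only because it reproduces the logged "cn 4160 tn 4 sn 36". The helper names lba_to_chs and rewrite_sector are hypothetical.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def lba_to_chs(bn, heads, sectors_per_track):
    """Split an absolute block number into (cylinder, head, sector), 0-based."""
    cyl, rem = divmod(bn, heads * sectors_per_track)
    head, sector = divmod(rem, sectors_per_track)
    return cyl, head, sector


def rewrite_sector(path, bn, sector_size=SECTOR_SIZE):
    """Overwrite one sector with zeros so the drive can reallocate it.

    DESTRUCTIVE: the sector's previous contents are lost. `path` would be
    the raw whole-disk device (e.g. /dev/rwd0d on i386; the whole-disk
    partition letter differs between ports).
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        buf = bytes(sector_size)
        written = os.pwrite(fd, buf, bn * sector_size)
        if written != sector_size:
            raise OSError(f"short write: {written} of {sector_size} bytes")
    finally:
        os.close(fd)


# Sanity check against the logged geometry (16 heads, 63 sectors assumed).
print(lba_to_chs(4193568, 16, 63))
```

On the machine in the log, this would be something like rewrite_sector("/dev/rwd0d", 4193568), run as root after a backup. Because the block now holds zeros, running fsck on wd0e afterwards is prudent. On a kernel that still has the remembered-bad-block behavior, the write may itself fail with EIO until after a reboot.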