Subject: Re: Funny -> ATA drive read error
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Charles M. Hannum <abuse@spamalicious.com>
List: netbsd-users
Date: 06/04/2004 18:37:42
On Friday 04 June 2004 16:37, Manuel Bouyer wrote:
> On Fri, Jun 04, 2004 at 03:24:59PM +0000, Charles M. Hannum wrote:
> > This is a relatively new "feature," which I am likely to remove soon,
> > because it causes exactly the problem you mentioned.  (It also had
> > another serious bug that I fixed a few days ago -- it caused I/O to
> > *other* blocks to return EIO.)
>
> Note that I'm not the one who implemented this.
> The fact that it prevents writes is not a feature, it's a bug in the way
> it's implemented (the test for read is misplaced).

No, it's not anywhere near that simple.

1) It would need to *remove* the bad block entry when a sector is rewritten.

2) Right now, it can mark a large range (up to MAXPHYS) as "bad," but in
reality it may only be one sector that's bad.  This is partly because the
code was changed a while back to only switch to single-sector I/O after
multiple errors.  It can also mark too *small* a range as "bad," thereby
causing it to miss the entry later.

3) It doesn't scale at all.  It doesn't even *try* to scale.

4) As I've said before, the drive does this kind of defect management itself
-- and generally much better.  The only point I see here is to work around
the fact that the driver will wedge in busy-wait loops and cause the system
to freeze up when it's trying to access a bad block (that is, the drive is
doing defect management).  This is a bug and should be fixed, but not in this
way.
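
To make points 1 and 2 concrete, here is a minimal sketch of what a
per-sector bad-block table would have to do: record individual sectors
rather than whole transfers, match reads by range overlap, and drop an
entry when its sector is rewritten.  This is not the driver's code.  All
names (badsect_table, badsect_add, badsect_read_hits,
badsect_clear_on_write, BADSECT_MAX) are hypothetical, and the fixed-size
linear table does nothing about point 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BADSECT_MAX 16          /* hypothetical fixed capacity */

/* Hypothetical table of individual sectors that failed on read. */
struct badsect_table {
    uint64_t lba[BADSECT_MAX];
    size_t   count;
};

/* True if lba lies inside the range [start, start + nsect). */
static bool in_range(uint64_t lba, uint64_t start, uint32_t nsect)
{
    return lba >= start && lba - start < nsect;
}

/* Record one failed sector.  Returns false only if the table is full. */
static bool badsect_add(struct badsect_table *t, uint64_t lba)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->lba[i] == lba)
            return true;        /* already recorded */
    if (t->count == BADSECT_MAX)
        return false;
    t->lba[t->count++] = lba;
    return true;
}

/* A read should fail early if any sector it covers is recorded as bad. */
static bool badsect_read_hits(const struct badsect_table *t,
                              uint64_t start, uint32_t nsect)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (in_range(t->lba[i], start, nsect))
            return true;
    return false;
}

/* A write may let the drive remap the sector, so forget covered entries. */
static void badsect_clear_on_write(struct badsect_table *t,
                                   uint64_t start, uint32_t nsect)
{
    assert(t != NULL);
    size_t i = 0;
    while (i < t->count) {
        if (in_range(t->lba[i], start, nsect))
            t->lba[i] = t->lba[--t->count];   /* swap in the last entry */
        else
            i++;
    }
}
```

Matching by overlap rather than by exact (start, length) pair is what
avoids both failure modes in point 2: a large failed transfer does not
poison its good neighbours, and a later read with a different start or
length still finds the entry.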

> I don't remember the exact details that caused this to be implemented; you
> should probably ask the author.
>
> > Also, I recently (a few days ago) eliminated the downgrading of transfer
> > modes on most errors -- it's pointless, and there's also no way to
> > recover from that without rebooting.
>
> I've seen on many occasions, with different hardware, that downgrading
> would cause the errors to disappear (even "ID not found" or "uncorrectable
> data error" types of errors). I agree the hardware was flaky,
> but downgrading at least allowed the install to complete.

I sincerely doubt that downgrading the transfer mode is actually what "fixed"
it.  I also did a web search and a PR search, and could find no evidence of
cases like this -- although that's not conclusive.

The point remains that downgrading on an actual bad block -- especially on
something like a DVD-ROM -- is just plain wrong.  Downgrading will not fix
it, and now you've completely screwed performance until you reboot.
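
As an illustration of the distinction, a driver could look at the ATA
error register and refuse to downgrade on media errors.  The sketch below
is an assumption about what such a policy might look like, not the code
that was committed: the function name should_downgrade, its parameters,
and the idea of downgrading only after repeated interface CRC errors are
all hypothetical.  The bit values follow the ATA error register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ATA error register bits. */
#define ATA_ERR_ICRC 0x80   /* interface CRC error (Ultra DMA transfers) */
#define ATA_ERR_UNC  0x40   /* uncorrectable data error */
#define ATA_ERR_IDNF 0x10   /* ID not found */
#define ATA_ERR_ABRT 0x04   /* command aborted */

/*
 * Hypothetical policy: media errors (UNC, IDNF) describe the block, not
 * the cable or the bus timing, so a slower transfer mode cannot help.
 * Only repeated interface CRC errors are treated as a reason to downgrade.
 */
static bool should_downgrade(uint8_t err, unsigned crc_errors,
                             unsigned threshold)
{
    assert(threshold > 0);
    if (err & (ATA_ERR_UNC | ATA_ERR_IDNF))
        return false;           /* bad block: downgrading will not fix it */
    return (err & ATA_ERR_ICRC) != 0 && crc_errors >= threshold;
}
```

With a threshold of 3, for example, an "uncorrectable data error" never
triggers a downgrade no matter how often it repeats, while a third
consecutive CRC error does.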