NetBSD-Users archive

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?



On Sun, 18 Jul 2021 at 00:30, Greg Troxel <gdt%lexort.com@localhost> wrote:
>
>
[snip]
>
> Ah, interesting point.  I find this confusing, because I thought an
> uncorrectable read error would, for disks I've dealt with, cause the
> sector to be marked as permanently failed and pending reallocation.
>
It depends on where the failure occurs, I expect. A drive could read just
fine, but a damaged cable may then cause enough noise that the data
doesn't always make it to the controller correctly.

> I also didn't realize that wd(4) would issue another read when there is a
> failure, but maybe that's in zfs glue code.
>
wd has retried for years, I think; it certainly used to do that with the
soft RAID code.

Looks to be set at 5 in the source[1], if I'm looking in the right place. :D

I expect that if you just use wd devices for ZFS, there may be some merit in
setting the retries to 1 and letting ZFS deal with it. That would stop the
slow I/O, with the effect of ZFS failing the drive.
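
To make the trade-off concrete, here is a minimal sketch in C of a bounded
retry loop. It is not the wd(4) code: fake_disk, fake_read and
read_with_retries are hypothetical names invented for illustration, and I'm
assuming the limit counts total attempts, which may not match the driver's
exact semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a flaky device (not a NetBSD structure):
 * the first `failures_left` reads fail, later reads succeed. */
struct fake_disk {
    int failures_left;  /* failing reads still to come */
    int attempts;       /* total read attempts issued so far */
};

/* Issue one read attempt against the fake device. */
bool fake_read(struct fake_disk *d)
{
    d->attempts++;
    if (d->failures_left > 0) {
        d->failures_left--;
        return false;
    }
    return true;
}

/* Try the read up to max_tries times (assumed to mean total attempts).
 * Returns true if any attempt succeeded; false means the error is
 * passed up to the layer above, e.g. ZFS. */
bool read_with_retries(struct fake_disk *d, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (fake_read(d))
            return true;
    }
    return false;
}
```

With a device that fails three times in a row, a limit of 5 hides the
problem from the layer above (the read succeeds on the fourth attempt, only
slowly), whereas a limit of 1 reports the error straight away after a single
attempt.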

[snip]
> >>    5 200  140     yes online  positive    Reallocated sector count    0
> >
> > I was expecting to see this value greater than 0 if the drive was
> > failing, is the drive bad or the cabling?
>
> Sectors get marked as failed, and then they actually get reallocated
> when you write.
> I bet after a dd of /dev/zero that will go up.

This is useful to know! :)

Ian

1. https://github.com/NetBSD/src/blob/05082e19134c05f2f4b6eca73223cdc6b5ab09bf/sys/dev/ata/wd.c#L94

