Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

Mr Roooster writes:


> The wd driver is retrying (IIRC it retries 3 times) and succeeding on
> the second or 3rd attempt. (See xfer 338, retry 0, followed by a 'soft
> error corrected' with the same xfer number 10 seconds later. This is
> the retry succeeding.)
> This sits below ZFS and therefore ZFS never sees the error. If the
> read failed 3 times you'd probably get a data error in ZFS.


And this would very much explain why the ZFS checksums do not reflect
any problems in the pool.  As far as ZFS is concerned, it succeeded in
reading the disk, although with some effort.  Had the sector actually
been completely unreadable, I would hope that ZFS in NetBSD would have
noticed and complained.  With a T4 Solaris box I dealt with a while
back, I noted something similar happening, and it meant that the drive
was in the very early stages of failing.  I did simulate a bad sector
once as a test with ZFS in NetBSD to see how it might react, and it
worked as expected in that test, but I have not had the "opportunity"
to see a real failure.
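
To make the layering concrete, here is a toy model in Python.  It is
only a sketch of the idea described above, not the wd(4) driver or any
ZFS code: the class and function names are made up, the limit of three
attempts is taken from the "IIRC" in the quoted text, and a CRC32 stands
in for the real ZFS checksum.

```python
# Toy model: a retrying driver sitting below a checksumming layer.
# All names here are hypothetical; this is not wd(4) or ZFS code.
import zlib

MAX_ATTEMPTS = 3  # assumed, per "IIRC it retries 3 times" in the quote


class FlakyDisk:
    """A disk whose sector fails a given number of reads, then succeeds."""

    def __init__(self, data, failures):
        self.data = data
        self.failures = failures  # reads that fail before one succeeds

    def raw_read(self):
        if self.failures > 0:
            self.failures -= 1
            raise IOError("uncorrectable data error")
        return self.data


def driver_read(disk, log):
    """Driver layer: try up to MAX_ATTEMPTS times, logging each failure."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            data = disk.raw_read()
        except IOError as err:
            log.append(f"retry {attempt}: {err}")
            continue
        if attempt > 0:
            log.append("soft error corrected")
        return data
    raise IOError("read failed after retries")


def checksummed_read(disk, expected_crc, log):
    """Upper layer: sees only what the driver returns, then verifies it."""
    try:
        data = driver_read(disk, log)
    except IOError:
        return "read error"
    return "ok" if zlib.crc32(data) == expected_crc else "checksum error"
```

With one failing read, the upper layer reports "ok" and the only trace
of trouble is in the driver's log.  With three failing reads, the error
finally propagates upward, which is the point at which I would expect
ZFS to notice.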

Brad Spencer - - KC8VKS -

