Subject: Re: Extension of fsync_range() to permit forcing disk cache flushing
To: Bill Studenmund <firstname.lastname@example.org>
From: Daniel Carosone <email@example.com>
Date: 12/17/2004 10:41:12
Content-Type: text/plain; charset=us-ascii
On Thu, Dec 16, 2004 at 01:56:02PM -0800, Bill Studenmund wrote:
> Also, you're implicitly assuming that disks don't fail.
> And since disks fail (even in RAID), all we have to do is make sure
> that the cache failure probability is less than the disk drive
> failure probability, and then the cache doesn't matter.
These things don't seem to be entirely independent, however. Herein
lies the other reason I disable write cache.
As I've noted several times previously, I have a collection of disks
(across a number of manufacturers) at home that "failed" in production
machines with bad blocks that didn't get remapped, and were replaced.
I salvaged the "junk" ones for an experiment. Overwriting those disks
with dd and write cache on left the blocks bad and unreadable. Turning
off write cache and overwriting again meant the drive remapped the
sectors on write, and the disks came up clean. I use them for scratch
space rather than truly 'critical' data, but I've never had another
problem with a single one of them since.
Only one disk (of about a dozen) ever failed to recover this way, and
it was very seriously screwed to start with (and I think cooked
because of a fan failure, as well).
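
The overwrite-and-check pass described above can be sketched roughly as
follows. This is an illustrative Python sketch, not the dd invocation
actually used: the function name overwrite_and_verify, the 512-byte
block size and the zero fill pattern are all assumptions. Switching the
drive's write cache on or off is outside the sketch and has to be done
with the system's own disk control tool before each run. Note also that
a read-back through a buffered device may be served from the operating
system's cache rather than the platter, so a raw device is the better
target for a real test.

```python
import os

BLOCK_SIZE = 512  # assumed sector size; many newer drives use 4096


def overwrite_and_verify(path, num_blocks, block_size=BLOCK_SIZE, fill=b"\x00"):
    """Overwrite num_blocks blocks of path, then read each one back.

    Returns a sorted list of block indexes that could not be written,
    could not be read, or did not contain the fill pattern afterwards.
    """
    pattern = fill * block_size
    bad = set()

    fd = os.open(path, os.O_RDWR)
    try:
        # Pass 1: overwrite every block, much as dd from /dev/zero would.
        for i in range(num_blocks):
            try:
                os.lseek(fd, i * block_size, os.SEEK_SET)
                if os.write(fd, pattern) != block_size:
                    bad.add(i)  # short write
            except OSError:
                bad.add(i)

        # Push the data out of the OS buffers. Whether it also leaves the
        # drive's own write cache depends on the driver and the drive.
        os.fsync(fd)

        # Pass 2: read every block back and compare with the pattern.
        for i in range(num_blocks):
            try:
                os.lseek(fd, i * block_size, os.SEEK_SET)
                data = os.read(fd, block_size)
            except OSError:
                bad.add(i)
                continue
            if data != pattern:
                bad.add(i)
    finally:
        os.close(fd)

    return sorted(bad)
```

Run once with the write cache enabled and once with it disabled, then
compare the two lists. On the disks described here, the expectation
would be a non-empty list in the first case and an empty one in the
second. This destroys whatever is on the target, so point it only at a
disk you are prepared to lose.
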
This anecdotal evidence suggests at least these drives don't do read
validation, or don't have enough time to do remapping when the cache
is full, or something similar when write cache is enabled. Whatever
is actually going on, it has been enough for me to decide I don't
trust the drives with write cache on.