tech-kern archive


Re: fsync error reporting



David Holland <dholland-tech%netbsd.org@localhost> writes:

>  > > everything that process wrote is on disk,
>  > 
>  > That is probably unattainable, since I've seen it plausibly asserted
>  > that some disks lie, reporting that writes are on the media when this
>  > is not actually true.
>
> Indeed. What I meant to say is that everything has been sent to disk,
> as opposed to being accidentally skipped in the cache because the
> buffer was busy, which will currently happen on some of the fsync
> paths.
>
> That's why flushing the disk-level caches was a separate point.

(ignoring errors as I have no objection to what you proposed and
clarified with mouse@)

Maybe I'm way off in space, but I'd like to see us be careful about

  1) operating system has a successful return from a write transaction to
  a disk controller (perhaps via a controller that has a write-back
  cache)

  2) operating system has been told by the controller that the write has
  actually completed to stable storage (guaranteed even if OS crashes or
  power fails, so actually written or perhaps in battery-backed cache)

  A) for stacked filesystems like raid, cgd, and for things like NFS,
  there's basically an e2e ack of the above condition.

POSIX is of course weaselly about this.  But it seems obvious that if you
call fsync, you want the property that if there is a crash or power
failure (but not a disk media failure :-) that your bits are there,
which is case 2.  Case 1 is only useful in that files could remain in OS
cache for a long time, and there is a pretty good but not guaranteed
notion that once in device writeback cache they will get to the actual
media in not that long.  The old "sync;sync;sync;sleep 10" thing from
before there was shutdown(8)...
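
To make the case 1 / case 2 distinction concrete from the application
side, here is a minimal sketch of a caller that wants case 2.  The
helper name write_durably() is made up for illustration.  The sketch
assumes that where FDISKSYNC is defined (as on NetBSD),
fsync_range(2) with FFILESYNC | FDISKSYNC also requests a disk cache
flush, and it falls back to plain fsync(2) elsewhere.  A zero return
only means the kernel and the device reported success; it cannot
detect a disk that lies.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path, then ask the system
 * to push them to stable storage.  Returns 0 on success, -1 on error
 * with errno set by the failing call.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, rv, saved;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return -1;

	/* write(2) may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			saved = errno;
			(void)close(fd);
			errno = saved;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

#ifdef FDISKSYNC
	/* Assumed NetBSD path: sync the whole file and flush the disk cache. */
	rv = fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
#else
	/* Portable path: whether the device cache is flushed is up to the OS. */
	rv = fsync(fd);
#endif
	if (rv == -1) {
		/* Report the sync error; do not let close() mask it. */
		saved = errno;
		(void)close(fd);
		errno = saved;
		return -1;
	}
	return close(fd);
}
```

A caller would treat any -1 here as "the data may not be on disk",
which is exactly the error-reporting property under discussion.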

I thought NCQ was supposed to give acks for actual writing, but allow
them to be perhaps ordered and multiple in flight, so that one could use
that instead of the big-hammer inscrutable writeback cache.

If the controller doesn't support NCQ, then it seems one has to issue a
cache flush, which presumably is defined to get all data in cache as of
the flush onto disk before reporting that it's done.


Is that what you're thinking, or do you think this is all about case 1?



