NetBSD-Users archive


Re: state or future of LFS?



On Apr 12, 2009, at 12:40 AM, Miles Nordin wrote:

   cs>       F_FULLFSYNC Does the same thing as fsync(2) then asks
   cs> the drive to flush all buffered data

yeah well, whatever.

Yeah, well, relevant facts don't care about opinions: whether you acknowledge them or wish to ignore them is up to you.

This is what I was talking about:

http://oss.sgi.com/archives/xfs/2005-02/msg00395.html

which says that you have to use some special OS X only API to achieve
an fsync that's functionally equivalent to what every other Unix
normally gives you.  I don't think it's fair to mention all the other
detail without mentioning this well-known problem.

That's absolutely right, except that I did mention the special API (that's the F_FULLFSYNC bit you quoted above).

As far as I can tell (i.e., from looking at the code), OS X by default does synchronous updates to data and asynchronous updates to filesystem metadata, because it trusts the journaling mechanism to keep the metadata consistent. This is well documented, along with what an application such as a database should do to obtain ACID semantics from the filesystem.
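For illustration, here is a minimal sketch of what such an application might do. The helper name durable_sync() is hypothetical, and falling back to plain fsync() when F_FULLFSYNC is unavailable or refused is an assumption on my part, not something the OS X documentation prescribes:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: push fd's data to stable storage as thoroughly
 * as the platform allows. Returns 0 on success, -1 on failure (errno set).
 */
int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    /* OS X: same as fsync(2), then ask the drive to flush its cache. */
    if (fcntl(fd, F_FULLFSYNC, 0) != -1)
        return 0;
    /* A bad descriptor will not be helped by retrying with fsync(). */
    if (errno == EBADF)
        return -1;
    /* Otherwise assume the filesystem or device refused the request
     * and fall through to an ordinary fsync(). */
#endif
    return fsync(fd);
}
```

On systems that do not define F_FULLFSYNC this compiles down to a bare fsync(), so whether the drive's write cache is actually flushed still depends on the OS and on the drive's settings.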

But let's focus on just what other Unices do with fsync():

and what every other Unix normally gives you is not really so thorough
as one might like (may incl. I think NetBSD? does not propagate SYNC
CACHE command all the way to the disk (ZFS does), or discards said
disk commands in the software RAID layer (Linux LVM2), iSCSI
correctness problems, u.s.w.), but is still more useful than what OS X
gives without the special option.

Whether the data actually gets written and the on-disk cache itself gets flushed seems to depend on the hw.ata.wc sysctl on FreeBSD or the dkctl setting on NetBSD. Write caching seems to always default to on, because otherwise people scream bloody murder about the factor-of-ten reduction in write performance with it off. Further, by default (i.e., FFSv2 with soft updates), data changes are synced out when you do an fsync(), but metadata changes are done asynchronously, which is exactly what OS X does.

I'm sure their API circus made them look great in filebench or bonnie
or fsstress or whatever benchmarks don't know about their special API,
though.  It's thoroughly bullshit, IMO.

Be careful about throwing stones. From the authoritative source:

http://www.usenix.org/publications/library/proceedings/usenix2000/general/full_papers/seltzer/seltzer_html/index.html

"Both journaling and Soft Updates systems ensure the integrity of meta-data operations, but they provide slightly different semantics. The four areas of difference are the durability of meta-data operations such as create and delete, the status of the file system after a reboot and recovery, the guarantees made about the data in files after recovery, and the ability to provide atomicity.

The original FFS implemented meta-data operations such as create, delete, and rename synchronously, guaranteeing that when the system call returned, the meta-data changes were persistent. Some FFS variants (e.g., Solaris) made deletes asynchronous and other variants (e.g., SVR4) made create and rename asynchronous. However, on FreeBSD, FFS does guarantee that create, delete, and rename operations are synchronous. FFS-async makes no such guarantees, and furthermore does not guarantee that the resulting file system can be recovered (via fsck) to a consistent state after failure. Thus, instead of being a viable candidate for a production file system, FFS-async provides an upper bound on the performance one can expect to achieve with the FFS derivatives.

Soft Updates provides looser guarantees than FFS about when meta-data changes reach disk. Create, delete, and rename operations typically reach disk within 45 seconds of the corresponding system call, but can be delayed up to 90 seconds in certain boundary cases (a newly created file in a hierarchy of newly created directories)."

--
-Chuck


