NetBSD-Users archive


Re: state or future of LFS?



On Apr 12, 2009, at 12:40 AM, Miles Nordin wrote:
>    cs>       F_FULLFSYNC Does the same thing as fsync(2) then asks
>    cs> the drive to flush all buffered data
>
> yeah well, whatever.

Yeah, well, relevant facts don't care about opinions: whether you
acknowledge them or wish to ignore them is up to you.

> This is what I was talking about:
>
> http://oss.sgi.com/archives/xfs/2005-02/msg00395.html
>
> which says that you have to use some special OS X-only API to achieve
> an fsync that's functionally equivalent to what every other Unix
> normally gives you.  I don't think it's fair to mention all the other
> detail without mentioning this well-known problem.

That's absolutely right-- only, I mentioned the special API (that's
the F_FULLFSYNC bit you quoted above).

As far as I can tell (i.e., from looking at the code), by default OSX
does synchronous updates to data and async updates to filesystem
metadata because it trusts the journaling mechanism to keep the
metadata consistent-- and this is well-documented, along with what an
app like a database should do to obtain ACID semantics from the
filesystem.

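
A minimal C sketch of the pattern a database-style application might use, assuming a POSIX system: try fcntl(fd, F_FULLFSYNC) where that macro is defined (OS X), and fall back to plain fsync(2) elsewhere or if the fcntl fails. The helper names durable_fsync and write_file_durably are hypothetical, and treating any F_FULLFSYNC failure as "fall back to fsync" is an assumption, not something the thread prescribes. In the fallback path, whether the drive's write cache actually gets flushed still depends on the OS and the disk's write-cache setting.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * durable_fsync (hypothetical helper): flush fd as far toward stable
 * storage as the platform allows. On OS X, F_FULLFSYNC does an fsync(2)
 * and then asks the drive to flush its buffered data. Where F_FULLFSYNC
 * is not defined, or the fcntl fails (assumption: treat any failure as
 * "not supported here"), fall back to plain fsync(2).
 */
static int durable_fsync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync(fd);
}

/*
 * write_file_durably (hypothetical helper): create or truncate path,
 * write len bytes from buf, and flush them with durable_fsync().
 * Returns 0 on success and -1 on any error.
 */
int write_file_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;                 /* handle short writes */
        len -= (size_t)n;
    }

    if (durable_fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The function returns 0 only after the data has been handed to F_FULLFSYNC or fsync(2) and the descriptor closed cleanly; it does not flush the directory entry created by O_CREAT.
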
But let's focus on just what other Unices do with fsync():

> and what every other Unix normally gives you is not really so thorough
> as one might like (may incl. I think NetBSD? does not propagate SYNC
> CACHE command all the way to the disk (ZFS does), or discards said
> disk commands in the software RAID layer (Linux LVM2), iSCSI
> correctness problems, etc.), but is still more useful than what OS X
> gives without the special option.

Whether the data actually gets written and the drive's own cache
flushed seems to depend on a sysctl called hw.ata.wc for FreeBSD or
the dkctl setting in NetBSD; write caching seems to always default to
on because otherwise people scream bloody murder about the factor of
ten reduction in write performance with it off.  Further, by default
(i.e., FFSv2 with soft updates), data changes are synced out when you
do an fsync(), but metadata changes are done asynchronously-- which is
exactly what OSX does.

> I'm sure their API circus made them look great in filebench or bonnie
> or fsstress or whatever benchmarks don't know about their special API,
> though.  It's thoroughly bullshit, IMO.

Be careful of throwing stones-- from the authoritative source:

http://www.usenix.org/publications/library/proceedings/usenix2000/general/full_papers/seltzer/seltzer_html/index.html

"Both journaling and Soft Updates systems ensure the integrity of
meta-data operations, but they provide slightly different semantics.
The four areas of difference are the durability of meta-data
operations such as create and delete, the status of the file system
after a reboot and recovery, the guarantees made about the data in
files after recovery, and the ability to provide atomicity.

The original FFS implemented meta-data operations such as create,
delete, and rename synchronously, guaranteeing that when the system
call returned, the meta-data changes were persistent. Some FFS
variants (e.g., Solaris) made deletes asynchronous and other variants
(e.g., SVR4) made create and rename asynchronous. However, on FreeBSD,
FFS does guarantee that create, delete, and rename operations are
synchronous.

FFS-async makes no such guarantees, and furthermore does not guarantee
that the resulting file system can be recovered (via fsck) to a
consistent state after failure. Thus, instead of being a viable
candidate for a production file system, FFS-async provides an upper
bound on the performance one can expect to achieve with the FFS
derivatives.

Soft Updates provides looser guarantees than FFS about when meta-data
changes reach disk. Create, delete, and rename operations typically
reach disk within 45 seconds of the corresponding system call, but can
be delayed up to 90 seconds in certain boundary cases (a newly created
file in a hierarchy of newly created directories)."
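
Given those delayed metadata guarantees, an application that needs a create or rename to be durable cannot simply rely on the filesystem's own flush schedule. A widely used POSIX idiom, sketched here in C, is to write a temporary file, fsync(2) it, rename(2) it over the target, and then fsync(2) the containing directory so the directory-entry change is flushed as well. This is a minimal sketch assuming the platform accepts fsync(2) on a directory opened O_RDONLY, as Linux and the BSDs generally do; replace_file_durably and write_all are hypothetical names, the idiom is not something proposed in the thread or the paper, and plain fsync(2) is used where OS X would need F_FULLFSYNC for a full cache flush.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* write_all (hypothetical helper): write len bytes, retrying short writes. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * replace_file_durably (hypothetical helper): replace dir/name with buf
 * so that both the new data and the rename have been fsync()ed before
 * returning. Returns 0 on success, -1 on error.
 */
int replace_file_durably(const char *dir, const char *name,
                         const char *buf, size_t len)
{
    char path[4096], tmp[4096];
    int n1 = snprintf(path, sizeof path, "%s/%s", dir, name);
    int n2 = snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof path || (size_t)n2 >= sizeof tmp)
        return -1;

    /* 1. Write the new contents to a temporary file and flush its data. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* 2. Atomically swap the temporary file into place. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. Flush the directory so the rename (a metadata change) is written
     *    out now rather than on the filesystem's own schedule; drive
     *    write caching still applies. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Because rename(2) is atomic with respect to the directory entry, a reader sees either the old contents or the new ones, never a partially written file; the final directory fsync is what addresses the soft-updates delay described in the quote.
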
--
-Chuck


