Subject: Re: [Fwd: Re: SOFTDEPS safe for qmail?]
To: None <witek@pd37.warszawa.sdi.tpnet.pl, mckusick@mckusick.com>
From: Robert Elz <kre@munnari.OZ.AU>
List: current-users
Date: 06/19/2000 03:05:35
    Date:        Sat, 17 Jun 2000 21:44:09 +0200 (CEST)
    From:        witek@pd37.warszawa.sdi.tpnet.pl
    Message-ID:  <XFMail.000617214409.witek@pd37.warszawa.sdi.tpnet.pl>

  | Standards - SUSv2:

[quotes deleted - thanks for those]

All of that makes sense, but ...

  | "all file system information required to retrieve the data" suggests
  | that both directory entry and all directories above should be updated.

not to me it doesn't - I would read that as meaning the inode, the indirect
blocks, etc. (the log, for LFS) - that is, those parts of the filesystem that
are needed for the filesystem to operate correctly but aren't part of
the visible user interface (the parts that different filesystems can
implement differently, and which standards therefore have to describe in
general, hand-waving terms, like "all file system information...").

I.e.: it isn't good enough for a filesystem to write out the data block
without also writing the indirect block that contains the pointer to
the data block...   This might seem too obvious to be worth saying,
but when implementors are fighting the "my filesystem is faster than
yours and implements the standards" battles, it is important to make
these things explicit.

On the other hand, directory entries are so obvious, and such a front-end
part of the filesystem interface, that if fsync() were intended to apply to
them, I would expect them to have been mentioned explicitly.  That they
weren't suggests to me that according to SUSv2, fsync() isn't guaranteed
to update the directory entries that refer to an inode, when a fd that
refers to the inode is fsync'd.

But regardless of this, I still think that the NetBSD man page ought
to say what NetBSD guarantees about the semantics of fsync.
That is, what an application can rely upon, regardless of the
filesystem type or mount options (perhaps with some caveats stated
explicitly, such as that "mount -o async" will break this and that...).

If NetBSD with softdep behaves the way Kirk described it, then that
ought to be made clear.  I'm a little amazed myself, unless I
misunderstood the intent.  E.g. (I don't know qmail's directory
structure, so I will use sendmail for this example): the directory
that counts for sendmail is /var/spool/mqueue (it is in there that
the renames happen).  Kirk said "all names up the path to the root"
(or words to that effect).  I think that means that if /var or /var/spool
has been changed, they will be written to disc by a fsync of a file that
lives in /var/spool/mqueue.   But we know that for some people, /var
is just a symlink to /usr/var - so anything that chased the path back
to the root is going to find /usr/var/spool/mqueue, and never encounter
the /var symlink at all.  Further, the path names sendmail uses are
just filenames: its current directory is /var/spool/mqueue (or
/usr/var/spool/mqueue), so when the open/close/rename/fsync calls on the
queue files are made, there's no reference to the path to the root
at all, just the current directory and the relative filename.
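To make the /var point concrete, here is a small sketch, assuming a
Unix-like system whose getcwd() reports the physical directory (NetBSD
and Linux both do).  It is in Python only because its os functions are
thin wrappers over the same system calls.  It builds a throwaway tree in
a temporary directory (the usr/var/spool/mqueue names and the var symlink
are just stand-ins for the layout above), chdirs into the queue directory
through the symlink, and asks where it ended up.  The answer should be the
usr/var path: the process's current directory is the directory itself,
not the route taken to reach it.

```python
import os
import tempfile


def cwd_after_symlinked_chdir(base: str) -> str:
    """Build base/usr/var/spool/mqueue plus a symlink base/var -> usr/var,
    chdir through the symlink, and return what getcwd() reports.
    Hypothetical helper; base should itself contain no symlinks."""
    physical = os.path.join(base, "usr", "var", "spool", "mqueue")
    os.makedirs(physical)
    # In this hypothetical layout /var is "just a symlink to /usr/var".
    os.symlink(os.path.join("usr", "var"), os.path.join(base, "var"))

    saved = os.getcwd()
    try:
        # Reach the queue directory via the symlinked name...
        os.chdir(os.path.join(base, "var", "spool", "mqueue"))
        # ...but only the directory itself is remembered, not the route.
        return os.getcwd()
    finally:
        os.chdir(saved)


if __name__ == "__main__":
    # realpath() so the temporary directory itself has no symlinks in it.
    base = os.path.realpath(tempfile.mkdtemp())
    print(cwd_after_symlinked_chdir(base))
```

Walking ".." upward from that directory visits usr and then base; the var
symlink is never seen.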

Now for sendmail's purposes (and most probably qmail's as well) none
of this matters - all that matters is that the rename inside the
mqueue directory is completed (whatever path is used to reach there).
But in the general case, this need not be true.
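For the sendmail-style case, a sketch of what an application could do if
it cannot assume that fsync() on the file also covers the names in the
directory.  The function and the tf/qf names are hypothetical, not
sendmail's code; Python's os calls again stand in for open/write/rename/
fsync.  Following the order described above, the file is renamed while
still open and then fsync'd, and the queue directory itself is opened and
fsync'd as well.  Everything is done relative to a descriptor for the
queue directory, so, as with sendmail, no path to the root is involved.
Whether fsync() on a directory descriptor is honoured is itself OS and
filesystem dependent, so treat this as belt and braces, not a guarantee.

```python
import os


def commit_queue_file(queue_dir: str, tmp_name: str, final_name: str,
                      data: bytes) -> None:
    """Create tmp_name in queue_dir, rename it to final_name, then fsync
    both the file and the directory.  Hypothetical helper."""
    dfd = os.open(queue_dir, os.O_RDONLY)  # descriptor for the directory
    try:
        fd = os.open(tmp_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600,
                     dir_fd=dfd)
        try:
            view = memoryview(data)
            while view:  # os.write() may write less than asked for
                n = os.write(fd, view)
                view = view[n:]
            # The rename that matters happens inside queue_dir, using only
            # relative names; the path back to the root never comes into it.
            os.rename(tmp_name, final_name, src_dir_fd=dfd, dst_dir_fd=dfd)
            # Flushes data and inode; the new name only if the fs chooses to.
            os.fsync(fd)
        finally:
            os.close(fd)
        # So flush the directory holding the renamed entry explicitly.
        os.fsync(dfd)
    finally:
        os.close(dfd)
```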

Kirk, could you possibly fill us in again on just what it is that
fsync does guarantee with respect to names in directories, and the
contents of symlinks ?

And to be certain: if I go to /var/spool/mqueue and, while sendmail
is playing about with making qfAAA12850 (just before that is
created by renaming the tfAAA12850, or lf... whichever it is), I
do, manually (from the shell), "ln -s qfAAA12850 foobar", will the
fsync() that sendmail does after the rename() that creates qfAAA12850
guarantee that the "foobar" symlink has been written to disc before
it returns (given that "foobar" is one of the names of the file that
has just been fsync'd)?
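The sequence in that question, replayed as a sketch (the function is
hypothetical; the file names are the ones used above).  Note that the
symlink holds only the string "qfAAA12850", in an inode of its own, so
it dangles until the rename happens.  Nothing in a program like this can
tell whether the fsync() also pushed "foobar" to disc; it can only show
the ordering of the calls and that the link resolves afterwards.

```python
import os


def replay_symlink_question(queue_dir: str) -> str:
    """Replay the tf -> qf rename with a symlink made in between, all
    relative to queue_dir; return what "foobar" points at."""
    dfd = os.open(queue_dir, os.O_RDONLY)
    try:
        # sendmail (as described) builds the queue file under its temp name
        fd = os.open("tfAAA12850", os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                     0o600, dir_fd=dfd)
        try:
            os.write(fd, b"queue data")
            # meanwhile, from the shell: ln -s qfAAA12850 foobar
            # (the target does not exist yet, so the link dangles for now)
            os.symlink("qfAAA12850", "foobar", dir_fd=dfd)
            # sendmail renames tf... to qf... and fsyncs the open file
            os.rename("tfAAA12850", "qfAAA12850",
                      src_dir_fd=dfd, dst_dir_fd=dfd)
            os.fsync(fd)  # does this cover "foobar"?  That is the question.
        finally:
            os.close(fd)
        return os.readlink("foobar", dir_fd=dfd)
    finally:
        os.close(dfd)
```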

In return, if I reach an understanding of what is going on, I will
attempt to submit a PR for fsync(2) with (if I can manage the current
man macros) a patch to update it for completeness (and if I can't
manage the macros, at least the text that should go there).

kre