Subject: Re: SOFTDEPS safe for qmail?
To: Robert Elz <kre@munnari.OZ.AU>
From: Don Lewis <Don.Lewis@tsc.tdk.com>
List: current-users
Date: 06/17/2000 04:06:40
On Jun 17,  5:45am, Robert Elz wrote:
} Subject: Re: SOFTDEPS safe for qmail?
}     Date:        Fri, 16 Jun 2000 09:30:06 -0400
}     From:        Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
}     Message-ID:  <200006161330.NAA21374@orchard.arlington.ma.us>
} 
}   | What recent versions of sendmail do is:
}   | 
}   | 	write message to file
}   | 	fsync file
}   | 	rename file (to indicate that the file is a complete message)
}   | 	fsync file
}   | 
}   | If you add the second fsync to force the rename out to disk, you
}   | should be all set..
} 
} Is that really guaranteed?

It is guaranteed by softdep.  When you fsync() a file whose directory
entry has not been pushed to disk, the softdep code writes the directory
entry before fsync() returns.
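
For concreteness, here is a minimal sketch in C of the sequence Bill
describes.  The function name, its arguments, and the file mode are my
own assumptions for illustration and are not taken from sendmail or
qmail.  The last fsync() only helps if the filesystem pushes the
directory entry out as described above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not sendmail's or qmail's actual code).
 * Writes len bytes from buf to tmp_path, then renames it to final_path.
 * Returns 0 on success, -1 on failure.
 */
int deliver_message(const char *tmp_path, const char *final_path,
                    const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* 1. write message to file (loop to handle short writes) */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        off += (size_t)n;
    }

    /* 2. fsync file: the data and inode reach the disk */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }

    /* 3. rename file to mark it as a complete message */
    if (rename(tmp_path, final_path) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }

    /*
     * 4. fsync file again on the same descriptor.  This assumes the
     * filesystem writes the pending directory entry before fsync()
     * returns, which is the softdep behaviour discussed in this thread.
     */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

A caller would do something like deliver_message("tmp/123", "new/123",
msg, msglen) and only acknowledge the message once it returns 0; the
path names here are placeholders.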

} rename() is an operation on the directory,
} not the file - the only operation it performs on the file (the inode
} of the file) is to update the inode changed time (and that really only
} for historical reasons).   I can't think of any particular reason that a
} filesystem which is attempting to maximise efficiency, while retaining
} internal consistency, would care much when the inode change time update
} was done with respect to the directory changes that are going on.  If
} the inode is flushed before the rename finishes (before the updated
} directory is flushed) then the problem would still be there.

Offhand I don't know if softdep enforces any sort of ordering between
the ctime update and the directory change, but I don't see any problem
if these are done in either order.

In both traditional UFS and softdep, the inode must be written before
the directory entry for newly created files, and the inode must be written
after the directory entry when a file is unlinked.  If these are done
in the wrong order and the system crashes in between, the filesystem is
left in an inconsistent state that requires an fsck.

} If the rename was being done by a link/unlink combination, which was
} actually altering the link count in the inode, then I'd tend to trust
} it more (as the inode count can't be decremented after the unlink
} until after the directory has actually been updated).

If the old and new file names are the same length, I believe rename()
will reuse the directory slot, which should be an atomic change.

} That being said, I'd be a little surprised if Kirk hadn't considered
} the needs of sendmail in the design of all of this...
} 
} kre
}-- End of excerpt from Robert Elz