tech-kern archive


Re: Removing softdep

On 10-Jun-08, at 4:09 AM, Vincent wrote:

Sort of. Let's say there could be two levels of reliability: the first would still enable copy-on-write, but write block and data synchronously, beginning with the latter, so that what could happen at worst would be a loss of data, but no file corruption or exposure of sensitive data. A second level would bypass the copy-on-write and implement write-through, so that no data would be lost, or a minimal amount.

It's not quite that simple as far as I understand. I'm also not so sure that 'mount -o sync' isn't already almost as good as you suggest. I think the 'sync' flag on FFS only avoids the buffer cache for writes, thus reducing the amount of data loss/corruption (and exposure) to just the last block(s) being written to the file(s) being written to at the time of the crash. Many good safety-conscious applications already achieve that without giving up the basic benefits of the buffer cache by writing new data to temporary files and then doing an fsync() before closing and finally renaming them into place. It's the unix way. :-)
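As a rough illustration of that temporary-file pattern, here is a minimal sketch in Python. The helper name atomic_write is hypothetical, and the final fsync() of the containing directory is an extra step assumed here for POSIX-like systems (it is not something described above); treat it as one way an application might do this, not a definitive recipe.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` via a temporary file, fsync(), and rename().

    `atomic_write` is a hypothetical helper name used for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename never has to cross a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the new data to stable storage
        os.replace(tmp_path, path)   # rename into place over the old file
    except BaseException:
        # On any failure, leave the original file alone and clean up.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: on POSIX-like systems, fsync() on the directory makes the
    # rename itself durable. This step goes beyond what the text describes.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would simply write atomic_write("settings.conf", b"new contents"); a crash at any point should leave either the complete old file or the complete new one, never a mix.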

I think you can only prevent corruption or exposure at the FS layer if you go one step further. You have to write all the FS metadata carefully (i.e. in the right order such that a repair tool can clean up any incomplete updates or inconsistencies), but you also have to mark the blocks in the block list as allocated and pending, then write the data to those blocks, and finally, after every block write is finished, update the block list to say that the just-written block is now up-to-date and contains "valid" data. I.e. add another map, or flags to the block list, or something such that blocks can be separately allocated and then marked as valid, thus in effect replicating at the block level what an application does by using temporary files and fsync(); rename().
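The allocate-pending, write, mark-valid ordering can be sketched with a toy in-memory model. Everything here (ToyVolume, the State names, the repair method) is hypothetical and only mimics the ordering of the steps; it is not how FFS or any real file system stores its block list.

```python
from enum import Enum


class State(Enum):
    FREE = 0
    PENDING = 1   # allocated, but contents not yet trustworthy
    VALID = 2     # data write completed


class ToyVolume:
    """Hypothetical in-memory stand-in for a block list with a validity flag."""

    def __init__(self, nblocks):
        self.state = [State.FREE] * nblocks
        self.data = [b""] * nblocks

    def allocate(self):
        idx = self.state.index(State.FREE)   # raises ValueError when full
        self.state[idx] = State.PENDING      # step 1: metadata write
        return idx

    def write(self, idx, payload):
        self.data[idx] = payload             # step 2: data write

    def commit(self, idx):
        self.state[idx] = State.VALID        # step 3: metadata write

    def repair(self):
        """After a crash, discard every block that never reached VALID."""
        reclaimed = []
        for idx, st in enumerate(self.state):
            if st is State.PENDING:
                self.state[idx] = State.FREE
                self.data[idx] = b""
                reclaimed.append(idx)
        return reclaimed


vol = ToyVolume(4)
done = vol.allocate()
vol.write(done, b"complete")
vol.commit(done)

torn = vol.allocate()
vol.write(torn, b"partial")   # simulated crash: commit() never runs
reclaimed = vol.repair()      # only the uncommitted block is reclaimed
```

Because a block is only ever readable once it is marked VALID, a crash between steps 2 and 3 costs the new data but never exposes stale or half-written contents. The price is the extra metadata write per block.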

That's going to be terribly slow on any mechanical rotating storage device without a write-back cache somewhere below in the hardware layer, and just as unreliable with a write-back cache if you can't guarantee it will get safely flushed before the hardware is reset somehow.

At least conceptually a journalling style of FS can give you all of that reliability and integrity all of the time and as a bonus you get some decent performance along with it too. A good journalling FS shouldn't need a full fsck after any crash either.
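To make the "no full fsck" point concrete, here is a deliberately tiny redo-log sketch. The record layout and the replay function are assumptions made up for illustration; real journalling file systems differ in many details.

```python
def replay(journal, blocks):
    """Apply only the writes whose transaction has a commit record.

    Hypothetical record formats:
      ("write", txid, block_index, payload)
      ("commit", txid)
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _txid, idx, payload = rec
            blocks[idx] = payload
    return blocks


journal = [
    ("write", 1, 0, b"A"),
    ("write", 1, 1, b"B"),
    ("commit", 1),
    ("write", 2, 1, b"C"),   # crash happened before transaction 2 committed
]
recovered = replay(journal, {0: b"", 1: b""})
```

Recovery only has to scan the journal, not the whole volume, and the uncommitted write from transaction 2 is simply ignored.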

                                        Greg A. Woods; Planix, Inc.
