tech-kern archive


Re: Removing softdep



On Tue, Jun 10, 2008 at 9:33 AM, Greg A. Woods; Planix, Inc.
<woods%planix.ca@localhost> wrote:
>
> On 10-Jun-08, at 4:09 AM, Vincent wrote:
>>
>> Sort of. Let's say there could be two levels of reliability. The first
>> would still enable copy-on-write, but write block and data synchronously,
>> starting with the latter, so that the worst that could happen would be a
>> loss of data, but no file corruption or exposure of sensitive data.  A
>> second level would bypass copy-on-write and implement write-through, so
>> that no data, or only a minimal amount, would be lost.
>
> It's not quite that simple as far as I understand.  I'm also not so sure
> that 'mount -o sync' isn't already almost as good as you suggest.  I think
> the 'sync' flag on FFS only avoids the buffer cache for writes thus reducing
> the amount of data loss/corruption (and exposure) to just the last block(s)
> being written to the file(s) being written to at the time of the crash.
>  Many good safety-conscious applications already achieve that without
> giving up the basic benefits of the buffer cache by writing new data to
> temporary files and then doing an fsync() before closing and finally
> renaming them into place.  It's the unix way.  :-)
>
> I think you can only prevent corruption or exposure at the FS layer if you
> go one step further.  You have to write all the FS metadata carefully (i.e.
> in the right order such that a repair tool can clean up any incomplete
> updates or inconsistencies), but you also have to mark the block list as
> allocated and pending, then write the data to those blocks, and finally,
> after every block write is finished, update the block list to say that the
> just-written block is now up to date and contains "valid" data.
>  I.e. add another map, or flags to the block list, or something such that
> they can be separately allocated and then marked as valid; thus in effect
> replicating at the block level what an application does by using temporary
> files and fsync();rename().
>
> That's going to be terribly slow on any mechanical rotating storage device
> without a write-back cache somewhere below in the hardware layer, and just
> as unreliable with a write-back cache if you can't guarantee it will get
> safely flushed before the hardware is reset somehow.
>
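
For readers who have not seen it, the temporary-file pattern Greg describes
above (write, fsync(), close, rename()) can be sketched roughly as follows.
This is a minimal illustration in Python for a POSIX system, not code from
this thread; the helper name atomic_write is hypothetical, and the final
fsync() on the directory is an extra step assumed here to make the rename
itself durable.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via a temp file.

    Hypothetical helper: readers see either the old contents or the
    complete new contents, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so rename() stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the new data to stable storage before it becomes visible.
            os.fsync(tmp.fileno())
        # Rename into place over the old file.
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temp file if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also fsync the directory so the rename survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The buffer cache is still used for all ordinary I/O; only the one file
being replaced pays for a synchronous flush.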

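The block-level scheme described in the quoted text (allocate as pending,
write the data, then mark the block valid) can be modelled with a toy
in-memory "disk".  This is only a sketch of the ordering idea under stated
assumptions: the names State, ToyDisk and crash_after are all hypothetical,
and nothing here reflects how FFS or softdep is actually implemented.

```python
from enum import Enum


class State(Enum):
    FREE = 0
    PENDING = 1   # allocated, data not yet known to be on disk
    VALID = 2     # data write completed


class ToyDisk:
    """Toy model: a block list with a per-block validity state."""

    def __init__(self, nblocks):
        # Freed blocks still hold old (possibly sensitive) contents.
        self.data = [b"stale"] * nblocks
        self.state = [State.FREE] * nblocks

    def write_block(self, n, payload, crash_after=None):
        # Step 1: record the block as allocated but pending.
        self.state[n] = State.PENDING
        if crash_after == "allocate":
            return
        # Step 2: write the data itself.
        self.data[n] = payload
        if crash_after == "data":
            return
        # Step 3: only now mark the block as holding valid data.
        self.state[n] = State.VALID

    def recover(self):
        # A repair tool releases every block that never became valid.
        for n, s in enumerate(self.state):
            if s is State.PENDING:
                self.state[n] = State.FREE

    def read_block(self, n):
        # Stale or half-written contents are never handed to a reader.
        return self.data[n] if self.state[n] is State.VALID else None
```

In this model a crash after step 1 or step 2 loses the new data but never
exposes the old contents of the block, at the cost of two metadata updates
per data block.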
<OT>
It's funny that you mention mechanical rotating disks, because I was
theorizing that the new SSDs would not show their full benefit until
filesystems were redesigned with them in mind.
</OT>

