Subject: Re: FFS and Journaling
To: Simon Truss <simon@bigblue.demon.co.uk>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: netbsd-users
Date: 03/25/2007 12:36:58
On Sun, Mar 25, 2007 at 10:24:34AM +0000, Simon Truss wrote:
> Hi,
> 
> From the recent SOC discussion I gather that the advantage of 
> journalling over softdep is the fast replay of the journal obviating the 
> need to fsck the whole disk. My thinking here is that softdep and 
> journalling has been kept as independent strategies and that a combined 
> softdep+journal may prove slightly simpler and more efficient than a 
> full blown independent journal solution.

I think it's about the same amount of work.  You're right, as far as I
can tell, that it should be possible to use softdep to order the writes
for the filesystem, then output those ordered writes into a journal.  It
would be nice if softdep in fact were a generic layer that produced an
ordered graph of the writes for a chunk of the filesystem namespace.
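
To make the idea concrete, here is a minimal sketch of what such a generic
layer might hand to a journal: a dependency graph of pending writes,
flattened into an order that is safe to log.  The write names and the
journal_order() helper are hypothetical, invented for illustration; none
of this corresponds to actual softdep or FFS code.

```python
from graphlib import TopologicalSorter

# Hypothetical pending writes.  Each key is a write; its value is the set
# of writes that must be ordered before it.  The names are illustrative
# only and do not correspond to real softdep structures.
deps = {
    "dirent:foo": {"inode:12"},        # entry must not reference an uninitialised inode
    "inode:12": {"bitmap:inode12"},    # allocation must be recorded before the inode
    "bitmap:inode12": set(),
}


def journal_order(deps):
    """Flatten the dependency graph into an order safe to append to a journal."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for record in journal_order(deps):
        print(record)
```

The point of the sketch is only that the ordering and the logging are
separable concerns: anything that can produce the graph can feed the
journal.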

But it's not.  Instead, it's a huge mess of spaghetti code that is
entangled with every layer of the FFS implementation at every
opportunity.  This has more to do with its history than with necessity:
it was written as code for a strange version of FFS in SVR4MP, designed
to touch the rest of the SVR4MP kernel as little as possible, then
abstracted to pseudocode, then filled back in for the FreeBSD kernel.
But it is still the case.

The result is that changing the softdep code _at all_ bears significant
risk of instability throughout the system, as does changing the code of
kernel subsystems it relies on -- look at the problems that appeared
with softdep swamping the I/O system when we introduced a variable-sized
metadata cache, allowing softdep to delay more writes.  And in practice,
softdep generates an unpredictable, seek-heavy stream of I/O requests:
it delays metadata writes and their associated file data writes such
that the disk head won't be near one when it's time to write the other.
As a result it can swamp the disk, then leave it idle, then swamp it
again, then leave it idle -- leading to terrible performance in
real-world situations.  Adding a journal could fix _that_ at least --
but at what cost?
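
A toy model of the seek problem, as a rough sketch only: it counts how
far the head travels when the same number of writes goes to scattered
home locations versus being appended to a contiguous journal.  The block
addresses and the head_travel() helper are made up for illustration, and
the model ignores rotational latency, queueing, and the fact that
journalled blocks still have to be written to their home locations later
(though that can be batched and sorted).

```python
def head_travel(block_addresses, start=0):
    """Total distance the head moves servicing requests in the given order."""
    travel, pos = 0, start
    for addr in block_addresses:
        travel += abs(addr - pos)
        pos = addr
    return travel


# Hypothetical in-place writes: metadata near the start of the disk,
# the associated file data far away, interleaved.
scattered = [10, 90_000, 25, 90_016, 40, 90_032]

# The same number of writes appended to a journal assumed to start at
# block 1_000 (an arbitrary location chosen for the example).
JOURNAL_START = 1_000
journal = [JOURNAL_START + i for i in range(len(scattered))]

if __name__ == "__main__":
    print("in-place:", head_travel(scattered))
    print("journal: ", head_travel(journal))
```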

From my point of view one major advantage of a journalled FFS would be
the simplicity it would bring back to our core filesystem implementation.
I would be very happy to get good filesystem performance from kernels
that didn't have the softdep code anywhere near them.

Thor