Subject: Re: FFS and Journaling
To: Simon Truss <simon@bigblue.demon.co.uk>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: netbsd-users
Date: 03/27/2007 16:48:31

On Sun, Mar 25, 2007 at 06:49:35PM +0000, Simon Truss wrote:
> Thor Lancelot Simon wrote:
> >On Sun, Mar 25, 2007 at 10:24:34AM +0000, Simon Truss wrote:
> >>Hi,
> >>
> >>From the recent SOC discussion I gather that the advantage of
> >>journalling over softdep is the fast replay of the journal, obviating the
> >>need to fsck the whole disk. My thinking here is that softdep and
> >>journalling have been kept as independent strategies and that a combined
> >>softdep+journal may prove slightly simpler and more efficient than a
> >>full-blown independent journal solution.
> >
> >I think it's about the same amount of work.  You're right, as far as I
> >can tell, that it should be possible to use softdep to order the writes
> >for the filesystem, then output those ordered writes into a journal.  It
> >would be nice if softdep in fact were a generic layer that produced an
> >ordered graph of the writes for a chunk of the filesystem namespace.
>
> Are we saying, then, that there is a commonality in the fs hooks required
> by journalling and softdep? That by developing journalling and some
> clean dependency hooks, there might come a time when softdep can be moved
> over to this common interface? That would be wonderful: both progressive
> and practical.

That won't work.

The idea behind journaling in this context is journaling as used in
databases. Either an operation on the fs metadata has happened or it
hasn't; a change is atomic. Thus you have either created a file or you
haven't. You have either deleted a file or you haven't.

To achieve that, you have to perform two sets of writes. You first write
the operation into the journal. If something happens before that write
completes, the event hasn't happened. Once the write is in the journal, the
event has happened, and so we write all the affected blocks to their normal
locations on disk. We don't delete the entry from the journal (that is, we
don't let the journal overwrite it) until all the other writes are done.
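To make the two sets of writes concrete, here is a minimal sketch of that commit protocol and the matching crash recovery. It is a toy model, not NetBSD code: `Disk`, `commit`, `replay` and the `crash_at` hook are hypothetical names used only to show the ordering, and "creating a file" is reduced to updating an inode block and a directory block.

```python
class Crash(Exception):
    """Simulated power failure at a chosen point in the protocol."""


class Disk:
    """Toy disk: metadata blocks at their home locations plus a journal area."""
    def __init__(self):
        self.blocks = {}    # block number -> contents
        self.journal = []   # committed records not yet fully applied


def commit(disk, updates, crash_at=None):
    """Apply `updates` (block -> contents) atomically via the journal.

    crash_at is a test hook: "before_journal" fails before the journal
    write, an int i fails just before the i-th in-place write.
    """
    if crash_at == "before_journal":
        raise Crash()
    # 1. Write the whole operation into the journal. A real fs must wait
    #    here until the device reports this write as durable.
    record = dict(updates)
    disk.journal.append(record)
    # 2. The operation has now "happened"; write each block in place.
    for i, (blkno, data) in enumerate(record.items()):
        if crash_at == i:
            raise Crash()
        disk.blocks[blkno] = data
    # 3. Only now may the journal space holding this record be reused.
    disk.journal.remove(record)


def replay(disk):
    """Recovery: redo every record still in the journal, then clear it."""
    for record in disk.journal:
        for blkno, data in record.items():
            disk.blocks[blkno] = data
    disk.journal.clear()
```

A crash between the inode write and the directory write leaves a torn state on disk, but the record is still in the journal, so `replay` finishes the create. A crash before the journal write leaves nothing to replay, so the create simply never happened.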

One consequence is that the write ordering is "write the journal", then
"write everything else". So the ordering model is very different from the
normal one. Soft updates doesn't change the write ordering model; it just
makes the writes much more asynchronous with respect to the caller.

So mixing them won't really work.

Note that while a journal can help with a lot, there are things it can't
do. For instance, if you have a bug in a driver or in the fs code itself,
the journal won't save you. Likewise, a double fault on a RAID 5 is deadly.
Also, if write caching is enabled on a disk and we don't have something akin
to FUA (Force Unit Access), or if we assert FUA and the device lies to us
about completion, then we can have a huge safety issue: we depend on
being able to know that the journal write has finished before moving on.
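The write-cache hazard can be illustrated with a toy drive model. `CachedDisk` and `create_file` are hypothetical and only model the behaviour described above: without FUA, "write complete" means "in the drive's volatile cache", and the drive may move cached blocks to the media in any order it likes. A drive that lies about FUA completion behaves like the `fua=False` case.

```python
class CachedDisk:
    """Toy drive with a volatile write-back cache (hypothetical model)."""
    def __init__(self):
        self.media = {}   # survives power loss
        self.cache = {}   # acknowledged, but lost on power failure

    def write(self, blkno, data, fua=False):
        if fua:
            self.media[blkno] = data   # completion really means durable
        else:
            self.cache[blkno] = data   # completion only means "cached"

    def destage(self, blkno):
        """The drive picks which cached block reaches the media next."""
        self.media[blkno] = self.cache.pop(blkno)

    def power_fail(self):
        self.cache.clear()


def create_file(disk, fua):
    # The journal record is written first and (we believe) completes
    # before the in-place writes are issued.
    disk.write("journal", "create foo: inode 10, dirent 20", fua=fua)
    disk.write(10, "inode: allocated")
    disk.write(20, "dirent: foo -> 10")


for fua in (False, True):
    d = CachedDisk()
    create_file(d, fua)
    d.destage(20)      # the drive happens to write the dirent first
    d.power_fail()
    print(fua, d.media)
```

Without FUA the media ends up holding a directory entry for an uninitialized inode and no journal record to repair it from. With FUA the journal record survives, so replay can finish the operation.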

> I have just about reached my limit of file system knowledge; should I
> continue into detail with this concept, I will be proceeding to fantasy
> systems instead :-)
>
> >But it's not.  Instead, it's a huge mess of spaghetti code that is
> >incestuous with all layers of the FFS implementation at every opportunity.
>
> Yikes. That's a good reason not to mess with softdep. Practicality
> always gets in the way.
>
> >From my point of view one major advantage of a journalled FFS would be
> >the simplicity it would bring back to our core filesystem implementation.
> >I would be very happy to get good filesystem performance from kernels
> >that didn't have the softdep code anywhere near them.
>
> Agreed, but softdep in concept still has a lot of potential, some of
> which cannot be offered by journalling. I would hope that if softdep were
> removed, something could be found to take its place some day.

Other than the fact that the need for some writes can disappear (consider
creating a temporary file that then gets deleted before the update
flushes to disk), what potential did you have in mind?
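The disappearing-write case can be sketched like this. `DelayedMetadata` is a hypothetical model of delayed metadata updates, not the softdep code: a create that is still pending when the delete arrives simply cancels out, so the flush has nothing to write. For simplicity, removing an entry that is already on disk is recorded immediately.

```python
class DelayedMetadata:
    """Hypothetical model of delayed (softdep-style) metadata writes."""
    def __init__(self):
        self.pending = {}   # inode number -> name of a not-yet-written create
        self.written = []   # operations that actually reached the disk

    def create(self, ino, name):
        self.pending[ino] = name          # delayed; nothing hits the disk yet

    def delete(self, ino):
        if ino in self.pending:
            del self.pending[ino]         # never reached disk: cancel it
        else:
            self.written.append(("remove", ino))  # simplified: immediate

    def flush(self):
        for ino, name in self.pending.items():
            self.written.append(("create", ino, name))
        self.pending.clear()


m = DelayedMetadata()
m.create(5, "tmpfile")
m.delete(5)        # deleted before the update flushes
m.flush()
print(m.written)   # no writes were needed at all
```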

Take care,

Bill
