Subject: Re: FFS journal
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/05/2006 17:07:15

On Wed, Jul 05, 2006 at 11:01:27PM +0200, Manuel Bouyer wrote:
> On Wed, Jul 05, 2006 at 01:31:19PM -0700, Bill Studenmund wrote:
> > 
> > Uhm, the idea of a journal is that you no longer have to order MD
> > writes. As I understand it, that is the _point_. To retain ordering
> > when we have a journal is defeating the purpose.
> 
> Some more thought on this: I don't think there can be much performance
> difference between an async journaled FFS and a softdep FFS. And I
> suspect it may be easier to add journaling to a softdep FFS, as softdep
> is a central point where we have all the metadata writes.

You will reduce performance.

The runtime performance win of journaling is that you write a
transaction to a small area of disk storage. You then write stuff all
over the disk, and let the disk optimize those writes.
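
Roughly, the pattern is the one sketched below. All the names here
(struct journal, jnl_append, jnl_commit, disk_write_async) are made up
for illustration; they aren't real NetBSD interfaces:

struct buf;		/* the usual buffer cache entry */
struct journal;
void jnl_append(struct journal *, struct buf *);	/* hypothetical */
void jnl_commit(struct journal *);			/* hypothetical */
void disk_write_async(struct buf *);			/* hypothetical */

struct txn {
	struct buf **tx_bufs;	/* metadata blocks in this transaction */
	int tx_nbufs;
};

static void
txn_flush(struct journal *jnl, struct txn *tx)
{
	int i;

	/* One sequential burst into the small journal area... */
	for (i = 0; i < tx->tx_nbufs; i++)
		jnl_append(jnl, tx->tx_bufs[i]);
	jnl_commit(jnl);	/* single commit record */

	/* ...then the in-place writes, scattered all over the disk
	 * but issued together, so the drive can reorder them freely. */
	for (i = 0; i < tx->tx_nbufs; i++)
		disk_write_async(tx->tx_bufs[i]);
}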

By maintaining the write sequencing, we:

1) Prevent the disk from optimizing all the writes, since they aren't
all present at the same time.

2) Add more complexity and do more work than needed. You're talking
about breaking what should be one transaction into a number of them,
say three or five. Also, since ffs overwrites the same area a few times
in some operations, we will have more writes than we would if we didn't
sequence. We also need to retain all of the complexity of softdeps,
which isn't usually needed with journaling.

We will also reduce journal performance. The journal is usually
implemented as an infinite buffer windowed into 64 MB or 256 MB or
whatever, so we can't have more than that 64 or 256 MB outstanding at
once.
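
To make the windowed-infinite-buffer point concrete, here is a minimal
sketch (names and layout invented for illustration): logical offsets
grow forever, only the live window between head and tail is bounded by
the journal size, and a new transaction has to wait when the window is
full.

#include <stdint.h>
#include <stdbool.h>

struct jnl_window {
	uint64_t jw_head;	/* next logical byte to write */
	uint64_t jw_tail;	/* oldest byte not yet retired */
	uint64_t jw_size;	/* 64 MB, 256 MB, whatever */
};

/* Room for a transaction of 'len' bytes?  If not, we must wait for
 * older transactions to retire and advance jw_tail. */
static bool
jnl_has_room(const struct jnl_window *jw, uint64_t len)
{
	return jw->jw_head + len - jw->jw_tail <= jw->jw_size;
}

/* The logical offset wraps around the fixed on-disk area. */
static uint64_t
jnl_disk_offset(const struct jnl_window *jw, uint64_t off)
{
	return off % jw->jw_size;
}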

Normally we write to the journal, then mark a transaction done when all
of the included writes are done. Since they happen in parallel, they
happen quickly, and we release the transaction. With MD ordering, we
have to keep a transaction open longer, as more I/O has to happen and
it has to happen sequentially.
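
In other words, something like the following (again, invented names,
locking elided): each transaction holds its slice of the journal window
until its last in-place write completes, so slow, sequenced writes keep
transactions open and the window fills up sooner.

struct txn_state {
	int ts_pending;		/* in-flight in-place writes */
};

void jnl_retire(struct txn_state *);	/* hypothetical: advances jw_tail */

/* Called from each in-place write's completion callback.  With
 * parallel writes this counts down quickly; with MD-ordered writes
 * each decrement waits behind the previous write. */
static void
txn_write_done(struct txn_state *ts)
{
	if (--ts->ts_pending == 0)
		jnl_retire(ts);
}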


We aren't the first journaling implementation to confront the issue of
what happens if we lose the journal at the moment when we have a
transaction that's there but hasn't been fully written, and we also lose
part of the commits. What do other OSs do?


I really think the best way to protect the fs from journal issues is to
protect the journal. If we really are that concerned, write it to RAID.
Or have two copies of it on the disk, and do a multi-part commit.
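
The two-copy variant could be as simple as the sketch below (purely
illustrative; jnl_write_txn and jnl_commit_sync are made-up helpers):
write the transaction and its commit record to both areas, and treat
the transaction as durable only once both commits are on disk, so
losing either copy still leaves a usable journal.

struct journal;
struct txn;
void jnl_write_txn(struct journal *, struct txn *);	/* hypothetical */
void jnl_commit_sync(struct journal *);			/* hypothetical */

static void
txn_commit_dual(struct journal *a, struct journal *b, struct txn *tx)
{
	jnl_write_txn(a, tx);
	jnl_write_txn(b, tx);
	jnl_commit_sync(a);	/* commit record, copy 1 */
	jnl_commit_sync(b);	/* commit record, copy 2 */
}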


Regardless of what we do, there will always be a way to kill the file
system. So we should do a risk assessment. I have the feeling that you
have observed a vulnerability and assumed it is frequent enough that it
warrants protection using a very complicated ordering scheme.

Take care,

Bill
