Subject: Re: Log area on-disk for the journal
To: M J Fleming <mjf@NetBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/23/2006 19:44:53

On Fri, Oct 20, 2006 at 08:38:58PM +0100, M J Fleming wrote:
> I am bringing this up on a public forum to solicit feedback from a wider
> range of people.

Be careful with this. You'll get a lot of different suggestions all going
in different directions. :-)

> On Fri, Oct 20, 2006 at 12:02:26PM -0700, Darrin B.Jewell wrote:
> > In any case, there are several design issues about the log location
> > and layout to consider.
> > > Am I correct in thinking that the layout for FFS on-disk is
> > >
> > > [disklabel/bootstrap] [superblock]   [cg0] [cg1] [cg2] ... [cgN]
> > >
> > > So I would have the log area here ^^ between the superblock and first
> > > cylinder group. So, I've been looking through the code for newfs and was
> > > going to create a space JOURNALSIZE big. Should this be proportional to
> > > the blocksize, or is it better to have it predefined?
> >=20
> > Yes, this is basically correct.  However, keep in mind that the
> > disk is evenly divided into cylinder group areas.  The cylinder
> > group header containing accounting information is not at the
> > start of each cylinder group.  This is so the math for accessing
> > cylinder groups does not have to special-case cg0 or handle an offset,
> > even though the superblock and bootstrap are only in cg0.  By
> > increasing the area used in cg0, you will move the cg header
> > further into every cylinder group, which may increase fragmentation
> > issues caused by the data area before the cg header in each cg.
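
To make the cg0 point concrete, here is a simplified model of the FFS
addressing math. It is only a sketch: the struct and function names are
made up, the fields are modeled loosely on fs_fpg and fs_cblkno (as used
by cgbase() and cgtod()), everything is in fragments, and UFS1
cylinder-group staggering is ignored. Because the cg header offset is
shared by every cylinder group, reserving journal space in front of it
in cg0 shifts the header in all of them.

```c
#include <stdint.h>

/*
 * Simplified, hypothetical model of FFS cylinder-group addressing.
 * Loosely follows fs_fpg / fs_sblkno / fs_cblkno; ignores UFS1
 * staggering (cgoffset/cgmask).  All units are fragments.
 */
struct fs_geom {
	int64_t fpg;     /* fragments per cylinder group */
	int64_t sblkno;  /* offset of (backup) superblock within a cg */
	int64_t cblkno;  /* offset of cg header within a cg */
};

/* First fragment of cylinder group c: every cg is the same size. */
static int64_t cg_base(const struct fs_geom *g, int64_t c)
{
	return g->fpg * c;
}

/* The cg header sits at the same offset in every cg, cg0 included. */
static int64_t cg_header(const struct fs_geom *g, int64_t c)
{
	return cg_base(g, c) + g->cblkno;
}

/*
 * Hypothetical: reserve jfrags fragments for a journal in front of the
 * cg header.  The offset is shared, so this moves the header in every
 * cg, not just cg0.
 */
static struct fs_geom reserve_journal_in_cg0(struct fs_geom g, int64_t jfrags)
{
	g.cblkno += jfrags;
	return g;
}
```

With 1000-fragment cylinder groups and a 64-fragment reservation, the
header of cg 3 moves from fragment 3024 to 3088 even though cg 3 holds
none of the journal.
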
> >
> > To answer your first question, being proportional to the blocksize
> > is probably a fine answer, although most of the data in it will
> > actually be stored in fragment-sized chunks.
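
On sizing, one possible policy (purely illustrative; JOURNAL_NBLOCKS is
not a real newfs parameter) is a fixed number of filesystem blocks, so
the journal grows with the block size. If records are mostly
fragment-sized, the number that matters in practice is how many
fragment-sized slots fit.

```c
#include <stdint.h>

/*
 * Hypothetical sizing policy: the journal is a fixed number of
 * filesystem blocks, so its byte size scales with the block size.
 * JOURNAL_NBLOCKS is an illustrative placeholder.
 */
#define JOURNAL_NBLOCKS 1024

/* Journal size in bytes for a given block size. */
static uint64_t journal_bytes(uint32_t bsize)
{
	return (uint64_t)bsize * JOURNAL_NBLOCKS;
}

/* Number of fragment-sized records that fit in the journal. */
static uint64_t journal_frag_slots(uint32_t bsize, uint32_t fsize)
{
	return journal_bytes(bsize) / fsize;
}
```

For example, 16 KB blocks with 2 KB fragments give a 16 MB journal with
room for 8192 fragment-sized records; the constant is just a placeholder.
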
> >
> > The location of the journal itself has several design issues
> > to consider, such as:
> >   . possibly locating the journal on separate media for performance.
> >     For example, a separate spindle or fast nvram may sometimes be
> >     desired.
>
> How popular is this in journalled file systems? I think old versions of Solaris
> allowed this, but since version 7, I think the log has been embedded in
> the filesystem.

AIX kinda does this, or used to. In LVM disks, you had a journal per disk
or per LVM group. Thus all file systems on a spindle used the same
journal.

> >   . finding the journal when mounting or fsck'ing.  This can be especially
> >     complicated if the journal is on separate media and the machine gets
> >     reconfigured between boots.
>=20
> Yeah, this worries me too.
>
> >   . contiguous allocation of the journal.
> >   . the relative seek distance of the journal to the data it contains
>
> How about a log area for every cylinder group? Would this be feasible?
> I suppose you'd then have to have some trickery to find out which log you're
> going to write to and if the blocks are spread over multiple cgs, then it's
> gonna be a real pain.

No, you should have one journal per fs. There are lots of consistency
issues if you have multiple journals.

> >   . accessing the start of the disk is usually faster
> >   . adding the journal to an existing filesystem without reformatting.
>
> Without using a separate device, how would this work?
>
> >   . filesystem consistency if the system crashes during journal creation.
> >   . compatibility/upgrade issues, such as whether the accessing
> >     filesystem code has to be journal aware, even if the filesystem
> >     was cleanly unmounted.
> >   . whether to clutter the filesystem namespace with the journal
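
For the "finding the journal" problem, one approach is to record the
journal's location in the superblock and stamp the journal's own header
with an identifier of the filesystem, so mount and fsck can tell when
they are looking at the wrong journal after a reconfiguration. This is a
hypothetical sketch: none of these names exist in NetBSD, and keying on
a 16-byte filesystem UUID is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: where the journal lives, recorded in the superblock. */
enum jloc_type {
	JLOC_NONE = 0,      /* no journal */
	JLOC_INFS_FILE,     /* a file such as /.journal, by inode number */
	JLOC_AFTER_FS,      /* same partition, after the last cg */
	JLOC_EXTERNAL       /* separate device */
};

struct jloc {
	uint32_t type;        /* enum jloc_type */
	uint64_t ino;         /* JLOC_INFS_FILE: journal inode */
	uint64_t start;       /* JLOC_AFTER_FS/EXTERNAL: first sector */
	uint64_t nsectors;    /* journal length */
	uint8_t  fs_uuid[16]; /* also written into the journal header */
};

/* Hypothetical header at the start of the journal area itself. */
struct jheader {
	uint8_t fs_uuid[16];
};

/*
 * Returns nonzero only if a journal is configured and its header
 * carries this filesystem's identifier, e.g. to refuse replaying a
 * journal from a device that was swapped or renumbered between boots.
 */
static int jloc_matches(const struct jloc *loc, const struct jheader *hdr)
{
	if (loc->type == JLOC_NONE)
		return 0;
	return memcmp(loc->fs_uuid, hdr->fs_uuid, sizeof(loc->fs_uuid)) == 0;
}
```

An in-filesystem file like /.journal sidesteps most of the
reconfiguration trouble, since its location travels with the filesystem.
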
> >
> > Your idea to place it in cg0 is probably not a terrible one.
> >
> > In a first implementation, I put the journal in the same partition,
> > but after the filesystem.  This made implementation easier, although
> > I long intended to place the journal in the filesystem instead.
> >
> > I recommend placing the journal data in the filesystem in a file
> > linked in as /.journal or something.  It can still be allocated
> > contiguously if desired, although accessing it can be complicated by
> > directory lookups and bmap.
>
> That seems like a fine idea, not one I'd thought of.

Yeah, start here.
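
One way to keep the directory lookup and bmap cost Darrin mentions off
the I/O path is to resolve /.journal once at mount time and cache its
block map as a short extent list. A sketch under that assumption (the
structures are hypothetical, and the mount-time walk of the block map is
not shown):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached extent of the journal file. */
struct jextent {
	uint64_t lblk;   /* first logical block of the file in this extent */
	uint64_t pblk;   /* corresponding physical block on disk */
	uint64_t len;    /* number of blocks */
};

/* Extent list built once at mount time (e.g. by walking bmap). */
struct jmap {
	const struct jextent *ext;
	size_t next;     /* number of extents */
};

/*
 * Translate a logical journal block to a physical block.
 * Returns 0 on success, -1 if the block lies outside the journal.
 */
static int jmap_lookup(const struct jmap *m, uint64_t lblk, uint64_t *pblk)
{
	for (size_t i = 0; i < m->next; i++) {
		const struct jextent *e = &m->ext[i];
		if (lblk >= e->lblk && lblk < e->lblk + e->len) {
			*pblk = e->pblk + (lblk - e->lblk);
			return 0;
		}
	}
	return -1;
}
```

If the file was allocated contiguously, the map has a single extent and
each lookup is a single addition.
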

To be honest, I think there may be a number of different ways to handle
the journal. Also, exactly how the journal is laid out can (as this
thread indicates) raise a lot of discussion. So whatever we decide now
isn't necessarily how it will stay (though using a file is a great idea).
Just try to keep the code that actually writes the journal partitioned off
from the layout, so that if we want to revisit this later, we can.
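
To illustrate keeping the two apart: the journal code deals only in
offsets relative to the start of the journal, and each layout (a file in
the filesystem, an area after it, a separate device) supplies its own
backend. This is a hypothetical sketch, not existing NetBSD code;
wraparound and space reclamation are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backend: hides where the journal actually lives. */
struct jbackend {
	uint64_t size;  /* journal size in bytes */
	int (*write)(void *cookie, uint64_t off, const void *buf, size_t len);
	void *cookie;
};

/*
 * Append a record at *head using only journal-relative offsets.
 * Returns -1 if the record does not fit or the backend fails.
 * A real log would be circular; that is omitted here.
 */
static int journal_append(const struct jbackend *be, uint64_t *head,
    const void *rec, size_t len)
{
	if (len > be->size || *head > be->size - len)
		return -1;  /* no room left in the journal */
	if (be->write(be->cookie, *head, rec, len) != 0)
		return -1;
	*head += len;
	return 0;
}

/* Trivial in-memory backend, e.g. for testing the journal logic. */
struct membackend {
	uint8_t buf[64];
};

/* No bounds check here; journal_append() keeps writes inside be->size. */
static int mem_write(void *cookie, uint64_t off, const void *buf, size_t len)
{
	struct membackend *m = cookie;
	memcpy(m->buf + off, buf, len);
	return 0;
}
```

Changing the layout later then means writing a new backend rather than
touching journal_append().
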

And leaving open the idea of revisiting it later means we don't have to get
the layout perfect now, which I think is good, as I suspect most of the work
is elsewhere.

Take care,

Bill
