Subject: Re: Log area on-disk for the journal
To: None <tech-kern@NetBSD.org>
From: M J Fleming <mjf@NetBSD.org>
List: tech-kern
Date: 10/20/2006 20:38:58
I am bringing this up on a public forum to solicit feedback from a wider
range of people.

On Fri, Oct 20, 2006 at 12:02:26PM -0700, Darrin B. Jewell wrote:
> In any case, there are several design issues about the log location
> and layout to consider.
> > 
> > Am I correct in thinking that the layout for FFS on-disk is
> > 
> > [disklabel/bootstrap] [superblock]   [cg0] [cg1] [cg2] ... [cgN]
> > 
> > So I would have the log area here ^^ between the superblock and first
> > cylinder group. So, I've been looking through the code for newfs and was
> > going to create a space JOURNALSIZE big. Should this be proportional to
> > the blocksize, or is it better to have predefined?
> 
> Yes, this is basically correct.  However, keep in mind that the
> disk is evenly divided into cylinder group areas.  The cylinder
> group header containing accounting information is not at the
> start of each cylinder group.  This is so the math for accessing
> cylinder groups does not have to special-case cg0 or handle an offset
> even though the superblock and bootstrap are only in cg0.  By
> increasing the area used in cg0, you will move the cg header
> further into the cylinder group which may increase fragmentation
> issues caused by the data area before the cg header in each cg.
> 
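For reference, a minimal sketch of the cylinder group arithmetic
described above. The struct and function names here are hypothetical;
they only loosely mirror the fs_fpg and fs_cblkno fields and the
cgbase()/cgtod() macros in the FFS headers, and the sketch ignores the
extra per-group offset that UFS1 applies. It assumes the journal is
carved out of the reserved area ahead of the cg header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified geometry (all units are fragments). */
struct geom {
	int64_t fpg;	/* fragments per cylinder group */
	int64_t cblkno;	/* offset of the cg header inside each group */
	int64_t ncg;	/* number of cylinder groups */
};

/* First fragment of group cg: a plain multiply, no special case for cg0. */
static inline int64_t
cg_base(const struct geom *g, int64_t cg)
{
	assert(cg >= 0 && cg < g->ncg);
	return g->fpg * cg;
}

/* Address of the cg header: the same offset is applied in every group. */
static inline int64_t
cg_header(const struct geom *g, int64_t cg)
{
	return cg_base(g, cg) + g->cblkno;
}

/*
 * Assumption: reserving journal_frags ahead of the header in cg0 pushes
 * the header offset out by the same amount in every group, because the
 * offset is shared.  Returns the new header offset.
 */
static inline int64_t
cblkno_with_journal(const struct geom *g, int64_t journal_frags)
{
	assert(journal_frags >= 0);
	return g->cblkno + journal_frags;
}
```

The last function is the cost being pointed out: the space needed in
cg0 shows up as data area before the header in all the other groups too.
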
> To answer your first question, being proportional to the blocksize
> is probably a fine answer, although most of the data in it will
> actually be stored in fragment sized chunks.
> 
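As a rough illustration of sizing the log proportionally to the block
size while accounting for it in fragments, something like the following
would do. The function name and the idea of a fixed block count are
assumptions for the sketch, not anything newfs currently has.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sizing helper: the journal is nblocks filesystem blocks
 * long, and the result is expressed in fragments since that is the unit
 * most of its contents would be stored in.
 */
static inline int64_t
journal_size_frags(int32_t bsize, int32_t fsize, int32_t nblocks)
{
	assert(fsize > 0 && bsize >= fsize && bsize % fsize == 0);
	assert(nblocks >= 0);
	return (int64_t)nblocks * (bsize / fsize);
}
```

With 16384-byte blocks, 2048-byte fragments and 1024 blocks this gives
8192 fragments, i.e. 16MB of log.
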
> The location of the journal itself has several design issues
> to consider, such as:
>   . possibly locating the journal on separate media for performance.
>     For example, a separate spindle or fast nvram may sometimes be
>     desired.

How popular is this in journalled file systems? I think older versions of
Solaris allowed it, but since version 7 I believe the log has been
embedded in the filesystem.

>   . finding the journal when mounting or fsck'ing.  This can be especially
>     complicated if the journal is on separate media and the machine gets
>     reconfigured between boots.

Yeah, this worries me too.

>   . contiguous allocation of the journal.
>   . the relative seek distance of the journal to the data it contains

How about a log area for every cylinder group? Would this be feasible?
I suppose you'd then need some trickery to work out which log to write
to, and if a transaction's blocks are spread over multiple cgs it's
going to be a real pain.
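
To make the per-cg idea concrete, here is a small sketch of the
"which log do I write to" decision. The names are made up for
illustration; the division is the same one FFS's dtog() macro does to
map a fragment address to its cylinder group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cylinder group containing fragment frag, given fragments per group. */
static inline int64_t
frag_to_cg(int64_t fpg, int64_t frag)
{
	return frag / fpg;
}

/*
 * Hypothetical log selection: if every fragment touched by a
 * transaction lives in one cg, return that cg so its log can be used.
 * Return -1 when the transaction spans groups, which is the awkward
 * case that would need a tie-break rule or a global log.
 */
static inline int64_t
txn_log_cg(int64_t fpg, const int64_t *frags, size_t n)
{
	assert(fpg > 0 && n > 0);
	int64_t cg = frag_to_cg(fpg, frags[0]);
	for (size_t i = 1; i < n; i++) {
		if (frag_to_cg(fpg, frags[i]) != cg)
			return -1;
	}
	return cg;
}
```
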

>   . accessing the start of the disk is usually faster
>   . adding the journal to an existing filesystem without reformatting.

Without using a separate device, how would this work?

>   . filesystem consistency if the system crashes during journal creation.
>   . compatibility/upgrade issues, such as whether the accessing
>     filesystem code has to be journal aware, even if the filesystem
>     was cleanly unmounted.
>   . whether to clutter the filesystem namespace with the journal
> 
> Your idea to place it in cg0 is probably not a terrible one.
> 
> In a first implementation, I put the journal in the same partition,
> but after the filesystem.  This made implementation easier, although
> I long intended to place the journal in the filesystem instead.
> 
> I recommend placing the journal data in the filesystem in a file
> linked in as /.journal or something.  It can still be allocated
> contiguously if desired, although accessing it can be complicated by
> directory lookups and bmap.

That seems like a fine idea, and not one I'd thought of.
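
If the journal does live in a file, one simple way to confirm it really
was allocated contiguously is to walk its logical blocks and compare the
physical addresses. This is only a sketch: the array stands in for
whatever a bmap-style lookup would return, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * phys[i] is the on-disk fragment address of logical block i of the
 * journal file.  The file is contiguous if each block starts exactly
 * frags_per_block after the previous one.  Returns 1 if so, else 0.
 */
static inline int
journal_is_contiguous(const int64_t *phys, size_t n, int64_t frags_per_block)
{
	assert(frags_per_block > 0);
	for (size_t i = 1; i < n; i++) {
		if (phys[i] != phys[i - 1] + frags_per_block)
			return 0;
	}
	return 1;
}
```

If the check passes once at mount time, the log code could in principle
remember the starting address and skip the per-block lookup afterwards.
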

> 
> Darrin

Thanks,
Matt