tech-kern: Re: Log area on-disk for the journal

Subject: Re: Log area on-disk for the journal
To: M J Fleming <mjf@NetBSD.org>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 10/20/2006 13:37:05

On Oct 20, 2006, at 12:38 PM, M J Fleming wrote:

>> The location of the journal itself has several design issues
>> to consider, such as:
>>   . possibly locating the journal on separate media for performance.
>>     For example, a separate spindle or fast nvram may sometimes be
>>     desired.
>
> How popular is this in journalled file systems? I think old versions of
> solaris allowed this, but since version 7, I think the log has been
> embedded in the filesystem.

For "local" file systems, this is fairly unusual.  For file systems  
like UFS, HFS+, EXTn, JFS, etc. the journal is usually kept along with  
the other bits of the file system.

That said, there are some file systems that support keeping the journal
(and other metadata, e.g. inodes and directories) on separate media from
file data.  Apple's Xsan file system is one such example.  However, I
would say that for UFS, keeping the journal on the same media object is
the best course of action.

>>   . finding the journal when mounting or fsck'ing.  This can be especially
>>     complicated if the journal is on separate media and the machine gets
>>     reconfigured between boots.
>
> Yeah, this worries me too.

Yah, unless the journal carries some other identifier that ties it to
its file system...
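
One way to picture such an identifier, as a rough sketch only: both the
file system and the journal record the same ID, and mount/fsck compare
them before trusting a journal found on some other device.  The struct
and field names below are hypothetical and do not correspond to any
existing on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FSID_LEN 16	/* assumed ID length, e.g. a UUID */

/* Hypothetical label stored in the file system's superblock. */
struct fs_label {
	uint8_t fl_id[FSID_LEN];
};

/* Hypothetical label stored at the start of the journal. */
struct jrnl_label {
	uint8_t jl_fsid[FSID_LEN];	/* ID of the owning file system */
};

/* Returns nonzero if the journal claims to belong to this file system. */
static int
jrnl_belongs_to(const struct jrnl_label *jl, const struct fs_label *fl)
{
	assert(jl != NULL && fl != NULL);
	return memcmp(jl->jl_fsid, fl->fl_id, FSID_LEN) == 0;
}
```

With something like this, a journal that moved to a different device
name between boots can still be matched by scanning candidates.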

>>   . contiguous allocation of the journal.
>>   . the relative seek distance of the journal to the data it contains
>
> How about a log area for every cylinder group? Would this be feasible?
> I suppose you'd then have to have some trickery to find out which log
> you're going to write to and if the blocks are spread over multiple cgs,
> then it's gonna be a real pain.

You still have to go somewhere else to figure out where your head and  
tail pointers are.
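
To illustrate what "head and tail pointers" means here, a minimal
sketch of a circular log header.  The layout and names are hypothetical
(not taken from any existing implementation); the point is only that
one well-known place holds both offsets, wherever the log blocks live.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical log header; field names are illustrative only. */
struct log_header {
	uint64_t lh_size;	/* size of the circular log area, in bytes */
	uint64_t lh_head;	/* offset where the next record is written */
	uint64_t lh_tail;	/* offset of the oldest record still needed */
};

/* Bytes currently occupied by live records. */
static uint64_t
log_used(const struct log_header *lh)
{
	assert(lh->lh_head < lh->lh_size && lh->lh_tail < lh->lh_size);
	if (lh->lh_head >= lh->lh_tail)
		return lh->lh_head - lh->lh_tail;
	return lh->lh_size - lh->lh_tail + lh->lh_head;
}

/* Bytes available; one byte stays free so head == tail means "empty". */
static uint64_t
log_free(const struct log_header *lh)
{
	return lh->lh_size - 1 - log_used(lh);
}

/* Advance head by len bytes, wrapping.  Returns 0, or -1 if no room. */
static int
log_reserve(struct log_header *lh, uint64_t len)
{
	if (len > log_free(lh))
		return -1;
	lh->lh_head = (lh->lh_head + len) % lh->lh_size;
	return 0;
}
```

A per-cylinder-group scheme would need one of these per log, plus some
way to order records across logs at replay time.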

Wasabi's journaling extensions for UFS actually use a contiguous area
AFTER the file system (but within the same PARTITION as the file
system).  This was mostly a matter of expedience, as I recall.

Really, you have two good choices:

- Contiguously allocated from within the file system's free space.   
Pros: simple, fast.  Cons: hard to enable journaling on a pre-existing  
file system if the free space is heavily fragmented.

- Allocated from file system free space as contiguously as possible,  
but no different than any other file.  Pros: easy to enable journaling  
on pre-existing file systems because it will work regardless of how  
the free space is laid out.  Cons: have to use BMAP to translate  
journal offset -> location on disk.
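
The difference between the two choices can be sketched as follows.
This is an illustration under assumptions, not kernel code: the block
size, the extent table, and all names are hypothetical, and the table
merely stands in for whatever the real BMAP operation would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JBLKSIZE 8192u	/* assumed file system block size */

/* Choice 1: contiguous journal, translation is plain arithmetic. */
static uint64_t
jrnl_contig_blk(uint64_t start_blk, uint64_t joff)
{
	return start_blk + joff / JBLKSIZE;
}

/* Choice 2: journal is an ordinary file; each extent maps a run of
 * logical blocks to physical blocks. */
struct jextent {
	uint64_t je_lblk;	/* first logical block of the run */
	uint64_t je_pblk;	/* first physical block of the run */
	uint64_t je_len;	/* length of the run, in blocks */
};

/* Translate a journal byte offset to a physical block.
 * Returns 0 and stores the block in *pblk, or -1 if unmapped. */
static int
jrnl_bmap(const struct jextent *map, size_t n, uint64_t joff, uint64_t *pblk)
{
	uint64_t lblk = joff / JBLKSIZE;

	assert(map != NULL && pblk != NULL);
	for (size_t i = 0; i < n; i++) {
		if (lblk >= map[i].je_lblk &&
		    lblk - map[i].je_lblk < map[i].je_len) {
			*pblk = map[i].je_pblk + (lblk - map[i].je_lblk);
			return 0;
		}
	}
	return -1;
}
```

The lookup cost in the second case is what the "Cons" above refers to;
caching the mapping at mount time would hide most of it.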

Also, IMO, the journal should be hooked up to a reserved inode.

>>   . filesystem consistency if the system crashes during journal creation.

This is trivial if you implement the journal as a special kind of  
file, and then enable journaling to that file.  Existing consistency  
mechanisms do the work for you.

>>   . compatibility/upgrade issues, such as whether the accessing
>>     filesystem code has to be journal aware, even if the filesystem
>>     was cleanly unmounted.

Journaled UFS should get a new magic number.  Old tools will then
refuse to touch such a file system.
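
A sketch of why that works.  The UFS1 and UFS2 values are the existing
superblock magic numbers as I recall them from ufs/ffs/fs.h; the
journaled value is entirely made up for illustration, and the two
helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FS_UFS1_MAGIC	0x011954u	/* existing UFS1 magic */
#define FS_UFS2_MAGIC	0x19540119u	/* existing UFS2 magic */
/* Hypothetical value, invented for this example only. */
#define FS_UFSJ_MAGIC_HYPOTHETICAL	0x4a554653u

/* An old, journal-unaware tool only knows the two existing magics. */
static int
old_tool_accepts(uint32_t magic)
{
	return magic == FS_UFS1_MAGIC || magic == FS_UFS2_MAGIC;
}

/* A journal-aware tool additionally accepts the new magic. */
static int
new_tool_accepts(uint32_t magic)
{
	/* Sanity check: the new value must not collide with old ones. */
	assert(!old_tool_accepts(FS_UFSJ_MAGIC_HYPOTHETICAL));
	return old_tool_accepts(magic) ||
	    magic == FS_UFSJ_MAGIC_HYPOTHETICAL;
}
```

An old fsck or mount simply fails its magic check and never gets far
enough to misinterpret a dirty journal.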

>>   . whether to clutter the filesystem namespace with the journal
>>
>> Your idea to place it in cg0 is probably not a terrible one.
>>
>> In a first implementation, I put the journal in the same partition,
>> but after the filesystem.  This made implementation easier, although
>> I long intended to place the journal in the filesystem instead.
>>
>> I recommend placing the journal data in the filesystem in a file
>> linked in as /.journal or something.  It can still be allocated
>> contiguously if desired, although accessing it can be complicated by
>> directory lookups and bmap.
>
> That seems like a fine idea, not one I'd thought of.
>
>>
>> Darrin
>
> Thanks,
> Matt

-- thorpej