Subject: Re: FFS journal
To: Kirill Kuvaldin <kirill.kuvaldin@gmail.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 07/03/2006 21:08:13
On Sun, Jul 02, 2006 at 07:59:50PM +0400, Kirill Kuvaldin wrote:
> [...]
> Optional:
> 
> * Support for batching transactions:
>   - it may be a significant performance win.

I fear that not batching transactions may have a significant impact on
performance, especially when compared to a softdep FFS.

> * API Documentation:
>   - it may be helpful for the developers to understand what the
>     journaling code does and how to use it.

I don't think this can be optional.

> 
> III. TECHNICAL DETAILS
> 
> * Journal internals:
>   - The journal area (or log area) used to write journal entries is a
>     fixed-size area allocated at filesystem initialization. The
>     filesystem superblock must maintain a reference to the journal
>     area, which also contains its own superblock where the necessary
>     bookkeeping information is stored. Two indices (start_index and
>     end_index) mark the bounds of the active part of the journal,
>     which is used in circular fashion and contains the active
>     transactions.
> 
> +-------------+------------------------+------------------------+------+
> |  Journal    |                        |                        |      |
> | superblock  | t r a n s a c t i o n  | t r a n s a c t i o n  |      |
> |+-----------+|+-------+ +-------+     |+-------+ +-------+     |      |
> ||start_index|||       | |       |     ||       | |       |     |      |
> ||end_index  |||       | |       | ... ||       | |       | ... | .... |
> || ...       |||       | |       |     ||       | |       |     |      |
> |+-----------+|+-------+ +-------+     |+-------+ +-------+     |      |
> +-------------+------------------------+------------------------+------+
>  Figure 1: Journal area on-disk representation

Shouldn't this have some constraints with respect to disk sector boundaries?
Note that it's a shot in the dark; I just thought about this when seeing
this figure ...
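
To make the concern concrete, here is a rough user-space sketch of what
sector-aligned bookkeeping could look like. It assumes 512-byte sectors and
a journal superblock that occupies exactly one sector; apart from
start_index and end_index, every name and field below is hypothetical and
not taken from the proposal. One sector is deliberately left unused so that
start == end always means "empty".

```c
#include <assert.h>
#include <stdint.h>

#define JFFS_SECTOR_SIZE 512u	/* assumption: 512-byte sectors */

/* Hypothetical on-disk journal superblock, padded to exactly one sector. */
struct jffs_jsuperblock {
	uint32_t js_magic;
	uint32_t js_nsectors;		/* size of the log area, in sectors */
	uint32_t js_start_index;	/* first sector of the active area */
	uint32_t js_end_index;		/* first free sector after it */
	uint8_t  js_pad[JFFS_SECTOR_SIZE - 4 * sizeof(uint32_t)];
};

/* A superblock update must never straddle two sectors. */
_Static_assert(sizeof(struct jffs_jsuperblock) == JFFS_SECTOR_SIZE,
    "journal superblock must be exactly one sector");

/* Round a byte count up to whole sectors, so entries start on a boundary. */
static uint32_t
jffs_bytes_to_sectors(uint32_t nbytes)
{
	return (uint32_t)(((uint64_t)nbytes + JFFS_SECTOR_SIZE - 1) /
	    JFFS_SECTOR_SIZE);
}

/* Advance an index by n sectors, wrapping around the circular log. */
static uint32_t
jffs_index_advance(uint32_t index, uint32_t n, uint32_t nsectors)
{
	assert(nsectors > 0 && index < nsectors);
	return (uint32_t)(((uint64_t)index + n) % nsectors);
}

/* Sectors still available; one sector is kept free to tell full from empty. */
static uint32_t
jffs_free_sectors(const struct jffs_jsuperblock *js)
{
	uint32_t used;

	assert(js->js_nsectors > 0);
	used = (uint32_t)(((uint64_t)js->js_end_index + js->js_nsectors -
	    js->js_start_index) % js->js_nsectors);
	return js->js_nsectors - 1 - used;
}
```

With indices counted in sectors rather than bytes, a torn write can damage
at most the entry being written, not the indices themselves.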

> 
> * Journaling API:
> The following ideas are inspired by the BeFS textbook (see [2]). Although
> there are only 3 functions for journal management, they may be enough for
> the rest of the filesystem to interact with the journaling code.
> 
> - jffs_start_transaction():
>   o acquire the journal semaphore, holding it until the transaction
>     completes;
>   o ensure that there is enough space available in the journal to hold
>     this transaction; if there is, make the necessary preparations
>     and allocate the transaction structures; otherwise, force
>     flushing blocks out of the cache, preferably those that were part
>     of previous transactions;
>   o set the state of transaction to *running* allowing the filesystem
>     code to add new blocks to form the transaction structure.
> 
> - jffs_write_blocks():
>   o during a transaction any code that modifies a block of the
>     filesystem metadata must call this function on the modified data;
>   o for the sake of performance it may be possible to modify only the
>     in-memory journal structures and later flush them to the log.
> 
> - jffs_end_transaction():
>   o first, this function puts the transaction into the *locked* state,
>     meaning that no more blocks can be added to the transaction;
>   o write all in-memory transaction blocks to their appropriate places
>     in the journal area. When the last block is written to the
>     journal, the transaction is considered to be *finished*;
>   o set the callback function that will change the transaction state to
>     *completed* as soon as the journal entry is completely flushed
>     to disk;
>   o release the journal semaphore.
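
The life cycle described by these three calls can be sketched as a small
state machine. This is only a user-space mock under stated assumptions: the
state names (*running*, *locked*, *finished*, *completed*) and the three
function names come from the proposal, while the argument lists, the
structures, the fixed block limit, jffs_journal_iodone() and the use of a
plain flag in place of the journal semaphore are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JFFS_MAX_TBLOCKS 8	/* hypothetical per-transaction limit */

enum jffs_tstate {
	JFFS_T_IDLE, JFFS_T_RUNNING, JFFS_T_LOCKED,
	JFFS_T_FINISHED, JFFS_T_COMPLETED
};

struct jffs_journal {
	bool     j_busy;	/* stands in for the journal semaphore */
	uint32_t j_free;	/* free blocks left in the log area */
};

struct jffs_transaction {
	enum jffs_tstate t_state;
	uint64_t t_blkno[JFFS_MAX_TBLOCKS];	/* blocks in this transaction */
	size_t   t_nblocks;
	size_t   t_reserved;	/* log space reserved at start time */
};

/* Returns 0 on success, -1 if the journal is busy or lacks space. */
static int
jffs_start_transaction(struct jffs_journal *j, struct jffs_transaction *t,
    uint32_t maxblocks)
{
	if (j->j_busy || maxblocks > JFFS_MAX_TBLOCKS || j->j_free < maxblocks)
		return -1;	/* real code would sleep or force a flush */
	j->j_busy = true;
	t->t_nblocks = 0;
	t->t_reserved = maxblocks;
	t->t_state = JFFS_T_RUNNING;
	return 0;
}

/* Record a modified metadata block in the in-memory transaction. */
static int
jffs_write_blocks(struct jffs_transaction *t, uint64_t blkno)
{
	if (t->t_state != JFFS_T_RUNNING || t->t_nblocks == t->t_reserved)
		return -1;
	t->t_blkno[t->t_nblocks++] = blkno;
	return 0;
}

/* Lock the transaction, account for its log space, release the journal. */
static void
jffs_end_transaction(struct jffs_journal *j, struct jffs_transaction *t)
{
	assert(t->t_state == JFFS_T_RUNNING);
	t->t_state = JFFS_T_LOCKED;
	/* The actual writes to the journal area would be issued here. */
	j->j_free -= (uint32_t)t->t_nblocks;
	t->t_state = JFFS_T_FINISHED;
	j->j_busy = false;
}

/* Hypothetical I/O completion callback: the journal entry is on disk. */
static void
jffs_journal_iodone(struct jffs_transaction *t)
{
	assert(t->t_state == JFFS_T_FINISHED);
	t->t_state = JFFS_T_COMPLETED;
}
```

Note that holding the journal busy from start to end serializes all
transactions, which is where batching would come in.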
> 
> * Journaling constraints on the cache subsystem:
>   - journaling code must be able to lock disk blocks in the cache to
>     prevent them from being flushed.
>   - journaling code must know when a disk block is flushed to disk. This
>     may be achieved with callback functions if the cache subsystem
>     supports them. When the last block forming the transaction is
>     flushed to disk, the transaction is considered to be completed.

To me it looks like this needs to hook into the softdep code. Softdep
has more or less the same constraints.
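
As a rough illustration of those two constraints (pin a block until its
journal entry is stable, and get a callback when the block itself is
written), here is a user-space sketch. All names are hypothetical; this is
not the actual buffer cache or softdep interface, only the shape of the
bookkeeping such hooks would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct jffs_txn {
	size_t t_pending;	/* blocks not yet written to their home location */
	bool   t_logged;	/* journal entry is on disk */
	bool   t_completed;	/* all blocks of the transaction are on disk */
};

/* Hypothetical stand-in for a buffer cache entry. */
struct jffs_buf {
	struct jffs_txn *b_txn;	/* owning transaction, or NULL */
	bool b_pinned;		/* must not be written in place yet */
	bool b_dirty;
};

/* Attach a modified buffer to a transaction and pin it in the cache. */
static void
jffs_buf_attach(struct jffs_txn *t, struct jffs_buf *b)
{
	assert(b->b_txn == NULL);
	b->b_txn = t;
	b->b_pinned = true;
	b->b_dirty = true;
	t->t_pending++;
}

/* The journal entry is stable: the cache may now write the blocks. */
static void
jffs_txn_logged(struct jffs_txn *t, struct jffs_buf *bufs, size_t n)
{
	size_t i;

	t->t_logged = true;
	for (i = 0; i < n; i++)
		if (bufs[i].b_txn == t)
			bufs[i].b_pinned = false;
}

/* The cache would check this before writing a buffer back. */
static bool
jffs_buf_can_flush(const struct jffs_buf *b)
{
	return b->b_dirty && !b->b_pinned;
}

/* Callback invoked by the cache after a buffer reached the disk. */
static void
jffs_buf_flushed(struct jffs_buf *b)
{
	struct jffs_txn *t = b->b_txn;

	assert(!b->b_pinned);
	b->b_dirty = false;
	if (t == NULL)
		return;
	b->b_txn = NULL;
	if (--t->t_pending == 0)
		t->t_completed = true;	/* its log space can be reclaimed */
}
```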

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference