Subject: UFS+logging @ FreeBSD
To: None <tech-kern@NetBSD.org>
From: Hubert Feyrer <hubert@feyrer.de>
List: tech-kern
Date: 04/21/2005 20:13:15
It seems there's work underway on adding journaling to UFS, as Solaris
has. To quote from the FreeBSD Status Report Jan-Mar 2005:

---snip---
Filesystem journalling for UFS

    URL:
    http://repoman.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/scottl/ufsj

    Contact: Scott Long <scottl@freebsd.org>

    It's time to bite the bullet and admit that fsck is no longer scalable
    for modern storage capacities. While a healthy debate can still be had
    on the merits and data integrity guarantees of journalling vs.
    SoftUpdates, the fact that SoftUpdates still requires a fsck to ensure
    consistency of the filesystem metadata after an unclean shutdown means
    uptime is lost. While background fsck is available, it saps system
    performance and stretches the fsck time out to hours.

    Journalling provides a way to record transactions that might not have
    fully been written to disk before the system crashed, and then quickly
    recover the system back to a consistent state by replaying these
    transactions. It doesn't guarantee that no data will be lost, but it
    does guarantee that the filesystem will be back to a consistent state
    after the replay is performed. This contrasts with SoftUpdates, which
    re-arranges metadata updates so that inconsistencies are minimized and
    easy to recover from, though recovery still requires the traditional
    full filesystem scan.
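
To make the replay idea above concrete, here is a minimal sketch in Python.
It is a toy model only, not code from the ufsj tree: the names Transaction
and replay, and the dict standing in for the disk, are all my own
assumptions. It shows why the result is consistent even though data can be
lost: a transaction is applied completely or not at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transaction:
    """One journal entry: images of the metadata blocks it touches."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    committed: bool = False  # set only once the whole entry reached the journal


def replay(journal: List[Transaction], disk: Dict[int, bytes]) -> int:
    """Apply committed transactions to disk in log order.

    Returns the number of transactions replayed. Replay stops at the first
    uncommitted entry: the log is sequential, so nothing after an
    incomplete entry can be trusted.
    """
    replayed = 0
    for txn in journal:
        if not txn.committed:
            break
        disk.update(txn.blocks)  # all blocks of the transaction, or none
        replayed += 1
    return replayed


# Hypothetical crash scenario: the second transaction never committed.
disk = {10: b"inode-old", 20: b"dir-old"}
journal = [
    Transaction({10: b"inode-new", 20: b"dir-new"}, committed=True),
    Transaction({10: b"inode-newer"}, committed=False),
]
replay(journal, disk)
```

After the replay, blocks 10 and 20 both carry the first transaction's
contents. The second update is lost, but no block is left half-updated
relative to the others.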

    Journalling is a key feature of many modern filesystems like NTFS,
    XFS, JFS, ReiserFS, and Ext3, so the ground is well covered and the
    risks for UFS/FFS are low. I'm aware that groups from CMU and RPI have
    attempted similar work in the past, but unfortunately the work is
    either very outdated, or I haven't had any luck in contacting the
    groups. In this absence, I've decided to work on this project myself
    in hopes of having a functional prototype in time for FreeBSD 6.0.

    The approach is simple and journals full metadata blocks instead of
    just deltas or high-level operations. This greatly simplifies the
    replay code at the cost of requiring more disk space for the journal
    and more work within the filesystem to identify discrete update
    points. An important design consideration is whether to make the
    journal data and code compatible with the UFS2 filesystem, or to start
    a new UFS3 derivative. Since the latter presents a very high barrier
    to adoption for most people, I'm going to try to make it a compatible
    option for UFS2. This means that the journal blocks will likely appear
    as an unlinked file to legacy filesystem and fsck code, and will be
    treated as such. This will allow seamless fallback to using fsck,
    though once the unlinked journal data blocks are reclaimed by fsck,
    the user will have to take action to re-create the journal file again.
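
The trade-off between journalling full metadata blocks and journalling
deltas can be sketched as follows. Again this is a toy illustration under
my own assumptions (a 16-byte block, tuple-shaped records, made-up function
names), not the on-disk format the report has in mind. The full-block
record costs a whole block of journal space for a one-byte change, but its
replay is a blind copy.

```python
BLOCK_SIZE = 16  # toy size for illustration; real blocks are far larger


def full_block_record(block_no, new_image):
    """Journal the entire new block image."""
    return (block_no, new_image)


def delta_record(block_no, old_image, new_image):
    """Journal only the byte range that differs between old and new."""
    changed = [i for i in range(len(new_image)) if old_image[i] != new_image[i]]
    if not changed:
        return (block_no, 0, b"")
    first, last = changed[0], changed[-1]
    return (block_no, first, new_image[first:last + 1])


def replay_full_block(disk, record):
    """Replay is a plain overwrite; the old on-disk contents do not matter."""
    block_no, image = record
    disk[block_no] = image


def replay_delta(disk, record):
    """Replay is read-modify-write and relies on the old block being intact."""
    block_no, offset, payload = record
    block = bytearray(disk[block_no])
    block[offset:offset + len(payload)] = payload
    disk[block_no] = bytes(block)


# A one-byte metadata change in block 7.
old = bytes(BLOCK_SIZE)
new = old[:5] + b"\xff" + old[6:]
full = full_block_record(7, new)      # carries 16 bytes of payload
delta = delta_record(7, old, new)     # carries 1 byte of payload
```

Both records bring the block to the same final state here. The difference
is that replay_delta has to read the block and trust what it finds there,
while replay_full_block does not.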

    One key piece of journalling is ensuring that each journal transaction
    is fully written to disk before the associated metadata blocks are
    written to the filesystem. I plan to adopt the buffer 'pinning'
    mechanism from Alexander Kabaev's XFS work to assist with this. This
    will allow the journalling subsystem fine-grained control over which
    blocks get flushed to disk by the buffer daemon without having to
    further complicate the UFS/FFS code. One consideration is how
    SoftUpdates falls into this and whether it is mutually exclusive of
    journalling or if it can help provide transaction ordering
    functionality to the journal. Research here is on-going.
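
The ordering rule described above (journal first, metadata second) can be
sketched with a toy buffer cache. I have not looked at the XFS pinning code,
so everything here is an assumption made for illustration: the class names,
the pin counter and the flush_daemon method are hypothetical and are not
the FreeBSD buffer cache API. The idea shown is only that a pinned buffer
is skipped by the flusher until its journal transaction has been written.

```python
class Buffer:
    """A cached metadata block with a pin count (hypothetical structure)."""

    def __init__(self, block_no, data):
        self.block_no = block_no
        self.data = data
        self.dirty = True
        self.pin_count = 0


class ToyCache:
    def __init__(self):
        self.buffers = {}          # block_no -> Buffer
        self.disk = {}             # stands in for the filesystem proper
        self.journal_on_disk = []  # stands in for the journal area

    def modify(self, block_no, data):
        """Dirty a block and pin it until its transaction is journalled."""
        buf = Buffer(block_no, data)
        buf.pin_count = 1
        self.buffers[block_no] = buf
        return buf

    def flush_daemon(self):
        """Write out dirty, unpinned buffers; return the block numbers written."""
        written = []
        for buf in self.buffers.values():
            if buf.dirty and buf.pin_count == 0:
                self.disk[buf.block_no] = buf.data
                buf.dirty = False
                written.append(buf.block_no)
        return written

    def commit(self, bufs):
        """Write the transaction to the journal, then unpin its buffers."""
        self.journal_on_disk.append({b.block_no: b.data for b in bufs})
        for b in bufs:
            b.pin_count -= 1


cache = ToyCache()
inode_buf = cache.modify(10, b"inode-new")
cache.flush_daemon()       # writes nothing: block 10 is still pinned
cache.commit([inode_buf])  # journal entry is written, pin is dropped
cache.flush_daemon()       # now block 10 may go to the filesystem
```

The filesystem code itself never has to order the writes in this model; it
only pins what it changes, and the journal decides when to unpin.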

    Some preliminary work can be found in Perforce in the
    //depot/user/scottl/ufsj/... tree or at the URL provided. Hopefully
    work on this will accelerate quickly.
---snip---


  - Hubert