Subject: UFS+logging @ FreeBSD
To: None <tech-kern@NetBSD.org>
From: Hubert Feyrer <firstname.lastname@example.org>
Date: 04/21/2005 20:13:15
It seems there's work underway for adding journaling to UFS, like Solaris
has. To quote from the FreeBSD Status Report Jan-Mar 2005:
Filesystem journalling for UFS
Contact: Scott Long <email@example.com>
It's time to bite the bullet and admit that fsck is no longer scalable
for modern storage capacities. While a healthy debate can still be had
on the merits and data integrity guarantees of journalling vs.
SoftUpdates, the fact that SoftUpdates still requires a fsck to ensure
consistency of the filesystem metadata after an unclean shutdown means
uptime is lost. While background fsck is available, it saps system
performance and stretches the fsck time out to hours.
Journalling provides a way to record transactions that might not have
fully been written to disk before the system crashed, and then quickly
recover the system back to a consistent state by replaying these
transactions. It doesn't guarantee that no data will be lost, but it
does guarantee that the filesystem will be back to a consistent state
after the replay is performed. This contrasts with SoftUpdates, which
re-arranges metadata updates so that inconsistencies are minimized and
easy to recover from, though recovery still requires the traditional
full filesystem scan.
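As a rough illustration of the replay idea (a toy model, not code from
the status report or from any FreeBSD tree), the sketch below treats the
journal as a list of records and the disk as a dictionary of blocks. The
Record type, the "block"/"commit" record kinds and the replay() function
are all hypothetical names chosen for this example. Only transactions
that reached their commit record are applied; a transaction cut short by
the crash is ignored, so its updates are lost but the result stays
consistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str          # "block" (a logged block image) or "commit"
    txid: int          # transaction this record belongs to
    blkno: int = -1    # target block number, for "block" records
    data: bytes = b""  # new block contents, for "block" records


def replay(journal, disk):
    """Apply every logged block that belongs to a committed transaction."""
    committed = {r.txid for r in journal if r.kind == "commit"}
    for r in journal:
        if r.kind == "block" and r.txid in committed:
            disk[r.blkno] = r.data
    return disk


# Transaction 1 committed; transaction 2 was interrupted by the crash.
journal = [
    Record("block", 1, 10, b"inode-v2"),
    Record("block", 1, 20, b"dir-v2"),
    Record("commit", 1),
    Record("block", 2, 10, b"inode-v3"),
]
disk = {10: b"inode-v1", 20: b"dir-v1"}
replay(journal, disk)
print(disk)
```

The work done by replay() is proportional to the size of the journal,
not to the size of the filesystem, which is the point of the comparison
with a full fsck scan.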
Journalling is a key feature of many modern filesystems like NTFS,
XFS, JFS, ReiserFS, and Ext3, so the ground is well covered and the
risks for UFS/FFS are low. I'm aware that groups from CMU and RPI have
attempted similar work in the past, but unfortunately the work is
either very outdated, or I haven't had any luck in contacting the
groups. In this absence, I've decided to work on this project myself
in hopes of having a functional prototype in time for FreeBSD 6.0.
The approach is simple and journals full metadata blocks instead of
just deltas or high-level operations. This greatly simplifies the
replay code at the cost of requiring more disk space for the journal
and more work within the filesystem to identify discrete update
points. An important design consideration is whether to make the
journal data and code compatible with the UFS2 filesystem, or to start
a new UFS3 derivative. Since the latter presents a very high barrier
to adoption for most people, I'm going to try to make it a compatible
option for UFS2. This means that the journal blocks will likely appear
as an unlinked file to legacy filesystem and fsck code, and will be
treated as such. This will allow seamless fallback to using fsck,
though once the unlinked journal data blocks are reclaimed by fsck,
the user will have to take action to re-create the journal file again.
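One plausible reason why full-block journalling keeps the replay code
simple is that writing a complete block image is idempotent, while
applying a delta is not. The sketch below is only an assumption-laden
toy: each "block" is a single integer standing in for a link count, and
replay_full_blocks() and replay_deltas() are hypothetical helpers, not
part of any UFS code. It replays the same journal twice, as would happen
if a crash interrupted the first replay.

```python
def replay_full_blocks(journal, disk):
    # Each record is (block number, complete new block contents).
    for blkno, image in journal:
        disk[blkno] = image


def replay_deltas(journal, disk):
    # Each record is (block number, change to a counter kept in the block).
    for blkno, delta in journal:
        disk[blkno] += delta


full_disk = {7: 1}
full_journal = [(7, 2)]    # "block 7 now holds link count 2"
replay_full_blocks(full_journal, full_disk)
replay_full_blocks(full_journal, full_disk)   # replay run a second time
print(full_disk[7])        # still the value recorded in the journal

delta_disk = {7: 1}
delta_journal = [(7, +1)]  # "add one to the link count in block 7"
replay_deltas(delta_journal, delta_disk)
replay_deltas(delta_journal, delta_disk)      # replay run a second time
print(delta_disk[7])       # the increment has been applied twice
```

A delta-based journal can avoid the double application with extra
bookkeeping such as per-block sequence numbers; the toy only shows that
the full-block variant needs none, at the price of logging the whole
block for every change.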
One key piece of journalling is ensuring that each journal transaction
is fully written to disk before the associated metadata blocks are
written to the filesystem. I plan to adopt the buffer 'pinning'
mechanism from Alexander Kabaev's XFS work to assist with this. This
will allow the journalling subsystem fine-grained control over which
blocks get flushed to disk by the buffer daemon without having to
further complicate the UFS/FFS code. One consideration is how
SoftUpdates falls into this and whether it is mutually exclusive of
journalling or if it can help provide transaction ordering
functionality to the journal. Research here is on-going.
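The ordering rule can be sketched with a toy buffer cache. This is not
the FreeBSD buffer cache API and not the pinning code from the XFS
work; Buf, ToyCache, modify(), commit() and flush_daemon() are names
invented for this example. A modified buffer is pinned until the
transaction that covers it has been written to the journal, and the
flush pass skips pinned buffers, so a metadata block can never reach
the filesystem ahead of its journal record.

```python
class Buf:
    def __init__(self, blkno, data):
        self.blkno = blkno
        self.data = data
        self.dirty = True
        self.pinned = False


class ToyCache:
    def __init__(self):
        self.bufs = {}      # in-memory buffers, keyed by block number
        self.pending = []   # buffers modified in the open transaction
        self.journal = []   # stands in for the on-disk journal
        self.disk = {}      # stands in for the on-disk filesystem blocks

    def modify(self, blkno, data):
        # Pin the buffer so the flush pass cannot write it out yet.
        buf = Buf(blkno, data)
        buf.pinned = True
        self.bufs[blkno] = buf
        self.pending.append(buf)

    def commit(self):
        # Write the block images and a commit marker to the journal first,
        # and only then release the pins.
        for buf in self.pending:
            self.journal.append((buf.blkno, buf.data))
        self.journal.append("commit")
        for buf in self.pending:
            buf.pinned = False
        self.pending = []

    def flush_daemon(self):
        # Write out dirty buffers, skipping any that are still pinned.
        for buf in self.bufs.values():
            if buf.dirty and not buf.pinned:
                self.disk[buf.blkno] = buf.data
                buf.dirty = False


cache = ToyCache()
cache.modify(5, b"new inode block")
cache.flush_daemon()
early = dict(cache.disk)   # snapshot taken before the commit
cache.commit()
cache.flush_daemon()
print(early, cache.disk)
```

In the toy, the filesystem code only calls modify(); the decision about
when a block may be flushed lives entirely in the cache and journal
layer, which mirrors the stated goal of not complicating the UFS/FFS
code further.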
Some preliminary work can be found in Perforce in the
//depot/user/scottl/ufsj/... tree or at the URL provided. Hopefully
this will quickly accelerate.