Subject: Re: LFS status?
To: Geert Hendrickx <ghen@NetBSD.org>
From: Konrad Schroder <perseant@hhhh.org>
List: current-users
Date: 05/02/2006 15:45:33
On Tue, 2 May 2006, Geert Hendrickx wrote:

> I've seen an increasing number of commits to sys/ufs/lfs lately (mostly 
> by perseant), which tempts me to give LFS another try.  What's the 
> status of LFS, from an end-user POV, in NetBSD-current?  And in 3.0? 
> For what kind of use is LFS advisable?

I think that LFS is coming around at last.  My time to work on it has 
always been sporadic; lucky for LFS I've had an unusual amount of time for 
it recently :^)

There are basically three classes of problems that one runs into over and 
over with LFS:

   1) Running off the end of the log ("lfs_nextseg: no clean segments")
   2) Inconsistency of on-disk checkpoints
   3) Deadlocks of various kinds

I've had reports of specific instances of each of these from people who, 
like you, have seen the commits and thought of trying it out (and, to my, 
er, joy? I've been able to reproduce most of them).  Some other important 
but less immediate shortcomings are:

   4) It uses 32-bit on-disk quantities
   5) The multiprocessing locking situation needs auditing

I've been working on #2 most recently, using a simple snapshot mechanism 
to stop writes to the disk at the point the log is about to wrap, 
recreating the on-disk state for every available checkpoint (since we 
haven't wrapped, we have all the data necessary to do this) and running 
fsck_lfs on the results.  This seems to be working now (fingers crossed),
and since it's codified in a regression test, I think we can keep issue #2
under control from this point forward.  If on-disk consistency is really 
working, no fsck should be required to mount the disk after a crash.
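
In outline, the per-checkpoint check is just a loop over the frozen image,
something like the sketch below.  This is not the actual regression test in
the tree; it assumes fsck_lfs's -n (check only, answer "no") and -b
(alternate superblock) options behave as described in fsck_lfs(8), and it
takes the checkpoint superblock addresses on the command line instead of
harvesting them from the log:

/*
 * Sketch only, not the real test.  Checkpoint superblock addresses
 * are supplied as arguments; each one is checked read-only with
 * fsck_lfs against the frozen image.
 */
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	char cmd[1024];
	int i, failures = 0;

	if (argc < 3) {
		fprintf(stderr, "usage: %s image sb-addr ...\n", argv[0]);
		return 1;
	}
	for (i = 2; i < argc; i++) {
		snprintf(cmd, sizeof(cmd), "fsck_lfs -n -b %s %s",
		    argv[i], argv[1]);
		if (system(cmd) != 0) {
			printf("checkpoint at %s: NOT consistent\n", argv[i]);
			failures++;
		} else
			printf("checkpoint at %s: consistent\n", argv[i]);
	}
	return failures != 0;
}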

Now, of course we want roll-forward once we're confident that it works; and 
cleaning up allocated inodes with zero link count would also be nice. 
(Zero-link inodes don't indicate a faulty filesystem; think tmpfile(3). 
But because a crash never closes those files properly, the filesystem 
loses space over time.)  I haven't yet made the checkpoint-checking test 
roll forward through the non-checkpoint writes to verify the roll-forward 
code in fsck_lfs, though that's definitely on the agenda.  The in-kernel 
roll-forward code does not work and should probably be scrapped.
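
To make the tmpfile(3) point concrete, here's a minimal userland
illustration; there's nothing LFS-specific about it, it's just the usual
create-then-unlink scratch-file pattern:

/*
 * The tmpfile(3) pattern: create a file, unlink it immediately, keep
 * using it through the open stream.  The inode's link count is zero
 * while the file is in use.  A crash at the marked point leaves that
 * inode allocated but unreachable; not corruption, just space that
 * nothing will free until fsck (or some cleanup pass) reclaims it.
 */
#include <err.h>
#include <stdio.h>

int
main(void)
{
	FILE *fp;

	if ((fp = tmpfile()) == NULL)	/* create + unlink under the hood */
		err(1, "tmpfile");

	fprintf(fp, "scratch data no other process can open by name\n");

	/* A crash here leaves an allocated inode with link count zero. */

	fclose(fp);	/* on normal exit the kernel frees the inode here */
	return 0;
}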

The deadlocks are being addressed as they come up; the most irritating one 
at present is that I implemented the "release the snapshot" mechanism as an 
fcntl, and fcntl locks a vnode that may already be held by the cleaner 
while the cleaner waits for the "release the snapshot" signal.  What I really want is a 
generic fsctl(2) that doesn't deal with a vnode at all...but I digress.
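
For the curious, the userland side of the snapshot dance looks roughly like
the sketch below.  The command names (LFCNWRAPSTOP/LFCNWRAPGO) and the
argument convention are from memory, so treat them as assumptions and check
<ufs/lfs/lfs.h> in -current for the real interface:

/*
 * Assumed interface; see <ufs/lfs/lfs.h> for the real LFCN* commands
 * and their argument conventions.
 */
#include <sys/types.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

#include <ufs/lfs/lfs.h>	/* assumed home of the LFCN* commands */

int
main(int argc, char **argv)
{
	int fd, arg = 0;

	if (argc != 2)
		errx(1, "usage: %s lfs-mount-point", argv[0]);
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open %s", argv[1]);

	/* Freeze the log: no segment gets overwritten past this point. */
	if (fcntl(fd, LFCNWRAPSTOP, &arg) == -1)
		err(1, "LFCNWRAPSTOP");

	/* ... read a consistent image of the raw device here ... */

	/* Release the snapshot.  Because the fcntl path takes a vnode
	 * lock on the way in, this is where the deadlock described
	 * above can bite if the cleaner already holds that lock. */
	if (fcntl(fd, LFCNWRAPGO, &arg) == -1)
		err(1, "LFCNWRAPGO");

	close(fd);
	return 0;
}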

Difficulty #1 is LFS's Achilles' heel.  It should be possible to 
parametrize the filesystem for different workloads to avoid this problem, 
and I've been working on that off and on over the last few weeks.  In the worst 
case, though---random writes to gigantic files---it would require 
allocating only 25% of the disk for user data, which is clearly 
unacceptable.  In the ordinary case we should be able to use >80% of the 
disk for user data.
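
For a rough feel for why random updates are so punishing, the classic
write-cost model from the Sprite LFS paper (Rosenblum & Ousterhout) is a
handy back-of-the-envelope tool.  It isn't the calculation behind the 25% 
figure above, but it shows how quickly cleaning overhead blows up as the
live fraction of cleaned segments rises:

/*
 * Back-of-the-envelope only.  To reclaim (1 - u) of free space from a
 * segment whose live fraction is u, the cleaner reads the whole segment
 * and rewrites the u that is still live, so each unit of new data costs
 * about 2 / (1 - u) units of I/O.
 */
#include <stdio.h>

int
main(void)
{
	int i;
	double u;

	for (i = 25; i <= 90; i += 5) {
		u = i / 100.0;
		printf("live fraction %.2f -> write cost %4.1fx\n",
		    u, 2.0 / (1.0 - u));
	}
	return 0;
}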

For what kind of use is LFS advisable?  If you're okay with the risks of 
#1 and #3, LFS will work best in situations where either (a) you're 
crashing a lot anyway for reasons independent of LFS and would like to 
come back up faster, or (b) you're doing lots of small file creation, where LFS 
wins most clearly over FFS.  I can't really provide a realistic risk 
assessment.  I can say that I've had an iMac with LFS root (booting its 
kernel from an FFS partition) building the world over and over on a ~90% 
full root filesystem for the last two weeks straight without a crash; that 
gives a sense, but it's not much of a test when it comes down to it.

I haven't back-ported any of my recent changes to 3.x, though I think it 
would be straightforward to do so for almost all of them.

> Thanks for the good work,

Thanks for making use of the work :^)

Take care,
 						Konrad Schroder
 						perseant@hhhh.org