Subject: Re: LFS status?
To: Geert Hendrickx <ghen@NetBSD.org>
From: Konrad Schroder <email@example.com>
Date: 05/02/2006 15:45:33
On Tue, 2 May 2006, Geert Hendrickx wrote:
> I've seen an increasing number of commits to sys/ufs/lfs lately (mostly
> by perseant), which tempts me to give LFS another try. What's the
> status of LFS, from an end-user POV, in NetBSD-current? And in 3.0?
> For what kind of use is LFS advisable?
I think that LFS is coming around at last. My time to work on it has
always been sporadic; lucky for LFS I've had an unusual amount of time for
it recently :^)
There are basically three classes of problems that one runs into over and
over with LFS:
1) Running off the end of the log ("lfs_nextseg: no clean segments")
2) Inconsistency of on-disk checkpoints
3) Deadlocks of various kinds
I've had reports of specific instances of each of these from people who,
like you, have seen the commits and thought of trying it out (and, to my,
er, joy? I've been able to reproduce most of them). Some other important
but less immediate shortcomings are
4) It uses 32-bit on-disk quantities
5) The multiprocessing locking situation needs auditing
I've been working on #2 most recently, using a simple snapshot mechanism
to stop writes to the disk at the point the log is about to wrap,
recreating the on-disk state for every available checkpoint (since we
haven't wrapped, we have all the data necessary to do this) and running
fsck_lfs on the results. This seems to be working now (fingers crossed)
and since it's codified in a regression test, I think we can keep issue #2
under control from this point forward. If on-disk consistency is really
working, no fsck should be required to mount the disk after a crash.
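To make the idea concrete, here's a toy model of that test (Python, purely
illustrative---the real test drives fsck_lfs against the snapshotted device,
and none of these names come from the tree): record operations in a log with
periodic checkpoints, rebuild the state as of each checkpoint, and run a
consistency check on each rebuilt state.

```python
# Toy model of the checkpoint-verification idea. Illustrative only:
# the real mechanism snapshots the device before the log wraps and
# runs fsck_lfs on the state reconstructed at each checkpoint.

def rebuild_state(log, checkpoint_index):
    """Replay log entries up to and including the given checkpoint."""
    state = {}
    for op, name, size in log[:checkpoint_index + 1]:
        if op == "write":
            state[name] = size
        elif op == "delete":
            state.pop(name, None)
    return state

def consistent(state):
    """Stand-in for fsck: here, just check that no size is negative."""
    return all(size >= 0 for size in state.values())

# A log that has not yet wrapped: every entry is still available, so
# every checkpoint can be reconstructed and verified independently.
log = [
    ("write", "a", 10),
    ("write", "b", 20),   # checkpoint taken here (index 1)
    ("delete", "a", 0),
    ("write", "c", 30),   # checkpoint taken here (index 3)
]
checkpoints = [1, 3]

for cp in checkpoints:
    assert consistent(rebuild_state(log, cp))
```

The point of the real test is the same as the toy's inner loop: as long as
the log hasn't wrapped, every checkpoint is independently reconstructible
and checkable.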
Now, of course we want roll-forward if we are confident that it works; and
cleaning up allocated inodes with zero link count would also be nice.
(Zero-link inodes don't indicate a faulty filesystem---think
tmpfile(3)---but since a crash never lets those files be closed properly,
the filesystem leaks space over time.) I haven't yet made the checkpoint-checking test
roll forward through the non-checkpoint writes to verify the roll-forward
code in fsck_lfs, though that's definitely on the agenda. The in-kernel
roll-forward code does not work and should probably be scrapped.
The deadlocks are being addressed as they come up; the most irritating one
at present is that I implemented the "release the snapshot" mechanism as a
fcntl, and fcntl locks a vnode which might be locked by the cleaner while
it waits for the "release the snapshot" signal. What I really want is a
generic fsctl(2) that doesn't deal with a vnode at all...but I digress.
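The shape of that deadlock is the classic two-party one: the cleaner holds
the vnode lock while waiting for the "release the snapshot" signal, and the
fcntl path needs that same lock before it can deliver the signal. A minimal
sketch (Python threads as stand-ins for the cleaner and the fcntl caller;
all the names here are made up for illustration):

```python
import threading
import time

vnode_lock = threading.Lock()
release_snapshot = threading.Event()

def cleaner():
    # The cleaner takes the vnode lock, then waits for the
    # "release the snapshot" signal before letting go.
    with vnode_lock:
        release_snapshot.wait(timeout=0.5)  # in the real bug: waits forever

def fcntl_caller():
    # The fcntl path must lock the same vnode before it can deliver
    # the signal -- so the signal never arrives.
    got_lock = vnode_lock.acquire(timeout=0.2)
    if got_lock:
        release_snapshot.set()
        vnode_lock.release()
    return got_lock

t = threading.Thread(target=cleaner)
t.start()
time.sleep(0.05)          # let the cleaner grab the lock first
deadlocked = not fcntl_caller()
t.join()
assert deadlocked         # the signal could never be delivered
```

The timeouts above are only there so the sketch terminates; in the kernel
neither side backs off, which is exactly the problem a vnode-free fsctl(2)
would avoid.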
Difficulty #1 is LFS's Achilles' heel. It should be possible to
parametrize the filesystem for different workloads to avoid this problem,
and I've been working on that off and on the last few weeks. In the worst
case, though---random writes to gigantic files---it would require
allocating only 25% of the disk for user data, which is clearly
unacceptable. In the ordinary case we should be able to use >80% of the
disk for user data.
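That 25% figure lines up with the standard log-structured write-cost
analysis from the LFS literature: if the segments being cleaned are on
average a fraction u live, the cleaner reads each whole segment and
rewrites the live u of it, so cost per byte of new data grows roughly as
2/(1-u). A quick illustration (the formula is the Sprite LFS one, not
anything from our driver, and the sample utilizations are my own):

```python
def write_cost(u):
    """Sprite-LFS style write cost: to reclaim a segment that is a
    fraction u live, read the whole segment and rewrite the live u;
    only 1-u of it becomes clean space."""
    assert 0 <= u < 1
    return 2.0 / (1.0 - u)

# Uniform random writes keep every segment at the overall disk
# utilization, so utilization directly sets the cleaning cost:
for u in (0.25, 0.50, 0.80, 0.90):
    print(f"utilization {u:.0%}: write cost {write_cost(u):.1f}x")
```

In the random-write worst case the disk utilization *is* the segment
utilization, which is why keeping cleaning affordable there forces user
data down toward a quarter of the disk, while more skewed (ordinary)
workloads leave mostly-empty segments for the cleaner and can run much
fuller.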
For what kind of use is LFS advisable? If you're okay with the risks of
#1 and #3, LFS will work best in situations where either (a) you're
crashing a lot anyway for reasons independent of LFS and would like to come
back up faster after a crash, or (b) you're doing lots of small-file
creation, where LFS
wins most clearly over FFS. I can't really provide a realistic risk
assessment. I can say that I've had an iMac with LFS root (booting its
kernel from an FFS partition) building the world over and over on a ~90%
full root filesystem for the last two weeks straight without a crash; that
gives a sense, but it's not much of a test when it comes down to it.
I haven't back-ported any of my recent changes to 3.x, though I think it
would be straightforward to do so for almost all of them.
> Thanks for the good work,
Thanks for making use of the work :^)