tech-kern archive
LFS thoughts
I've been thinking about LFS off and on for a while now, and I'd like to
run a few of my thoughts by everyone else. Since the last time I looked
closely at the code base, there have been quite a number of improvements
by some very good people, but it still has some issues.
1) The most vexing outstanding issue, in my mind, is the fact that the
cleaner often cannot improve the amount of available space on disk.
This is largely due to volatile metadata, in particular index file
blocks, being written into the segments while cleaning. These blocks
almost immediately become stale again, leaving the newly compacted
segment looking as if it needs cleaning again. (When the filesystem is
empty, this is not a big deal, but when it approaches full it's a
killer.) The same is true of inode blocks and indirect blocks, though
to a lesser extent. If the index file could be segregated from the
regular file data, it would help the situation immensely.
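
To put rough numbers on this, here is a minimal sketch of the
arithmetic. It is an illustrative model only, not code from the tree;
clean_net_gain() and its parameters are names made up for this example,
and it assumes that every metadata block written while cleaning goes
stale again almost at once.

```c
#include <assert.h>

/*
 * Illustrative model only (not NetBSD code): net number of blocks
 * gained by cleaning one segment.
 *
 * seg_blocks:  blocks per segment
 * live_blocks: live blocks copied out of the victim segment
 * meta_blocks: ifile/inode/indirect blocks written alongside them,
 *              assumed to become stale again almost immediately
 */
static long
clean_net_gain(long seg_blocks, long live_blocks, long meta_blocks)
{
	assert(seg_blocks > 0);
	assert(live_blocks >= 0 && live_blocks <= seg_blocks);
	assert(meta_blocks >= 0);

	/*
	 * The victim segment is freed, but the live data and the
	 * metadata written with it both consume space in the log.
	 */
	return seg_blocks - live_blocks - meta_blocks;
}
```

On a mostly empty filesystem the victim has few live blocks and the
gain is large. Near full, seg_blocks - live_blocks is small, so even a
handful of metadata blocks can push the gain to zero or below.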
I can think of two ways that such segregation could be done while
retaining compatibility with the on-disk structures: (a) writing
ordinary data and ifile data to two separate log heads ("Orthos"), or
(b) a non-logged ifile, which would have to be duplicated on disk to
retain consistency in the case of a crash ("Ibis").
The Orthos case would entail the extraction of part of the checkpoint
region of the superblock into its own data structure, and duplication
of that structure to provide a runtime notion of a separate log that
places ifile data into segments that contain nothing else. The
relatively rapid turnover of the index file would then be limited to
those segments, and the cleaner wouldn't have to plow through the rest
of the filesystem (though it might still constantly clean recently
written ifile segments if the filesystem were too full).
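
A rough sketch of what the runtime side of Orthos might look like
follows. Every name in it (sketch_fs, sketch_loghead, sketch_append,
and so on) is hypothetical, the ifile inode number is assumed to be 1,
and segment allocation is left out entirely; it only shows blocks being
routed to one of two log heads.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IFILE_INUM 1	/* assumption: inode number of the ifile */

enum sketch_log { LOG_DATA = 0, LOG_IFILE = 1, LOG_COUNT = 2 };

/* Hypothetical per-log state, split out of the checkpoint information. */
struct sketch_loghead {
	uint32_t cur_seg;	/* segment currently being filled */
	uint32_t offset;	/* next free block within cur_seg */
};

struct sketch_fs {
	uint32_t blocks_per_seg;
	struct sketch_loghead head[LOG_COUNT];	/* one head per log */
};

/* Ifile blocks go to their own log; everything else goes to the data log. */
static enum sketch_log
sketch_pick_log(uint32_t ino)
{
	return ino == SKETCH_IFILE_INUM ? LOG_IFILE : LOG_DATA;
}

/* Append one block of inode "ino"; returns the segment it landed in. */
static uint32_t
sketch_append(struct sketch_fs *fs, uint32_t ino)
{
	struct sketch_loghead *h = &fs->head[sketch_pick_log(ino)];

	/* Choosing a new segment is out of scope; a full head must be advanced first. */
	assert(h->offset < fs->blocks_per_seg);
	h->offset++;
	return h->cur_seg;
}
```

Because the two heads never share a segment, ifile turnover dirties
only the segments owned by the ifile log.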
The Ibis case would also flag and reserve segments for the index file,
but here the index file would be written back to the same location on
disk every time, similar to traditional filesystems. To provide a
consistency guarantee, a second copy would need to be maintained as
well, with updates to one completed before writing to the other; but
non-checkpoint writes should not in general need to wait for this
synchronization.
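
The ordering requirement for Ibis can be sketched with an in-memory
stand-in for the two on-disk copies. The serial numbers that bracket
each copy are my own assumption for detecting a torn update (nothing
like them exists on disk today), and all of the sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IFILE_BYTES 64	/* toy size, for illustration only */

struct sketch_copy {
	uint64_t serial_head;	/* written first */
	unsigned char data[SKETCH_IFILE_BYTES];
	uint64_t serial_tail;	/* written last; mismatch means a torn update */
};

static void
sketch_write_copy(struct sketch_copy *c, const unsigned char *data,
    uint64_t serial)
{
	c->serial_head = serial;
	memcpy(c->data, data, SKETCH_IFILE_BYTES);
	c->serial_tail = serial;
}

/* Update both copies; the first is finished before the second is touched. */
static void
sketch_ifile_update(struct sketch_copy copy[2], const unsigned char *data,
    uint64_t serial)
{
	sketch_write_copy(&copy[0], data, serial);
	/* A real implementation would wait for copy 0 to reach the disk here. */
	sketch_write_copy(&copy[1], data, serial);
}

/* After a crash: index of the newest intact copy, or -1 if neither is intact. */
static int
sketch_ifile_recover(const struct sketch_copy copy[2])
{
	int ok0 = copy[0].serial_head == copy[0].serial_tail;
	int ok1 = copy[1].serial_head == copy[1].serial_tail;

	if (ok0 && ok1)
		return copy[0].serial_head >= copy[1].serial_head ? 0 : 1;
	if (ok0)
		return 0;
	if (ok1)
		return 1;
	return -1;
}
```

Since at most one copy is being overwritten at any moment, a crash
leaves at least one copy intact, which is the whole point of keeping
two.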
I think Orthos would be easier to implement, unless we were willing to
drop forward compatibility, in which case an Ibis approach that laid the
index file out sequentially through adjacent reserved segments would be
quite straightforward.
(If we didn't care at all about compatibility, we could improve things
further by putting the entire inode into the ifile rather than just a
pointer to the inode block, as suggested long ago by joff@. This would
make the ifile quite a bit larger and less likely to fit in the buffer
cache, but would keep empty inode blocks from contributing to the
cleaning inefficiency problem. It would also make file reads slower in
many cases since the inodes would not be physically close to where the
file data blocks are. If the disk has no seek penalty, of course, this
might not be a problem at all.)
2) Connecting dirty pages directly to buffer headers when writing might
be resulting in incorrect partial-segment checksums. I can't be sure
that that is the cause, but the checksums are definitely sometimes
incorrect even when the segments were written (for all I can tell)
properly. This would interfere with roll-forward, but more importantly,
if the cleaner is paying attention to the checksums as it ought, then
those segments might become uncleanable. Before UBC, lfs_writeseg()
freed data buffers by copying their data into larger, pre-reserved
buffers before checksumming the lot and sending it to disk. This also
freed up the buffers/pages very quickly compared to waiting for the
disk, though of course at the expense of CPU and reserved memory.
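
For reference, the copy-then-checksum approach looks roughly like the
sketch below. This is not the old lfs_writeseg() code; the function
names are hypothetical and sketch_cksum() is a placeholder rather than
the checksum LFS uses. It assumes, without my having established it,
that the bad checksums come from pages changing after they were
summed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder checksum; not the function LFS actually uses. */
static uint32_t
sketch_cksum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31 + p[i];
	return sum;
}

/*
 * Copy nblk blocks of blksize bytes into the pre-reserved staging
 * buffer and return the checksum of the staged copy.  The staged
 * bytes are what would be sent to disk, so later changes to the
 * source blocks cannot invalidate the checksum.
 */
static uint32_t
sketch_stage_and_cksum(unsigned char *staging, size_t staging_len,
    const unsigned char *const *blocks, size_t nblk, size_t blksize)
{
	size_t i;

	assert(nblk * blksize <= staging_len);
	for (i = 0; i < nblk; i++)
		memcpy(staging + i * blksize, blocks[i], blksize);
	/* The source buffers could be released at this point. */
	return sketch_cksum(staging, nblk * blksize);
}
```

The cost is one memcpy per block plus the reserved staging memory,
which is the CPU and memory expense mentioned above.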
3) Roll-forward and some form of cleaning should be moved in-kernel. I
already have code for in-kernel roll forward past the second checkpoint
that I need to dust off, test and commit. Cleaning is trickier because
an in-kernel cleaner would be less flexible, but the basic cleaning and
defragmenting functionality should be there.
There are other issues, but I'll stop there since this is already
getting fairly long.
Does anyone have any objections, or comments, about any of this?
There has been quite a lot of work on LFS in the last 20 years, some
with hints of a roadmap. Does anyone else have specific ideas about the
most glaring issues, or what should be done next?
Thanks,
Konrad Schroder
perseant%netbsd.org@localhost