tech-kern archive


LFS thoughts



I've been thinking about LFS off and on for a while now, and I'd like to run a few of my thoughts by everyone else.  Since the last time I looked closely at the code base, there have been quite a number of improvements by some very good people, but it still has some issues.

1) The most vexing outstanding issue, in my mind, is the fact that the cleaner often cannot improve the amount of available space on disk.  This is largely due to volatile metadata, in particular index file blocks, being written into the segments while cleaning.  These blocks almost immediately become stale again, leaving the newly compacted segment looking as if it needs cleaning again.  (When the filesystem is empty, this is not a big deal, but when it approaches full it's a killer.)  The same is true of inode blocks and indirect blocks, though to a lesser extent.  If the index file could be segregated from the regular file data, it would help the situation immensely.

I can think of two ways that such segregation could be done while retaining compatibility with the on-disk structures: (a) writing ordinary data and ifile data to two separate log heads ("Orthos"), or (b) a non-logged ifile, which would have to be duplicated on disk to retain consistency in the case of a crash ("Ibis").

The Orthos case would entail extracting part of the checkpoint region of the superblock into its own data structure, and duplicating that structure to provide a runtime notion of a separate log that places ifile data into segments that contain nothing else.  The relatively rapid turnover of the index file would then be limited to those segments, and the cleaner wouldn't have to plow through the rest of the filesystem (though it might still constantly clean recently written ifile segments if the filesystem were too full).
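To make the two-log-head idea concrete, here is a minimal user-space sketch.  It is not LFS code: the names (orthos_fs, log_head, orthos_append) and the tiny segment sizes are invented for illustration, and it models only the placement decision, assuming each block class keeps its own current segment and takes a fresh free segment when that one fills.

```c
#include <assert.h>

#define NSEGS        8   /* hypothetical: segments in the toy filesystem */
#define BLKS_PER_SEG 4   /* hypothetical: blocks per segment */
#define SEG_FREE    (-1)

enum blk_class { BLK_DATA = 0, BLK_IFILE = 1, BLK_NCLASS };

/* One log head: the segment being filled and the next free slot in it. */
struct log_head {
	int seg;
	int off;
};

struct orthos_fs {
	struct log_head head[BLK_NCLASS]; /* one head per block class */
	int seg_class[NSEGS];             /* SEG_FREE, or the owning class */
};

static void
orthos_init(struct orthos_fs *fs)
{
	for (int s = 0; s < NSEGS; s++)
		fs->seg_class[s] = SEG_FREE;
	for (int c = 0; c < BLK_NCLASS; c++) {
		fs->head[c].seg = SEG_FREE;
		fs->head[c].off = 0;
	}
}

/* Claim the first free segment for class c; return -1 if none is left. */
static int
orthos_newseg(struct orthos_fs *fs, enum blk_class c)
{
	for (int s = 0; s < NSEGS; s++) {
		if (fs->seg_class[s] == SEG_FREE) {
			fs->seg_class[s] = c;
			return s;
		}
	}
	return -1;
}

/*
 * Append one block of class c to that class's log head.
 * Returns the segment the block landed in, or -1 if the disk is full.
 */
static int
orthos_append(struct orthos_fs *fs, enum blk_class c)
{
	assert((int)c < BLK_NCLASS);
	struct log_head *h = &fs->head[c];

	if (h->seg == SEG_FREE || h->off == BLKS_PER_SEG) {
		int s = orthos_newseg(fs, c);
		if (s == -1)
			return -1;
		h->seg = s;
		h->off = 0;
	}
	h->off++;
	return h->seg;
}
```

With this routing, a segment only ever holds blocks of one class, so ifile churn stays confined to the segments tagged BLK_IFILE.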

The Ibis case would also flag and reserve segments for the index file, but here the index file would be written back to the same location on disk every time, similar to traditional filesystems.  To provide a consistency guarantee, a second copy would need to be maintained as well, with updates to one completed before writing to the other; but non-checkpoint writes should not in general need to wait for this synchronization.
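The ordering rule for the two copies can be sketched as below.  This is only an illustration under assumptions of my own: the serial number, the per-copy checksum, and the rule "prefer the valid copy with the higher serial" are not part of the proposal above, and all names (ibis_copy, ibis_write_copy, ibis_recover) are hypothetical.  A short nbytes argument stands in for a crash partway through a write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IBIS_BLOCK_SIZE 64 /* hypothetical, tiny for illustration */

struct ibis_copy {
	uint64_t serial;                /* update generation (assumed) */
	uint32_t cksum;                 /* covers serial and data (assumed) */
	uint8_t  data[IBIS_BLOCK_SIZE];
};

/* Stand-in checksum; not the checksum LFS actually uses. */
static uint32_t
ibis_cksum(uint64_t serial, const uint8_t *data)
{
	uint32_t sum = (uint32_t)serial ^ (uint32_t)(serial >> 32);

	for (size_t i = 0; i < IBIS_BLOCK_SIZE; i++)
		sum = sum * 31u + data[i];
	return sum;
}

/*
 * Write one copy.  The header describes the full intended contents;
 * nbytes < IBIS_BLOCK_SIZE simulates a crash in the middle of the write.
 */
static void
ibis_write_copy(struct ibis_copy *c, uint64_t serial,
    const uint8_t *data, size_t nbytes)
{
	assert(nbytes <= IBIS_BLOCK_SIZE);
	c->serial = serial;
	c->cksum = ibis_cksum(serial, data);
	memcpy(c->data, data, nbytes);
}

static int
ibis_valid(const struct ibis_copy *c)
{
	return c->cksum == ibis_cksum(c->serial, c->data);
}

/* Pick the newest copy that checks out, or NULL if neither does. */
static const struct ibis_copy *
ibis_recover(const struct ibis_copy pair[2])
{
	int v0 = ibis_valid(&pair[0]);
	int v1 = ibis_valid(&pair[1]);

	if (v0 && v1)
		return pair[0].serial >= pair[1].serial ? &pair[0] : &pair[1];
	if (v0)
		return &pair[0];
	if (v1)
		return &pair[1];
	return NULL;
}
```

Because the second copy is not touched until the first is complete, a crash can tear at most one copy, and recovery falls back to the other.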

I think Orthos would be easier to implement, unless we were willing to drop forward compatibility, in which case an Ibis approach that laid the index file out sequentially through adjacent reserved segments would be quite straightforward.

(If we didn't care at all about compatibility, we could improve things further by putting the entire inode into the ifile rather than just a pointer to the inode block, as suggested long ago by joff@.  This would make the ifile quite a bit larger and less likely to fit in the buffer cache, but would keep empty inode blocks from contributing to the cleaning inefficiency problem.  It would also make file reads slower in many cases since the inodes would not be physically close to where the file data blocks are.  If the disk has no seek penalty, of course, this might not be a problem at all.)

2) Connecting dirty pages directly to buffer headers when writing may be producing incorrect partial-segment checksums.  I can't be sure that this is the cause, but the checksums are definitely sometimes incorrect even when the segments were written (as far as I can tell) properly.  This would interfere with roll-forward, but more importantly, if the cleaner is paying attention to the checksums as it ought, then those segments might become uncleanable.  Before UBC, lfs_writeseg() freed data buffers by copying their data into larger, pre-reserved buffers before checksumming the lot and sending it to disk.  That approach also freed up the buffers/pages very quickly compared to waiting for the disk, though of course at the expense of CPU and reserved memory.
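As a rough illustration of why copying first sidesteps the problem, the sketch below stages blocks into a private buffer and checksums the staged bytes, so the checksum always matches what would go to disk even if a source page changes afterward.  This is not the lfs_writeseg() code: the names (staged_seg, stage_and_sum), the sizes, and the checksum function are stand-ins, and whether a page changing under the write is the actual cause of the bad checksums remains a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK     16 /* hypothetical block size, tiny for illustration */
#define MAXBLKS 4  /* hypothetical staging capacity in blocks */

/* Stand-in running checksum; not the one LFS actually uses. */
static uint32_t
sketch_cksum_update(uint32_t sum, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sum = sum * 31u + p[i];
	return sum;
}

/* Checksum the pages where they live; later changes alter the result. */
static uint32_t
sum_in_place(uint8_t *const blks[], size_t nblks)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nblks; i++)
		sum = sketch_cksum_update(sum, blks[i], BLK);
	return sum;
}

struct staged_seg {
	uint8_t  buf[BLK * MAXBLKS]; /* pre-reserved staging buffer */
	size_t   len;
	uint32_t cksum;              /* computed over buf, not the sources */
};

/* Copy every block into the staging buffer, then checksum the copy. */
static void
stage_and_sum(struct staged_seg *s, uint8_t *const blks[], size_t nblks)
{
	assert(nblks <= MAXBLKS);
	s->len = 0;
	for (size_t i = 0; i < nblks; i++) {
		memcpy(s->buf + s->len, blks[i], BLK);
		s->len += BLK;
	}
	s->cksum = sketch_cksum_update(0, s->buf, s->len);
}
```

Once stage_and_sum() returns, the source pages can be released or redirtied without affecting either the staged data or its checksum, at the cost of the extra copy and the reserved memory.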

3) Roll-forward and some form of cleaning should be moved in-kernel.  I already have code for in-kernel roll-forward past the second checkpoint that I need to dust off, test, and commit.  Cleaning is trickier because an in-kernel cleaner would be less flexible, but the basic cleaning and defragmenting functionality should be there.

There are other issues, but I'll stop there since this is already getting fairly long.

Does anyone have any objections, or comments, about any of this?

There has been quite a lot of work on LFS in the last 20 years, some with hints of a roadmap.  Does anyone else have specific ideas about the most glaring issues, or what should be done next?

Thanks,

Konrad Schroder
perseant%netbsd.org@localhost


