The reason for checking the filesystem before rolling forward is that roll-forward is relatively untested, so I wanted to verify that the older checkpoint was consistent before examining the newer one (remember that the newer checkpoint's consistency can't be taken for granted). If the older checkpoint was not consistent, it should be fixed before rolling forward. If it was consistent (either checked or, in the case of fsck -p, assumed), then a failed roll-forward between the two checkpoints leaves the older checkpoint as a valid state of the filesystem, while a successful one makes the newer checkpoint a valid state. Rolling forward past the newer checkpoint requires resizing inodes, etc., and only makes sense in the context of a consistent filesystem.
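The decision procedure in the previous paragraph can be sketched as a small C function. This is a hedged illustration, not fsck_lfs code: the names `lfs_plan_recovery`, `struct lfs_recovery_plan`, and `enum lfs_valid_state` are hypothetical, and the actual consistency check and roll-forward are reduced to boolean inputs.

```c
#include <stdbool.h>

/* Hypothetical: which checkpoint is known to describe a valid filesystem state. */
enum lfs_valid_state {
	VALID_OLDER,	/* roll-forward failed; the older checkpoint stands */
	VALID_NEWER	/* roll-forward succeeded; the newer checkpoint stands */
};

/* Hypothetical summary of the recovery decisions. */
struct lfs_recovery_plan {
	bool fix_older_first;		/* repair the older checkpoint before rolling forward */
	enum lfs_valid_state valid;	/* checkpoint that describes a valid state */
};

/*
 * Hypothetical sketch, not fsck_lfs code.
 *   preen:            true for fsck -p, where the older checkpoint is
 *                     assumed consistent instead of being checked
 *   older_consistent: result of checking the older checkpoint
 *                     (ignored when preen is true)
 *   rollforward_ok:   whether rolling forward from the (now consistent)
 *                     older checkpoint to the newer one succeeded
 */
static struct lfs_recovery_plan
lfs_plan_recovery(bool preen, bool older_consistent, bool rollforward_ok)
{
	struct lfs_recovery_plan plan;

	/* Only a full check can find the older checkpoint inconsistent. */
	plan.fix_older_first = !preen && !older_consistent;

	/*
	 * With the older checkpoint consistent (checked, repaired, or assumed),
	 * the outcome of the roll-forward decides which checkpoint is valid.
	 */
	plan.valid = rollforward_ok ? VALID_NEWER : VALID_OLDER;
	return plan;
}
```

For example, a full check that finds the older checkpoint inconsistent yields `fix_older_first == true`; under fsck -p the older checkpoint is never flagged for repair, and a failed roll-forward falls back to `VALID_OLDER`.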
So, I have no doubt that rewriting fsck_lfs from scratch and/or cleaning it up makes perfect sense, but there is some reasoning in the madness too. In particular, I don't agree that fsck_lfs should be limited to recreating the ifile; it also needs to be able to recover as much as possible in the event of bad blocks.
The in-kernel roll-forward is disabled because it is broken. It worked briefly before LFSv2, but I never got back to fixing it after it broke. I think Dr. Seltzer must be thinking of another OS, since 4.4lite2 did not contain any roll-forward code at all. It's also worth asking whether the user should have control over when and whether roll-forward occurs, which is straightforward with a userland fsck but more difficult if roll-forward is done in-kernel.
Take care,
-Konrad

On 07/12/2014 10:52 AM, David Holland wrote:
A long time ago (in <Pine.NEB.4.64.1002090351150.23795%mail.netbsd.org@localhost>) you wrote:
> I do disable fsck_lfs. It usually causes more problems than it
> solves. It needs a complete overhaul. It tries to act like
> fsck_ffs instead of validating segment checksums and regenerating
> the ifile.

A quick look at fsck_lfs with this in mind suggests that it's full of bull, yes. For some reason it tries to check the fs *before* rolling forward; it seems unlikely that this would ever work properly.

However, as far as I can readily tell, the obvious problems are limited to doing a full fsck, and all that the reboot-time fsck -p does is roll forward. Given that the kernel roll forward code is disabled by default (does anyone know why? Margo Seltzer says it shouldn't be), disabling boot-time fsck seems like a bad idea. Unless the roll-forward code is broken, in which case it should be fixed. I don't see any PRs on it though.

Anyhow, in the absence of any specific information, unless testing turns up some issues, I'm inclined to revert the commit I just made and re-enable fsck_lfs -p.