Subject: Re: fsck fscked-up my filesystem
To: None <Richard.Earnshaw@buzzard.freeserve.co.uk>
From: Luke Mewburn <lukem@netbsd.org>
List: current-users
Date: 09/18/2001 18:18:02
On Thu, Sep 13, 2001 at 09:53:54PM +0100, Richard Earnshaw wrote:
> I'm not sure exactly what happened, but I've just ended up with a very 
> corrupt filesystem after a crash.  It could be that the old version of 
> fsck that I have (-current of circa april) is incompatible with the latest 
> kernels, but I'm not sure.  Anybody any ideas?
> 
>  If I'm right, and it is related to the relative versions of the two, then 
> folks need to be very careful not to run fsck manually after rebooting 
> with a new kernel since old fscks will really mess your directory 
> structure up.
> 
> Symptoms: system crashes leaving filesystem not unmounted cleanly.  Fsck 
> -p runs and complains that it can't do the job, run manually;  run 
> manually many things seem broken (the whole of my /usr partition ended up 
> in lost+found); forcibly run fsck again and the file-system still seems 
> broken (this is repeatable).  Back off to old kernel; fsck fixes disks ok 
> (though they are still messed up by the first fsck run).
> 
> (of course, I may just have built a duff kernel somehow).


I've been trying to reproduce this problem, and I've had no luck.

I've used a "newdirpref" kernel with a -current fsck_ffs (matched to
the kernel), as well as with fsck_ffs from 1.5.2 and 1.5, on my
NetBSD/alpha PC164. All of them worked "as expected" in the following
circumstances:
	* cleanly unmounted file system

	* file system currently mounted

	* file system mounted, but "idle", and the machine rebooted
	  with "halt -qn"

	* file system mounted, with an untar operation in action,
	  and the machine rebooted with "halt -qn"

	* file system mounted with "softdeps", with an untar operation
	  in action, and the machine rebooted with "halt -qn"
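
For anyone who wants to try something similar, the last crash case
above could be sketched roughly as below. This is only a dry-run
outline that prints the commands rather than running them. The device,
mount point, tarball and old fsck_ffs paths are hypothetical
placeholders, and the use of "fsck_ffs -n" (so that each binary sees
the same unmodified on-disk state) is an assumption of this sketch,
not necessarily how the tests above were run.

```sh
#!/bin/sh
# Dry-run sketch: prints the commands for one crash test, runs nothing.
# DEV, MNT, TARBALL and the fsck_ffs paths are hypothetical placeholders.
DEV=/dev/sd1e
MNT=/mnt/test
TARBALL=/var/tmp/test.tar

# Print the pre-crash steps; $1 is the mount option ("" or "softdep").
crash_steps() {
	if [ -n "$1" ]; then
		echo "mount -o $1 $DEV $MNT"
	else
		echo "mount $DEV $MNT"
	fi
	# Start an untar in the background, then halt without syncing.
	echo "(cd $MNT && tar xf $TARBALL) &"
	echo "halt -qn"
}

# Print the post-reboot checks, one per fsck_ffs binary under test.
# -n answers "no" to the repair questions and does not open the file
# system for writing, so every binary checks the same state.
check_steps() {
	for fsck in "$@"; do
		echo "$fsck -n $DEV"
	done
}

crash_steps softdep
check_steps /sbin/fsck_ffs /old/1.5.2/fsck_ffs /old/1.5/fsck_ffs
```

Saving the output of each fsck_ffs run and comparing the results with
diff(1) makes any difference between the versions easy to spot.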

In almost all circumstances, the output and behaviour were identical
between fsck_ffs from -current and from the releases. There was one
minor problem that -current's fsck found and 1.5's didn't, which is
consistent with some bug fixes to fsck_ffs in -current that don't
appear to have been back-ported to the 1.5 branch.

If anyone can help reproduce this problem with the "newdirpref" code,
or any other ffs problem that occurs when a -current kernel is used
with an older userland (fsck_ffs, etc), especially when softdep is NOT
being used, I'd be extremely interested.

Luke.