tech-kern archive


Re: Lost file-system story



On Tue, Dec 06, 2011 at 11:10:44AM -0500, Donald Allen wrote:
 > My Linux experience, and this is strictly gut feel -- I have no
 > hard evidence to back this up -- tells me that if this had happened
 > on a Linux system with an async, unjournaled filesystem, the
 > filesystem would have survived.

Yes, it likely would have, at least if that filesystem was ext2fs.

There is at least one issue beyond "bugs" though: ext2's fsck is
written to cope with this situation. The ffs fsck isn't, and so it
makes unwarranted assumptions and gets itself into trouble, sometimes
even into infinite repair loops. (That is, you can run 'fsck -fy'
over and over again and it will never reach a clean state.)
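
A minimal sketch of how one might guard against such a repair loop by
bounding the number of 'fsck -fy' passes. The names fsck_pass and
repair_with_limit are hypothetical, and treating exit status 0 as "clean"
is an assumption that should be checked against fsck(8) on the platform
in question.

```python
import subprocess
from typing import Callable, Optional


def fsck_pass(device: str) -> bool:
    """Run one forced 'fsck -fy' pass on device.

    Assumption: exit status 0 means this pass found nothing to repair.
    Needs appropriate privileges and an unmounted file system.
    """
    return subprocess.run(["fsck", "-fy", device]).returncode == 0


def repair_with_limit(
    device: str,
    max_passes: int = 5,
    run_pass: Callable[[str], bool] = fsck_pass,
) -> Optional[int]:
    """Repeat passes until one reports clean.

    Returns the number of passes used, or None if the file system was
    still not clean after max_passes (a possible repair loop).
    """
    for n in range(1, max_passes + 1):
        if run_pass(device):
            return n
    return None
```

A None result does not prove that fsck is looping, only that it did not
converge within the chosen limit. The run_pass parameter exists so the
loop logic can be exercised without touching a real device.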

The short answer is: don't do that.

I have no idea, btw, whether using our ext2fs this way, along with
e2fsck from the Linux e2fsprogs, can be expected to work. I have
doubts about our fsck_ext2fs though.

 > In
 > suggesting that I post this, Christos mentioned that he's seen
 > situations where a lot of writing happened in a session (e.g., a
 > kernel build) and then the sync at shutdown time took a long time,
 > which has made him somewhat suspicious that there might be a problem
 > with the trickle sync that the kernel is supposed to be doing.

There is at least one known structural problem where atime/mtime
updates do not get applied to buffers (but are instead saved up
internally) so they don't get written out by the syncer.

We believe this is what causes those unmount-time writes, or at least
many of them. However, failure to update timestamps shouldn't result
in a trashed fs.
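
A toy model of the behaviour described above, assuming a much simplified
picture of the kernel: timestamp updates are remembered on the in-core
inode rather than in a buffer, the periodic syncer only writes dirty
buffers, and unmount is the point at which the saved-up timestamps are
finally applied. All names here (ToyInode, note_access, syncer_pass,
unmount) are hypothetical and do not correspond to actual kernel code.

```python
class ToyInode:
    """In-core inode with a separately tracked timestamp update."""

    def __init__(self) -> None:
        self.buffer_dirty = False    # the only thing the syncer looks at
        self.pending_times = False   # atime/mtime saved up internally


def note_access(inode: ToyInode) -> None:
    """Record a timestamp update without dirtying any buffer."""
    inode.pending_times = True


def syncer_pass(inodes: list[ToyInode]) -> int:
    """Write out dirty buffers; return how many writes were issued."""
    writes = 0
    for ino in inodes:
        if ino.buffer_dirty:
            ino.buffer_dirty = False
            writes += 1
    return writes


def unmount(inodes: list[ToyInode]) -> int:
    """Apply saved-up timestamps to buffers, then flush everything."""
    for ino in inodes:
        if ino.pending_times:
            ino.pending_times = False
            ino.buffer_dirty = True
    return syncer_pass(inodes)
```

In this model, touching a thousand files and then running syncer_pass
issues no writes at all, while unmount issues one write per touched
file. That matches the symptom of a long sync at shutdown, and it also
shows why only timestamps, not file system structure, are at stake.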

-- 
David A. Holland
dholland%netbsd.org@localhost

