tech-kern archive


Fwd: Lost file-system story



I did it again. gmail is trying to teach an old dog a new trick ....


---------- Forwarded message ----------
From: Donald Allen <donaldcallen%gmail.com@localhost>
Date: Tue, Dec 13, 2011 at 10:04 AM
Subject: Re: Lost file-system story
To: David Holland <dholland-tech%netbsd.org@localhost>


On Tue, Dec 13, 2011 at 1:27 AM, David Holland 
<dholland-tech%netbsd.org@localhost> wrote:
> On Mon, Dec 12, 2011 at 03:31:09PM -0500, Donald Allen wrote:
>  > Note that this bug *may* not worsen the probability of recovery after
>  > a crash. It might even increase it! Think about it. If you boot NetBSD
>  > and mount a filesystem async, it is going to be correctly structured
>  > (or deemed to be by fsck) at boot time, or the system wouldn't mount
>  > it. Assuming the system is happy with it, if you then make changes to
>  > the filesystem,  but, because of this bug they are all in the buffer
>  > cache and never get written out, and then the system crashes ---
>  > you've got the filesystem you started with.
>
> Not necessarily;

I did say "*may*" (which I wrote because you could write a good book
about NetBSD internals with what I don't know about NetBSD internals).

> right off I can see two ways to get hosed:
>
> 1. Delete a large file. This causes the in-memory FS to believe the
> indirect blocks from this file are free; then it can reallocate them
> as data for some other file. That data then *does* get written out, so
> after crashing and rebooting the indirect blocks contain utter
> nonsense. The ffs fsck probably can't recover this.
>
> 2. Use a program that calls fsync(). This will write out some metadata
> blocks and not others; in the relatively benign case it will just
> update a previously-free inode and after crashing fsck will place the
> file in lost+found. In less benign cases it might do the converse of
> (1), and e.g. overwrite file data with indirect blocks, leading to
> crosslinked files or worse and probably total fsck failure.
>
> Not that any of this matters...

I agree. I was just indulging in some idle speculation, having some
fun. This bug should be fixed and I think the fix, as I said before,
should include a knob to allow the user to control the sync frequency
(maybe the knob is already there in sysctl and getting ignored for
some reason?). I'm running NetBSD again on my test machine, and have a
sleep-sync loop started in rc.local.
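
For readers unfamiliar with the workaround, a sleep-sync loop of the kind mentioned above might look roughly like the following. This is a minimal sketch, not the script actually used on the test machine: the function name sync_loop, the SYNC_INTERVAL variable, and the 30-second default are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sleep-sync loop for rc.local. Names and the default
# interval are illustrative assumptions, not taken from the original post.

SYNC_INTERVAL="${SYNC_INTERVAL:-30}"   # seconds between syncs (assumed)

# sync_loop INTERVAL [COUNT]
# Runs sync(8) every INTERVAL seconds. If COUNT is given, stops after
# COUNT syncs; otherwise loops forever. Prints the number of syncs
# performed when it returns.
sync_loop() {
    interval="$1"
    count="${2:-}"
    done_syncs=0
    while [ -z "$count" ] || [ "$done_syncs" -lt "$count" ]; do
        sync                                # flush dirty buffers to disk
        done_syncs=$((done_syncs + 1))
        sleep "$interval"
    done
    echo "$done_syncs"
}

# In rc.local the loop would be started in the background so that boot
# can continue, for example:
#   sync_loop "$SYNC_INTERVAL" >/dev/null &
```

The optional COUNT argument exists only so the loop can be exercised briefly by hand (for example, `sync_loop 1 3`); a real rc.local entry would omit it and run indefinitely.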

/Don


>
> --
> David A. Holland
> dholland%netbsd.org@localhost

