tech-kern archive


Re: Lost file-system story

On Sun, Dec 11, 2011 at 9:53 PM, Greg A. Woods <> wrote:
> At Fri, 9 Dec 2011 22:12:25 -0500, Donald Allen 
> <> wrote:
> Subject: Re: Lost file-system story
>> On Fri, Dec 9, 2011 at 8:43 PM, Greg A. Woods <> 
>> wrote:
>> > At Fri, 9 Dec 2011 15:50:35 -0500, Donald Allen 
>> > <> wrote:
>> > Subject: Re: Lost file-system story
>> > >
>> > > "does not guarantee to keep a consistent file system structure on the
>> > > disk" is what I expected from NetBSD. From what I've been told in this
>> > > discussion, NetBSD pretty much guarantees that if you use async and
>> > > the system crashes, you *will* lose the filesystem if there's been any
>> > > writing to it for an arbitrarily long period of time, since apparently
>> > > meta-data for async filesystems doesn't get written as a matter of
>> > > course.
>> >
>> > I'm not sure what the difference is.
>> You would be sure if you'd read my posts carefully. The difference is
>> whether the probability of an async-mounted filesystem surviving a
>> crash is near zero or near one.
> I think perhaps the misunderstanding between you and everyone else is
> because you haven't fully appreciated what everyone has been trying to
> tell you about the true meaning of "async" in Unix-based filesystems,
> and in particular about NetBSD's current implementation of Unix-based
> filesystems, and what that all means to implementing algorithms that can
> reliably repair the on-disk image of a filesystem after a crash.
> I would have thought the warning given in the description of "async" in
> mount(8) would be sufficient, but apparently you haven't read it that
> way.
> Perhaps the problem is the last occurrence of the word "or" in the last
> sentence of that warning should be changed to "and".  To me that would
> at least make the warning a bit stronger.
>> > And that's why by default, and by very strong recommendation, filesystem
>> > metadata for Unix-based filesystems (sans WAPBL) should always be
>> > written synchronously to the disk if you ever hope to even try to use
>> > fsck(8).
>> That's simply not true. Have you ever used Linux in all the years that
>>  ext2 was the predominant filesystem? ext2 filesystems were routinely
>> mounted async for many years; everything -- data, meta-data -- was
>> written asynchronously with no regard to ordering.
> DO NOT confuse any Linux-based filesystem with any Unix-based
> filesystem.  They may have nearly identical semantics from the user
> programming perspective (i.e. POSIX), but they're all entirely different
> under the hood.
> Unix-based filesystems (sans WAPBL, and ignoring the BSD-only LFS) have
> never ever Ever EVER given any guarantee about the repairability of the
> filesystem after a crash if it has been mounted with MNT_ASYNC.
> Indeed it is more or less _impossible_ by design for the system to make
> any such guarantee given what MNT_ASYNC actually means for Unix-based
> filesystems, and especially what it means in the NetBSD implementation.

Everything you said in the above two paragraphs is true of Linux ext2.
You are correct -- it is theoretically impossible to guarantee the
repairability of a filesystem -- whether Unix, Linux or anything else.
I understand that and have said so repeatedly, which seems not to get
read by some.

>> > Unix filesystems, including the Berkeley Fast File System variant, have
>> > never made any guarantees about the recoverability of an async-mounted
>> > filesystem after a crash.
>> I never thought or asserted otherwise.
> Well, from my perspective, especially after carefully reading your
> posts, you do indeed seem to think that async-mounted Unix-based
> filesystems should be able to be repaired, at least some of the time,
> despite the documentation, and all the collected wisdom of those who've
> replied to your posts so far, saying otherwise.
>> > You seem to have inferred some impossible capability based on your
>> > experience with other non-Unix filesystems that have a completely
>> > different internal structure and implementation from the Unix-based
>> > filesystems in NetBSD.
>> Nonsense -- I have inferred no such thing. Instead of referring you to
>> previous posts for a re-read, I'll give you a little summary. I am
>> speaking about probabilities. I completely understand that no
>> filesystem mounted async (or any other way, for that matter), whether
>> Linux or NetBSD or OpenBSD, is GUARANTEED to survive a crash.
> OK, let's try stating this once more in what I hope are the same terms
> you're trying to use:  The probability of any Unix-based filesystem
> being repairable after a crash is zero (0) if it has been mounted with
> MNT_ASYNC, and if there was _any_ activity that affected its structure
> since mount time up to the time of the crash.  It still might survive
> after some types of changes, but it _probably_ won't.  There are no
> guarantees.  Use "newfs" and "restore" to recover.

Greg. P(survival after a crash) = 0 means that if the system crashes
you WILL lose the filesystem. Have you read my posts of yesterday and
today? I have deliberately crashed NetBSD and OpenBSD five or six or
seven times, some of them under brutal conditions (a lot of writing
going on when I pulled the plug). The async-mounted filesystems
survived EVERY ONE OF THOSE CRASHES. I just don't know how to make
this more clear. That PROVES that the probability of survival of
NetBSD and OpenBSD async filesystems is > 0, not 0 as you assert.
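
A rough sketch of the arithmetic behind this, not part of the original
experiments: it assumes each crash is an independent trial with the same
survival probability p, which is a simplification, and the function
names are hypothetical. It shows that p = 0 cannot produce even one
survival, and what lower bound on p is consistent with seven survivals
in seven crashes at a 5% significance level.

```python
def prob_all_survive(p: float, n: int) -> float:
    """Probability that a filesystem survives n crashes in a row,
    assuming (hypothetically) independent crashes that are each
    survived with probability p."""
    return p ** n


def lower_bound(n: int, alpha: float = 0.05) -> float:
    """Smallest p for which n survivals out of n crashes still has
    probability >= alpha; a one-sided lower confidence bound on p."""
    return alpha ** (1.0 / n)


if __name__ == "__main__":
    # If P(survival) really were 0, seven straight survivals could not happen.
    print(prob_all_survive(0.0, 7))
    # Lower bound on p consistent with 7 survivals in 7 crashes.
    print(round(lower_bound(7), 3))
```

Seven trials cannot show that p is near one, but they do rule out p = 0
and, under the independence assumption, make very small values of p
implausible.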

And you even contradict yourself above. Either it is zero or it's not.
First you say "The probability of any Unix-based filesystem being
repairable after a crash is zero (0) if it has been mounted with
MNT_ASYNC, and if there was _any_ activity that affected its structure
since mount time up to the time of the crash." Then you hedge your
bets with "It still might survive after some types of changes, but it
_probably_ won't." Again, either it's zero or it's not. A tautology.
You can't have it both ways, but you sure are trying.

Then you say "There are no guarantees". A guarantee means the
probability of survival is ONE. I have repeatedly said that I
understand that is IMPOSSIBLE with async filesystems. I've said it in
clear, unambiguous English. I guarantee you that we all understand
that there are no guarantees.

> Linux ext2 is not a Unix-based filesystem and Linux itself is not a
> Unix-based kernel.  The meaning of "async" to ext2 is apparently very
> different than it is to any Unix-based filesystem.

It's not, you are simply wrong. It's exactly the same as NetBSD:
meta-data is written asynchronously and without any regard to

> NetBSD might be free
> of UNIX(tm) code, but it and its progenitors, right back to the 7th
> Edition of the original Unix, were all implemented by people firmly
> entrenched in the original Unix heritage from the inside out.
> For Unix-based filesystems and their repair tools, any probability of
> recovery less than one is as good as if it were zero.

How can you possibly say such a thing and hope to be taken seriously?
What you just said means that P(survival) = .999 is the same as
P(survival) = 0.

There are a LOT of situations (e.g., mine) where P(survival) = .999
would be very acceptable and P(survival) = 0 would not.
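
To put a number on that difference, here is a minimal sketch under the
same hypothetical assumption of independent crashes with a fixed
survival probability; the function name is illustrative only. The
number of crashes up to and including the first one that loses the
filesystem then follows a geometric distribution with mean 1 / (1 - p).

```python
def expected_crashes_until_loss(p_survive: float) -> float:
    """Mean number of crashes up to and including the first one that
    destroys the filesystem, assuming independent crashes that are
    each survived with probability p_survive."""
    if p_survive >= 1.0:
        return float("inf")  # the filesystem is never lost
    return 1.0 / (1.0 - p_survive)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.999):
        print(p, expected_crashes_until_loss(p))
```

With p = 0 the very first crash is fatal; with p = .999 the filesystem
would be expected to last on the order of a thousand crashes.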

> Don't ever get
> your hopes up.  Use "newfs" and "restore" to recover -- it'll be faster
> on average in the long term.
> Perhaps this sentence from McKusick's memo about fsck will help you to
> understand:  "fsck is able to repair corrupted file systems using
> procedures based upon the order in which UNIX honors these file system
> update requests."  This is true for all Unix-based filesystems.

I'm not going to put words in McKusick's mouth, but I think you have
misinterpreted this to mean that without ordering, recovery is
impossible. If that's what you think (and you've said so, except when
you've contradicted yourself), then you are wrong. Why? Because the
evidence (e.g., my experiments) says that recovery *is* possible. Not
guaranteed. Possible.

> With MNT_ASYNC there is, by definition, no guarantee about the order of
> metadata updates, or even that there will be _any_ metadata updates, and
> so there is no possibility that _any_ algorithm can ever reliably repair
> an async-mounted filesystem damaged by a crash.  Use "newfs" to recover
