tech-kern archive


Re: Lost file-system story



On Fri, Dec 9, 2011 at 4:33 PM, Brian Buhrow
<buhrow%lothlorien.nfbcal.org@localhost> wrote:
>        Hello.  Just for your edification, it is possible to break out of fsck
> mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
> own.

This whole discussion, interesting though it may be, may have occurred
simply because of my unfamiliarity with NetBSD and probably a mistake
in not looking at the fsck man page for something like the -y option
when I reached the point where continuing to feed 'y's to fsck after
the original crash seemed like a losing battle. Had I thought about -y
(I know that fscks typically have such an option, but in my experience
it's an optional answer to fsck questions, as OpenBSD's is; for
whatever reason, I didn't think of it), I'd have used it, since I had
nothing to lose at that point. But it's possible you have put your
finger on the real truth of what happened here. Read on.

You suggested trying the experiment I did with OpenBSD with NetBSD,
and so I did. Twice. I installed NetBSD with separate filesystems for
/, /usr, /var, /tmp, and /home, a la OpenBSD's default setup. All
except /home and /tmp were mounted softdep,noatime. /home was mounted
async, and /tmp is an in-memory filesystem. The first time, I untarred
the OpenBSD ports.tar.gz (I used it because it was what I used in the
OpenBSD test, it's big, and I had it lying around) into a temporary
directory in my home directory. With the battery removed from the
laptop, I did an

rm -rf ports

and while that was happening, I pulled the power connector.

On restart, fsck found a bunch of things it didn't like about the
/home filesystem, but managed to fix things up to its satisfaction and
declare the filesystem clean. My home directory survived this and,
like OpenBSD, a fair amount of the ports directory was still present.
I then removed it and re-did the untar. While the untar was happening,
I again pulled the plug. This time, the automatic fsck got unhappy
enough to drop me into single-user mode, and I ran fsck there manually. I
again encountered a seemingly never-ending sequence of requests to fix
this and that. So I aborted and used the -y option. It charged through
a bunch of trouble spots and completed. On reboot, I found the same
situation as the first one -- home directory intact and some of the
ports directory present.
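
The recovery sequence described above can be sketched as a small shell
function: try a non-interactive preen pass first (the mode the automatic
boot-time check uses), and only if that fails fall back to fsck -y. This
is a rough sketch, not a recommended procedure; the function name
repair_fs and the FSCK override variable are made up for illustration,
and fsck -y answers "yes" to every question, so it is only sensible when
there is nothing left to lose on that filesystem.

```shell
#!/bin/sh
# FSCK is a hypothetical override so the logic can be tried with a stub;
# by default the real fsck is used.
FSCK=${FSCK:-fsck}

# repair_fs MOUNTPOINT
# Try a preen pass first; if it gives up, re-run with -y so that every
# repair question is answered "yes" without prompting.
repair_fs() {
    fs=$1
    if "$FSCK" -p "$fs"; then
        echo "preen succeeded on $fs"
    elif "$FSCK" -y "$fs"; then
        echo "repaired $fs with fsck -y"
    else
        echo "fsck -y could not repair $fs" >&2
        return 1
    fi
}
```

In single-user mode, as root and with the filesystem unmounted, this
would be invoked as: repair_fs /home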

I have some thoughts about this:

1. Had I run fsck -y at the time of the first crash, I might well have
found what I found today -- a repaired filesystem that was usable. So
my assertion that the filesystem was lost may well have simply been my
lack of skill as a NetBSD sys-admin.
2. Today's experiment shows that a NetBSD ffs filesystem mounted
async, together with its fsck, *is* capable of surviving even a pretty
brutal improper shutdown -- loss of power while a lot of writing was
happening. Obviously I still don't have enough data to know if the
probability of survival is comparable to Linux ext2, but what I found
today is at least encouraging.

I did one more experiment, and that was to untar the ports tarball
and then wait about a minute. I then did a sync. The disk light
blinked just for a brief moment. This is a *big* tar file, but it
appears from this easy little test that there was not a huge amount of
dirty stuff sitting in the buffer cache. This is obviously not
definitive, but does suggest that NetBSD is migrating stuff from the
buffer cache back to the disk for async-mounted filesystems in a timely
fashion. A look at the code is probably the final arbiter here. I also
note that there are sysctl items, such as vfs.sync.metadelay that I
would like to understand.
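
A slightly less subjective version of the "watch the disk light" test is
to time the sync and to print the syncer-related sysctl values next to
it. The following is only a sketch: the function names are invented
here, the timing has one-second resolution, and of the sysctl names only
vfs.sync.metadelay is mentioned above. The other three (vfs.sync.delay,
vfs.sync.filedelay, vfs.sync.dirdelay) are assumed to be its NetBSD
siblings and are reported as unavailable where they do not exist.

```shell
#!/bin/sh
# time_sync: print roughly how many whole seconds a sync takes.
# A result of 0 suggests little dirty data was left in the buffer cache.
time_sync() {
    start=$(date +%s)
    sync
    end=$(date +%s)
    echo $((end - start))
}

# show_sync_delays: print one line per syncer tunable.
# Only vfs.sync.metadelay comes from the message above; the other
# names are assumptions and may not exist on a given system.
show_sync_delays() {
    for name in vfs.sync.delay vfs.sync.filedelay \
                vfs.sync.dirdelay vfs.sync.metadelay; do
        sysctl "$name" 2>/dev/null || echo "$name: not available"
    done
}

show_sync_delays
echo "sync took about $(time_sync) second(s)"
```

Running this about a minute after the untar finishes, and again
immediately after a fresh untar, would give a crude comparison of how
much the syncer had already written back on its own.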

/Don Allen

