tech-kern archive
Re: Lost file-system story
Hello. Just for your edification, it is possible to break out of fsck
mid-way and reinvoke it with fsck -y to get it to do the cleaning on its
own.
With regard to your notes on speed with NetBSD versus OpenBSD, I
suspect the speed trade-off is where the difference lies. OpenBSD is
flushing buffers to disk more frequently than NetBSD is, and thus what is
on disk is closer to the complete filesystem at any given moment. Since
you readily admit that you are a rare case, might I suggest that there may
be an easy way for you to have your cake and eat it too: get the speed and
performance of NetBSD with the relative reliability you saw with OpenBSD
(which may have been luck; I'm not sure). You could write yourself a
little program, or find an old version of update(8) in old source trees,
which runs as a daemon and calls sync(2) every n seconds, where n is
whatever comfort level you deem appropriate. I believe that when you call
sync(2), even data on async-mounted filesystems is flushed. With that program
running, I'd be interested in having you retry your experiment with NetBSD
and see if your results differ.
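
A minimal sketch of such a program, assuming Python 3 on a Unix-like
system, where os.sync() wraps sync(2). The name sync_loop, its parameters,
and the 30-second default are illustrative assumptions, not taken from
update(8). Whether sync(2) also pushes out the metadata of an async-mounted
filesystem is the belief stated above, not something this sketch verifies.

```python
#!/usr/bin/env python3
"""Periodic sync loop, loosely in the spirit of the old update(8) daemon."""
import os
import sys
import time


def sync_loop(interval, rounds=None, do_sync=os.sync, do_sleep=time.sleep):
    """Call do_sync(), then sleep for `interval` seconds, and repeat.

    rounds=None loops forever; an integer limits the number of syncs,
    which is mainly useful for testing. Returns the number of syncs done.
    do_sync and do_sleep are injectable so the loop can be exercised
    without touching the disk or actually waiting.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    done = 0
    while rounds is None or done < rounds:
        do_sync()           # by default os.sync(), i.e. sync(2)
        done += 1
        do_sleep(interval)  # wait n seconds before the next flush
    return done


if __name__ == "__main__":
    # Assumed default of 30 seconds; pass a different n as the first argument.
    n = float(sys.argv[1]) if len(sys.argv) > 1 else 30.0
    sync_loop(n)
```

Run in the background (for example with n = 5), this bounds how stale the
on-disk state can get, at the cost of giving back some of the async write
speed. A smaller n trades more performance for a smaller window of loss.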
-Brian
On Dec 9, 3:50pm, Donald Allen wrote:
} Subject: Re: Lost file-system story
} I just did a little experiment. I installed OpenBSD 5.0 on the same
} machine where I had my adventure with NetBSD. This time, I broke up
} the world into separate filesystems, which OpenBSD facilitates,
} mounting only /home and /tmp async, noatime. All the others were
} mounted softdep,noatime. I downloaded ports.tar.gz and un-tarred it
} into my home directory (I had previously un-tarred it into /usr). I
} then did
}
} rm -rf ports
}
} which takes a while. While that was going, I hit the power button (I
} can afford to lose a filesystem containing only my home directory;
} it's backed up thoroughly, because I rsync it from one machine to
} another; there are current copies on several other machines). The
} system did a rapid shutdown without sync'ing the filesystems.
}
} On restart, all the softdep-mounted filesystems had no errors in fsck,
} as one might expect (especially since there was no intensive
} write-activity going on when I improperly shut the system down, as
} there was in /home), but I got an "Unexpected inconsistency" error in
} my home directory and requested a manual fsck; the system dropped into
} single-user mode after the automatic fscks finished. I ran the fsck on
} the filesystem that gets mounted as /home and there were a number of
} files and directories that were apparently half-deleted and it asked
} me one-by-one if I wanted to delete them. I did with a few, but then
} used the 'F' option to do so without further interaction (I don't
} believe the NetBSD fsck gave me that option; it is not documented in
} the NetBSD fsck man page, while it *is* documented in the OpenBSD fsck
} man page). The fsck completed and marked the filesystem clean. I
} rebooted, everything mounted normally, and a check of my home
} directory shows everything intact, even most of the ports directory,
} whose deletion I deliberately interrupted.
}
} The async warning in the OpenBSD mount page reads as follows:
}
}     async   Metadata I/O to the file system should be done
}             asynchronously. By default, only regular data is
}             read/written asynchronously.
}
}             This is a dangerous flag to set since it does not
}             guarantee to keep a consistent file system structure on
}             the disk. You should not use this flag unless you are
}             prepared to recreate the file system should your system
}             crash. The most common use of this flag is to speed up
}             restore(8) where it can give a factor of two speed
}             increase.
}
} "does not guarantee to keep a consistent file system structure on the
} disk" is what I expected from NetBSD. From what I've been told in this
} discussion, NetBSD pretty much guarantees that if you use async and
} the system crashes, you *will* lose the filesystem if there's been any
} writing to it for an arbitrarily long period of time, since apparently
} meta-data for async filesystems doesn't get written as a matter of
} course. And then there's the matter of NetBSD fsck apparently not
} really being designed to cope with the mess left on the disk after
} such a crash. Please correct me if I've misinterpreted what's been
} said here (there have been a few different stories told, so I'm trying
} to compute the mean).
}
} I am not telling the OpenBSD story to rub NetBSD people's noses in it.
} I'm simply pointing out that that system appears to be an example of
} ffs doing what I thought it did and what I know ext2 and journal-less
} ext4 do -- do a very good job of putting the world into operating
} order (without offering an impossible guarantee to do so) after a
} crash when async is used, after having been told that ffs and its fsck
} were not designed to do this. The reason I'm beating on this is that I
} would have liked to use NetBSD for the application I have in mind, but
} I need the performance improvement that async provides (my tests show
} this; the tests also show that NetBSD async is about as fast as Linux,
} much faster than OpenBSD async, at least for doing a lot of writing,
} such as un-tarring a large tar file). This is practical if the joint
} probability of the system crashing *and* losing the async filesystem
} is low. My one little data point was discouraging -- using a wireless
} card with a common chipset (atheros) resulted in losing my network
} connection, and then the system crashed when a restart of networking
} was attempted (I had to use the atheros card because the system didn't
} pick up the built-in Cisco wireless device, which I think is supposed
} to be served by the an driver). The crash took out the filesystem, as
} we've been discussing.
}
} So I'd love it if my experience encourages someone to improve NetBSD
} ffs and fsck to make use of async practical, perhaps by drawing on
} what OpenBSD has done. I also realize that my situation is unusual,
} and with resources being scarce, there are a lot more important things
} to work on, that will affect a lot more people. But I'd at least like
} to get it in the queue.
}
} /Don Allen
>-- End of excerpt from Donald Allen