Re: failure in NetBSD while running as root
On Wed Mar 30 2011 at 12:21:59 +0100, Julio Merino wrote:
> + martin, pooka, who have seen this periodically
> On Tue, Mar 29, 2011 at 7:42 PM, Julio Merino <jmmv%netbsd.org@localhost> wrote:
> > On Tue, Mar 29, 2011 at 7:37 PM, Jeff Rizzo <riz%netbsd.org@localhost>
> > wrote:
> >> atf-run: ERROR: XXX: Cannot get information of /tmp/atf-run.14884b/mnt;
> >> atf-run: lstat(2) failed: Device not configured
> >> g4:riz /usr/tests/fs/psshfs>
> >> I have actually removed the umount, and while I don't get the error, it
> >> just fails sometimes.
> > Aha! I could eventually get it to fail here with "Device busy",
> > although it surely is the same issue. It smells like a race condition.
> > I'll take a look.
> Alright. I know what's happening. The offending test case is
> mounting a rump file system and it is running a daemon in the
> background that creates a pid file in the work directory. During the
> test case cleanup, the file system is unmounted and the server is
> killed. And here comes the race condition: neither unmounting the
> puffs file system nor the termination of the server (with the
> accompanying removal of its pid file) is synchronous.
> When atf-run attempts to clean up the work directory, it scans a
> still-changing file system and bad things happen. For example, it may
> enumerate the directory contents first and, later, when attempting to
> delete a supposedly-existing file, get an ENOENT. Or it may try to
> enumerate the contents of a mount point at the same time as the puffs
> server process is exiting.
> I have committed revision 648ed6360b2b7cda81a6079b00dc436d09c745b8
> which implements a workaround for this situation: the "fix" is to
> either retry failing file system operations a few times in an attempt
> to allow the work directory to stabilize, or to ignore
> supposedly-transient errors.
> Now, this feels like a very ugly hack but I'm not sure how we'd do
> better. For file systems, I see that fuse has a "sync_unmount"
> command/flag (dunno what exactly it is) that was added for exactly
> this purpose. For daemons... we can't control their termination
unmount (i.e. umount /mnt/path) *is* synchronous. However, if the
server exits (via signal, crash, or whatever), there is a window of
limbo during which the file system is being unmounted from the kernel.
Since signals happen asynchronously, I can't see how it's possible to
provide anything synchronous against that.
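
For reference, the retry-or-ignore workaround described in the quoted
message could look roughly like the Python sketch below. This is not the
code from the atf commit. The names retry() and remove_tree(), the
attempt count, the delay, and the choice of errno values treated as
transient are all assumptions made for illustration. On NetBSD, "Device
busy" is EBUSY and "Device not configured" is ENXIO.

```python
import errno
import os
import time

# Assumption: these are the errors worth retrying while the work
# directory settles (EBUSY = "Device busy", ENXIO = "Device not
# configured" on NetBSD).
TRANSIENT = {errno.EBUSY, errno.ENXIO}


def retry(op, path, attempts=5, delay=0.1):
    """Call op(path), tolerating a file system that is still changing.

    Returns True if op succeeded and False if the path had already
    disappeared (ENOENT). Errors in TRANSIENT are retried up to
    `attempts` times; anything else is raised immediately.
    """
    for attempt in range(attempts):
        try:
            op(path)
            return True
        except FileNotFoundError:
            # Listed earlier but gone now, e.g. the daemon's pid file.
            return False
        except OSError as e:
            if e.errno not in TRANSIENT or attempt == attempts - 1:
                raise
            time.sleep(delay)


def remove_tree(root, attempts=5, delay=0.1):
    """Recursively remove root, ignoring entries that vanish mid-scan."""
    try:
        names = os.listdir(root)
    except FileNotFoundError:
        return
    for name in names:
        path = os.path.join(root, name)
        if os.path.isdir(path) and not os.path.islink(path):
            remove_tree(path, attempts, delay)
        else:
            retry(os.unlink, path, attempts, delay)
    retry(os.rmdir, root, attempts, delay)
```

Something like remove_tree("/tmp/atf-run.XXXXXX") would then survive a
pid file vanishing between the listing and the unlink, and would wait a
little for a mount point that is still going away. It only narrows the
window; it does not close it.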
älä karot toivorikkauttas, kyl rätei ja lumpui piisaa