Subject: Re: kern/27802: on disk full, last-edited file opened instead of binary
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: David Krinsky <krinsky+netbsd@bantha.org>
List: netbsd-bugs
Date: 11/05/2004 10:47:31
On Fri, Nov 05, 2004 at 04:21:33PM +0100, Manuel Bouyer (bouyer@antioche.lip6.fr) wrote:
> OK, so it looks like the problem isn't NFS-related either.
> This points to an issue in the VM system. One scenario could be this one:
> when a filesystem full (or overquota) error occurs, the page mapped to
> the file (file1) is freed and can be recycled for use by another open file
> (file2). But something in the file1 management keeps a reference to this page,
> and a subsequent write to file1 will change data in this page and mark it
> dirty. So even if the page now points to a read-only file, as it has been
> marked dirty the corrupted data are written back to disk to file2.

No, I don't think that's quite consistent with my bug.  A few issues:

(1) Neither the file I expected (in /etc) nor the file I saw (in
/usr/bin) was on a filesystem that was full: it was /var, a third
filesystem, that was full.  So *neither* file should have been in an
error path; rather, I think there's got to be some other locking
and/or allocation bug, such that a single page (or some other relevant
data structure in the UBC) can get mapped to two different files even
if neither of them errors.

(Maybe a single structure is erroneously returned to a free queue
twice somewhere in the disk-full error path?)

(2) The on-disk copy of the file whose contents were unexpected,
/usr/bin/telnet, was unaffected; after the first few strange accesses,
the file returned to normal.  This just means that the page was
never marked dirty, of course, but it also means that the page was
probably not changed from under an open file.  Rather, I think that
when /usr/bin/telnet was opened, it somehow wound up pointing at
some UBC structure that had already been allocated to
/etc/inetd.conf.

(This is consistent with the double-freeing theory.)
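
To make the double-freeing theory concrete, here is a toy model in C.
It is only a sketch of the general failure mode, not the actual UVM/UBC
code: struct page, page_free(), page_alloc() and the /var/scratch name
are all hypothetical, made up for illustration.  It shows how putting
one structure on a LIFO free list twice lets two later allocations
hand the same structure to two different files, with neither of those
files ever hitting an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cache page; NOT a real UVM/UBC structure. */
struct page {
    struct page *next;      /* free-list link */
    const char  *owner;     /* file this page currently caches */
    char         data[32];  /* cached file contents */
};

static struct page *free_head;  /* head of a simple LIFO free list */

static void page_free(struct page *pg)
{
    /* No "already on the list" check: this is the bug being modelled. */
    pg->next = free_head;
    free_head = pg;
}

static struct page *page_alloc(const char *owner)
{
    struct page *pg = free_head;

    if (pg == NULL)
        return NULL;
    free_head = pg->next;
    pg->owner = owner;
    return pg;
}

/* Returns 1 if two files end up sharing one page after a double free. */
static int double_free_aliases(void)
{
    static struct page pool[1];
    struct page *tmp, *a, *b;

    free_head = NULL;
    page_free(&pool[0]);               /* seed the free list */

    tmp = page_alloc("/var/scratch");  /* hypothetical file on the full fs */
    assert(tmp != NULL);
    page_free(tmp);                    /* legitimate free */
    page_free(tmp);                    /* erroneous second free (error path) */

    a = page_alloc("/etc/inetd.conf");
    assert(a != NULL);
    strcpy(a->data, "inetd.conf contents");

    b = page_alloc("/usr/bin/telnet"); /* handed the same page again */
    assert(b != NULL);

    return a == b && strcmp(b->data, "inetd.conf contents") == 0;
}
```

In this model double_free_aliases() returns 1: the second free leaves
the page linked to itself, so the "telnet" allocation gets the page
that still holds the "inetd.conf" data.  Nothing is ever marked dirty,
which would fit the on-disk copy staying intact.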

(3) I think it's actually fairly unlikely that anything had
/etc/inetd.conf open; unless inetd takes a really long time to read
config files, or someone else on the system decided to look at it
(unlikely, though certainly possible), I don't know what would have
had it open.  I just checked, and inetd closes its config files
promptly after reading them.  The file's contents should certainly
still have been in the buffer cache, though.

But yeah, it sure feels like a UBC page-allocation bug.

Dave.