Subject: Re: kern/27802: on disk full, last-edited file opened instead of binary
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: David Krinsky <krinsky+netbsd@bantha.org>
List: netbsd-bugs
Date: 11/07/2004 19:27:14
On Sun, Nov 07, 2004 at 09:42:04PM +0100, Manuel Bouyer (bouyer@antioche.lip6.fr) wrote:
> On Fri, Nov 05, 2004 at 10:47:31AM -0500, David Krinsky wrote:
> > No, I don't think that's quite consistent with my bug.  A few issues:
> > 
> > (1) Neither the file I expected (in /etc) nor the file I saw (in
> > /usr/bin) was on a filesystem that was full: it was /var, a third
> > filesystem, that was full.  So *neither* file should have been in an
> > error path; rather, I think there's got to be some other locking
> > and/or allocation bug, such that a single page (or some other relevant
> > data structure in the UBC) can get mapped to two different files even
> > if neither of them errors.
> 
> Hmm, is /var/tmp a different partition? vi will create a copy of the edited
> file, which could have been in an error path.

No, /var/tmp is on /var.

But vi wasn't running when I saw my strange behavior.  I suppose it's
possible that if vi didn't clean up after itself, it could have been a
vi temp file, rather than /etc/inetd.conf, that I saw as
/usr/bin/telnet.  But even on disk full, vi should have been able to
delete its temporary files.

> > Rather, I think that when /usr/bin/telnet was opened, it somehow
> > wound up pointing at a UBC structure that had already been
> > allocated to /etc/inetd.conf.
> > 
> > (This is consistent with the double-freeing theory.)
> > 
> > (3) I think it's actually fairly unlikely that anything had
> > /etc/inetd.conf open; unless inetd takes a really long time to read
> > config files, or someone else on the system decided to look at it
> > (unlikely, though certainly possible), I don't know what would have
> > had it open.  I just checked, and inetd closes its config files
> > promptly after reading them.  It should certainly still have been in
> > the buffer cache, though.
> 
> Are you sure /usr/bin/telnet was not already open? Maybe you already had an
> instance of it running somewhere?

It's possible, yes.  As I said, this is a production multiuser shell
account server.

> > But yeah, it sure feels like a UBC page-allocation bug.
> 
> I agree that your observation doesn't exactly match your theory, but the
> double-free-to-list bug doesn't match mine either. Once the double free occurs,
> the duplicate entry would remain there until reboot and keep corrupting other
> files. In my case I didn't reboot the server once the corruption was noticed.
> I just restored the file from backup, and kept the server running for months
> without further problems. The fact that the problem is transient
> looks more like a dangling pointer.

Yeah, that's true. Mine did crash inexplicably, but not until a
couple of days later.
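To make the persistence argument concrete, here is a toy model in C. It
is hypothetical and is not NetBSD's UVM/UBC code: the names struct pool,
page_alloc() and page_free() are made up for illustration, and a "page"
is just an index. It assumes a free list that never checks for
duplicates. Once a page is freed twice, two later allocations (say, one
for /etc/inetd.conf and one for /usr/bin/telnet) get the same page, and
when both holders free it again the duplicate goes straight back onto
the list, so the sharing keeps recurring.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page pool; hypothetical, not UVM/UBC code. Pages are plain indices. */
#define NPAGES 4
#define FREE_CAP (2 * NPAGES) /* extra room so duplicate entries fit */

struct pool {
    int free_list[FREE_CAP]; /* LIFO stack of free page indices */
    size_t nfree;            /* number of entries on the free list */
    int owner[NPAGES];       /* id of the "file" holding each page, -1 if none */
};

/* Put every page on the free list with no owner. */
void pool_init(struct pool *p)
{
    p->nfree = 0;
    for (int i = 0; i < NPAGES; i++) {
        p->owner[i] = -1;
        p->free_list[p->nfree++] = i;
    }
}

/* Pop a page off the free list and record which file now owns it.
 * Returns -1 if the list is empty. */
int page_alloc(struct pool *p, int file)
{
    if (p->nfree == 0)
        return -1;
    int pg = p->free_list[--p->nfree];
    p->owner[pg] = file;
    return pg;
}

/* Push a page back onto the free list. The bug being modeled: nothing
 * checks whether the page is already on the list, so a double free
 * leaves a duplicate entry behind. */
void page_free(struct pool *p, int pg)
{
    assert(p->nfree < FREE_CAP);
    p->owner[pg] = -1;
    p->free_list[p->nfree++] = pg;
}
```

For example, after pool_init(), freeing one page twice and then calling
page_alloc() for two different files returns the same index both times;
when each file later frees "its" page, the duplicate is re-created. A
dangling pointer, by contrast, would stop doing damage once the stale
reference was dropped, which fits a one-off corruption better.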

But I'm not 100% sure we're seeing the same bug.  The fact that in my
case the "corrupt" file was never actually corrupted on disk is a
rather marked difference, although I could just have gotten lucky
with which pages happened to be marked dirty when.

D.