Subject: Re: kern/27802: on disk full, last-edited file opened instead of binary
To: David Krinsky <krinsky+netbsd@bantha.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: netbsd-bugs
Date: 11/07/2004 21:42:04
On Fri, Nov 05, 2004 at 10:47:31AM -0500, David Krinsky wrote:
> No, I don't think that's quite consistent with my bug.  A few issues:
> 
> (1) Neither the file I expected (in /etc) nor the file I saw (in
> /usr/bin) was on a filesystem that was full: it was /var, a third
> filesystem, that was full.  So *neither* file should have been in an
> error path; rather, I think there's got to be some other locking
> and/or allocation bug, such that a single page (or some other relevant
> data structure in the UBC) can get mapped to two different files even
> if neither of them errors.

Hmm, is /var/tmp a different partition? vi will create a copy of the edited
file, which could have been in an error path.

> 
> (Maybe a single structure is erroneously returned to a free queue
> twice somewhere in the disk-full error path?)
> 
> (2) The on-disk copy of the file whose contents were unexpected,
> /usr/bin/telnet, was unaffected; after the first few strange accesses,
> the file returned to normal.  This just means that the page was
> never marked dirty, of course, but it also means that the page was
> probably not changed from under an open file.

I'm not sure your second assertion is valid.

> Rather, I think that
> when the /usr/bin/telnet file was opened, I think it somehow had to
> wind up pointing at some UBC structure that had already been
> allocated to /etc/inetd.conf.
> 
> (This is consistent with the double-freeing theory.)
> 
> (3) I think it's actually fairly unlikely that anything had
> /etc/inetd.conf open; unless inetd takes a really long time to read
> config files, or someone else on the system decided to look at it
> (unlikely, though certainly possible), I don't know what would have
> had it open.  I just checked, and inetd closes its config files
> promptly after reading them.  It should certainly still have been in
> the buffer cache, though.

Are you sure /usr/bin/telnet was not already open? Maybe you already had an
instance of it running somewhere?


> 
> But yeah, it sure feels like a UBC page-allocation bug.

I agree that your observation doesn't exactly match your theory, but the
double-free-to-list bug doesn't match mine either. Once the double-free occurs,
the duplicate entry would remain there until reboot and keep corrupting other
files. In my case I didn't reboot the server once the corruption was noticed.
I just restored the file from backup, and kept the server running for months
without further problems. The fact that the problem is transient
looks more like a dangling pointer.
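
To illustrate why I'd expect a double-free to be persistent rather than
transient, here is a minimal sketch. It assumes a simple LIFO singly-linked
free list; the names (struct page, page_free, page_alloc, demo_double_free)
are hypothetical and this is not the actual UVM/UBC code, which may well
behave differently.

```c
#include <stddef.h>

/* Hypothetical page descriptor; NOT the real NetBSD struct vm_page. */
struct page {
	struct page *next;	/* link on the free list */
	int owner;		/* id of the file cached here, 0 = none */
};

static struct page *free_head = NULL;

/* Push a page on the free list, with no check for "already free". */
static void
page_free(struct page *p)
{
	p->owner = 0;
	p->next = free_head;
	free_head = p;
}

/* Pop a page from the free list and hand it to `owner`. */
static struct page *
page_alloc(int owner)
{
	struct page *p = free_head;

	if (p == NULL)
		return NULL;
	free_head = p->next;
	p->owner = owner;
	return p;
}

/*
 * Free the same page twice, then allocate for two different files.
 * Returns 1 if both files got the same page and the list still hands
 * out that page afterwards, 0 otherwise.
 */
int
demo_double_free(void)
{
	static struct page pool[2];
	struct page *a, *b, *c;

	free_head = NULL;
	page_free(&pool[0]);
	page_free(&pool[1]);
	page_free(&pool[1]);	/* the bug: pool[1].next now points at itself */

	a = page_alloc(1);	/* say, /etc/inetd.conf */
	b = page_alloc(2);	/* say, /usr/bin/telnet */
	c = page_alloc(3);	/* any later file */

	return a == b && b == c && free_head == &pool[1];
}
```

In this sketch the second page_free() makes the entry its own successor, so
every later page_alloc() returns the same page and the list never recovers
by itself. With a different list layout the details would differ, but the
duplicate entry would not simply go away.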

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference