Subject: Re: kern/27802: on disk full, last-edited file opened instead of binary
To: None <gnats-bugs@NetBSD.org>
From: Pavel Cahyna <pcah8322@artax.karlin.mff.cuni.cz>
List: netbsd-bugs
Date: 11/06/2004 22:41:19
Hello,

I have seen something similar. I accidentally left the debugging flag on
for bind8, and the server's disk (all data, including /var, is on a single
filesystem of about 400 MB) was filled up by the debug log. I started vi to
examine the configuration file and suddenly vi dumped core. After this,
any attempt to run vi/ex resulted in an immediate core dump (with a bus
error). (I believe bind was still running and trying to write the log, but
I do not remember for sure.) Then I deleted the log, but attempts to write
to the filesystem still reported "disk full". So I decided the filesystem
was corrupted and I rebooted. The kernel gave up syncing buffers, and on
restart, fsck fixed some errors. Vi was still crashing, and I discovered
that the file /usr/lib/libcurses.so.5.0 was corrupted. When I replaced it
from the distribution sets, vi became happy.

Now, to the corruption. The area of the file that differed from the original
started at byte 23590 and finished at byte 24575, so it was 986 bytes long.
The beginning is not aligned on any "magic" value (the end is:
24576 is 60000 in octal). The first few lines of the differing area are:
.resend(addr=2 n=1) -> [198.32.64.12].53 ds=4 nsid=58129 id=0 5ms
evSetTimer(ctx 0x812b800, func 0x805eba8, uap 0x0, due 1055966785.000000000, inter 0.000000000)
resend(addr=3 n=1) -> [128.8.10.90].53 ds=4 nsid=58129 id=0 7ms

It is clear that this is part of bind's debug log. Only the origin of the
first byte (which is 200 in octal) is mysterious.
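
For anyone who wants to check the offsets, here is a minimal sketch in
Python. The 4096-byte page size and the 512-byte sector size are my
assumptions (they are not taken from the machine), and the variable names
are only for illustration:

```python
# Offsets of the differing area as reported in this message (inclusive).
first = 23590
last = 24575

PAGE_SIZE = 4096    # assumption: page size on this i386 machine
SECTOR_SIZE = 512   # assumption: disk sector size

length = last - first + 1   # number of differing bytes
end = last + 1              # offset of the first byte after the area

print(length)                # 986
print(oct(end))              # 0o60000
print(end % PAGE_SIZE)       # 0: the end falls on a multiple of 4096
print(first % PAGE_SIZE)     # 3110: the start does not
print(first % SECTOR_SIZE)   # 38: not a multiple of 512 either
```

If those assumptions hold, the foreign data runs from somewhere in the
middle of a page up to the end of that page.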

It is interesting that vi started normally the first time. This means that the
library became corrupted (probably in core) while vi was running, which caused it
to dump core. The cached data for the library was also marked as "dirty", which
caused it to be written back to disk. (I believe libraries are not normally
written to disk ...) At least, this is my analysis.

The server is an EISA-based Pentium 60 with about 24 MB of memory, running
NetBSD 1.6_STABLE (from about December 2002).

A diff of the od dumps of the original and the corrupted library can be provided.
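
For reference, the first and last differing offsets can also be found
without od. This is only a sketch, not the procedure I actually used: the
function name differing_range is made up for illustration, and the two file
paths are whatever you pass on the command line.

```python
import sys


def differing_range(a: bytes, b: bytes):
    """Return (first, last) inclusive offsets where a and b differ, or None.

    Only the outer bounds are reported; bytes in between may happen to match.
    """
    n = min(len(a), len(b))
    diffs = [i for i in range(n) if a[i] != b[i]]
    if len(a) != len(b):
        # Treat the tail of the longer input as differing.
        diffs.extend([n, max(len(a), len(b)) - 1])
    if not diffs:
        return None
    return diffs[0], diffs[-1]


# Usage: python script.py <original library> <corrupted library>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f:
        original = f.read()
    with open(sys.argv[2], "rb") as f:
        corrupted = f.read()
    print(differing_range(original, corrupted))
```

Both files are read completely into memory, which is fine for a library
of this size.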

bye	Pavel