Subject: many segmentation faults and file system damage?
To: None <current-users@netbsd.org>
From: Steven M. Bellovin <smb@cs.columbia.edu>
List: current-users
Date: 08/18/2007 12:17:33
Yesterday, I had an unusual and serious system crash.  I'm running
-current 4.99.29 from August 15 on an i386; the machine is normally
rock-solid.  When I rebooted, fsck had to truncate several files, more
than usual.  Some of them were critical and were almost certainly not
open for writing.  In particular, on rebooting I found
that /lib/libm.so.0.5 and /lib/libm387.so.0.1 were zero-length.
Fortunately, I had my build directory available, so I used /rescue/cp
to copy them into place.  I then reinstalled the whole system from that
build rather than risk further surprises.  (In retrospect, I should
have checked for other critical zero-length files.)
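
A check for other zero-length files could look roughly like the
following.  This is a minimal Python sketch, not something that was run
on the affected machine; the function name find_zero_length and the
list of directories to scan are illustrative assumptions.

```python
import os
import stat


def find_zero_length(roots):
    """Yield paths of zero-length regular files under each root directory."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # lstat so that symlinks are not followed
                    st = os.lstat(path)
                except OSError:
                    # file vanished or is unreadable; skip it
                    continue
                if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                    yield path


if __name__ == "__main__":
    # Assumed set of critical directories; adjust for the system at hand.
    for path in find_zero_length(["/lib", "/usr/lib", "/bin", "/sbin"]):
        print(path)
```

Some empty files are legitimate, so the output is only a list of
candidates to compare against a known-good build.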

I also found a number of core dumps in my home directory, representing
most of what I had open at the time: fvwm2, vi, xinit, xterm, etc.  All
showed that they'd died of segmentation faults.

After the crash, I had to do a hard reboot.  The machine was responding
to pings but nothing else.  At first, mouse movements worked; later,
they didn't seem to.  Ctrl+Alt+Fn didn't do anything useful; trying
Ctrl+Alt+Esc and blindly typing the reboot sequence didn't, either.
The CPU fan was running fairly fast, so I suspect that the machine was
looping rather than idle.

Has anyone seen anything like this?  It almost sounds like a hardware
glitch, but I'm by no means certain.


		--Steve Bellovin, http://www.cs.columbia.edu/~smb