Subject: many segmentation faults and file system damage?
To: None <current-users@netbsd.org>
From: Steven M. Bellovin <smb@cs.columbia.edu>
List: current-users
Date: 08/18/2007 12:17:33
Yesterday, I had an unusual and serious system crash. I'm running
-current 4.99.29 from August 15 on an i386; the machine is normally
rock-solid. When I rebooted, fsck had to truncate several files, more
than usual. Some of them were critical and were almost certainly not
open for writing. In particular, on rebooting I found
that /lib/libm.so.0.5 and /lib/libm387.so.0.1 were zero-length.
Fortunately, I had my build directory available, so I used /rescue/cp
to copy them into place. I then reinstalled the whole system from that
build rather than risk further surprises. (In retrospect, I should
have checked for other critical zero-length files.)
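For anyone who wants to make that check after a similar crash, here is
a minimal sketch of what I had in mind. It is only an illustration: the
function name is made up, and which directories count as "critical" is
an assumption you would adjust for your own system. Some zero-length
files are legitimate, so treat the output as a list of candidates to
inspect, not as proof of damage.

```python
import os
import stat


def find_zero_length_files(roots):
    """Return a sorted list of zero-length regular files under the given roots.

    Hypothetical helper for illustration only; not part of any NetBSD tool.
    """
    empty = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # lstat so that symlinks are not followed
                    st = os.lstat(path)
                except OSError:
                    # unreadable or vanished entry; skip it
                    continue
                if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                    empty.append(path)
    return sorted(empty)
```

Calling find_zero_length_files(["/lib", "/usr/lib", "/bin", "/sbin"])
would list the candidates in those directories; again, that directory
list is just my guess at a reasonable starting point.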
I also found a number of core dumps in my home directory, representing
most of what I had open at the time: fvwm2, vi, xinit, xterm, etc. All
showed that they'd died of segmentation faults.
After the crash, I had to reboot hard. The machine was responding to
pings but nothing else. At first, mouse movements worked; later, they
didn't seem to. Ctrl+Alt+Fn didn't do anything useful; neither did
trying Ctrl+Alt+Esc and blindly typing the reboot sequence.
The CPU fan was running fairly fast, so I suspect that the machine was
looping rather than idle.
Has anyone seen anything like this? It almost sounds like a hardware
glitch, but I'm by no means certain.
--Steve Bellovin, http://www.cs.columbia.edu/~smb