Subject: vmware suddenly crashing the system...
To: None <port-i386@netbsd.org>
From: Steve Bellovin <smb@research.att.com>
List: port-i386
Date: 07/31/2001 14:30:13
vmware has suddenly started hanging or crashing my machine.  I had been 
running 1.5.1b2; in a vain attempt to solve the problem, I upgraded 
this morning to the latest kernel in the 1.5 branch, which identifies 
itself as 1.5.2_ALPHA.

The symptom is that most user-level programs (including my window 
manager) go non-responsive when vmware is trying to sync its redo file 
(I use undoable virtual disks).  ssh from another host hung, too, but I 
could get a response from 'ntpq'.  Sometimes -- but not always -- I've 
seen TCP connection attempts from outside stay in SYN_SENT state (i.e., 
it never got an answer from an interrupt-level function on the hung 
machine), but ping succeeded.  I've *never* seen that happen before.

At least once, the machine panicked, leaving behind the following 
message on reboot:

Jul 31 10:56:45 berkshire savecore: reboot after panic: uvm_pagedeactivate: caller did not check wire count
Jul 31 10:56:45 berkshire savecore: no dump, not enough free space in /var/crash

I moved /var/crash to /usr, but (of course) I haven't seen that failure 
since then.

The only substantive thing I changed between when vmware had been 
working and when it started failing is that I removed a 128M SIMM.  I 
had been getting sig11 during gcc compilations, which (according to 
many folks) indicates memory errors.  I've seen no such failures since 
removing the SIMM -- which still leaves me with 256M -- but that could 
mean that the bad spot is now somewhere in the kernel.  (The IBM 
diagnostics haven't found anything wrong...)

Any suggestions would be gratefully appreciated.

		--Steve Bellovin, http://www.research.att.com/~smb