Current-Users archive


Re: Strange system behavior



On Mon, 20 Sep 2010, Paul Goyette wrote:

> About half of the time, the build fails due to some host utility receiving a
> "segmentation fault", and almost always it fails on the exact same command and
> at the exact same place in the build!  But re-running the failed command
> interactively succeeds without any problem.

A "segmentation fault" is a SIGSEGV.  That's generated by the kernel when 
a process attempts to access memory outside its address space.  But it 
can also be sent under a number of other conditions, including another 
process sending it explicitly with kill().  You need to determine the 
source of the signal, something 
that's quite difficult to do.  You can try putting breakpoints in 
uvm_fault() where it sends a signal to the process or turning on UVM 
debugging to determine if it really is a memory error.  You can try 
putting a breakpoint in the signal delivery code and look at the 
kernel backtrace, but getting it to trigger on just the signal you want is 
extremely difficult.  You can try ktrace-ing the entire build to see 
signal delivery.  
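
If the failing host utility can be rebuilt with a few extra lines, a 
userland check is also possible.  The following is a minimal sketch, not 
something from the original report: it assumes a POSIX system with 
SA_SIGINFO support, and the names segv_origin(), on_segv() and 
install_segv_reporter() are made up for illustration.  It reads si_code 
from the siginfo_t passed to the handler to tell a kernel-generated fault 
apart from a SIGSEGV that another process sent.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map siginfo_t.si_code to a short description. */
static const char *segv_origin(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "kernel: address not mapped";
    case SEGV_ACCERR: return "kernel: invalid permissions for mapping";
    case SI_USER:     return "sent by a process: kill() or raise()";
    case SI_QUEUE:    return "sent by a process: sigqueue()";
    default:          return "other or unknown si_code";
    }
}

/* Handler: report the origin using only write(), then exit. */
static void on_segv(int signo, siginfo_t *info, void *ctx)
{
    const char *msg = segv_origin(info->si_code);

    (void)signo;
    (void)ctx;
    write(STDERR_FILENO, "SIGSEGV origin: ", 16);
    write(STDERR_FILENO, msg, strlen(msg));
    write(STDERR_FILENO, "\n", 1);
    _exit(128 + SIGSEGV);
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_segv_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;   /* request the three-argument handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_reporter() at the top of the utility's main() would 
make the next failure print where the signal came from.  This only 
narrows things down: a "kernel" result still leaves open whether the 
fault was a genuine bad access or the result of corrupted memory.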

Diagnosing this will be painful.  It's not clearly a hardware problem.  It 
could be a memory problem, a cache coherency problem, or a problem with 
the thread context.

Eduardo

