Current-Users archive


Re: Some alpha problems in current

On Sun, 3 Feb 2008, Michael L. Hitch wrote:

I'm now trying a kernel where I've added a SPINLOCK_SPIN_HOOK in the places
where sys/kern/mutex.c spins, and my MP kernel has been running for over 2
hours. I'm going to try a LOCKDEBUG kernel again after a while to see if
that's changed by the addition of the SPINLOCK_SPIN_HOOKs.

  The machine survived 16 hours (until 04:55), at which point it panicked
with "fpsave ipi didn't".  CPU 1 was hung and I couldn't get a backtrace
from it.  That would also explain the panic: CPU 0 is waiting for the
other CPU to process the fpusave ipi, but CPU 1 is probably spinning
somewhere and unable to process the ipi.  This type of panic used to be
fairly frequent, but I was able to track down the deadlock at the time and
that particular problem was fixed.  It looks like it may be back again.
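
  To illustrate the suspected failure mode, here is a minimal user-space
sketch in C.  It is not the NetBSD kernel code: struct cpu, service_ipi()
and spin_on_lock() are hypothetical names invented for this model, and the
real hook and IPI paths may work differently.  It only shows why a CPU that
spins without polling for pending requests never services one, while a spin
loop that calls a hook on each pass does.

```c
/* Hypothetical user-space model; not NetBSD kernel code. */
#include <assert.h>
#include <stdbool.h>

struct cpu {
    bool ipi_pending;   /* another CPU has posted a request to us */
    bool fp_saved;      /* set once the "fpsave" request is handled */
};

/* Handle a pending request, as an IPI handler would. */
void service_ipi(struct cpu *self)
{
    if (self->ipi_pending) {
        self->ipi_pending = false;
        self->fp_saved = true;
    }
}

/*
 * Spin for at most max_spins passes on a lock that is never released in
 * this model.  With use_hook set, poll for pending requests on each pass
 * (the role a spin hook is assumed to play here); without it, the request
 * is never noticed.  Returns true if the request was serviced.
 */
bool spin_on_lock(struct cpu *self, int max_spins, bool use_hook)
{
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; i++) {
        if (use_hook)
            service_ipi(self);
        if (self->fp_saved)
            return true;
    }
    return false;   /* the sender would give up and panic at this point */
}
```

  In this model, a CPU with ipi_pending set returns false from
spin_on_lock(&cpu, 1000, false) and true from spin_on_lock(&cpu, 1000, true).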

  The backtrace from CPU 0 doesn't seem to show much:

db{0}> t
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x1c8
fpusave_proc() at netbsd:fpusave_proc+0x1b4
alpha_enable_fp() at netbsd:alpha_enable_fp+0x74
trap() at netbsd:trap+0x8a4
XentIF() at netbsd:XentIF+0x20
--- instruction fault (from ipl 0) ---
--- user mode ---
db{0}> mach cpu 1
CPU 1 not paused
db{0}> c
syncing disks... Mutex error: mutex_vector_exit: exiting unheld spin mutex

lock address : 0xfffffc0000b99ec0
current cpu  :                  1
current lwp  : 0xfffffc006e05d8c0
ex0: uplistptr was 0
owner field  : 0x0000000000000500 wait/spin:                0/1

panic: lock error
Stopped in pid 9592.1 (rateup) at netbsd:cpu_Debugger+0x4: ret z
db{1}> t
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x1c8
logputchar() at netbsd:logputchar
--- root of call graph ---
db{1}> c

dumping to dev 8,17 offset 131071

  I haven't looked at the lock error panic that occurred when I tried to
continue from the fpsave ipi panic yet.  It may give a clue as to what
CPU 1 was doing, but the backtrace from that doesn't look very helpful.

  I got a dump from that, but need to update savecore and libkvm so I can
get the dump file.

Michael L. Hitch              
