Subject: Re: possible new "simple_lock: locking against myself" bug on dual-CPU AS4000
To: NetBSD port-alpha List <port-alpha@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 10/24/2005 11:25:54
At Fri, 21 Oct 2005 18:43:53 -0700,
Chuck Silvers wrote:
> 
> On Mon, Oct 17, 2005 at 07:32:07PM -0400, Greg A. Woods wrote:
> > db{1}> trace
> > cpu_Debugger() at cpu_Debugger+0x4
> > _simple_lock() at _simple_lock+0x140
> > pmap_do_tlb_shootdown() at pmap_do_tlb_shootdown+0x90
> > alpha_ipi_process() at alpha_ipi_process+0xc4
> > interrupt() at interrupt+0x90
> > XentInt() at XentInt+0x1c
> > --- interrupt (from ipl 5) ---
> > _simple_lock() at _simple_lock+0x358
> > pmap_do_tlb_shootdown() at pmap_do_tlb_shootdown+0x90
> > alpha_ipi_process() at alpha_ipi_process+0xc4
> > interrupt() at interrupt+0x90
> > XentInt() at XentInt+0x1c
> > --- interrupt (from ipl 0) ---
> > _lockmgr() at _lockmgr+0x1018
> > _kernel_proc_lock() at _kernel_proc_lock+0x6c
> > syscall_plain() at syscall_plain+0x38
> > XentSys() at XentSys+0x5c
> > --- syscall (198) ---
> > --- user mode ---
> > db{1}> 
> 
> a second IPI interrupt is being delivered while one is already being
> processed.  it seems unlikely that this is a general bug in the alpha
> interrupt code, or everyone would be seeing all kinds of crashes.
> 
> did this machine have the latest firmware installed?  buggy firmware
> seems the most likely non-hardware cause of this kind of thing.

That machine was indeed running rather old firmware:

	AlphaServer 4000 Console V5.0-2, 25-SEP-1997 12:42:27

The new AS4100 is now running V6.1-1 (but it's getting another new
panic that I'll post about shortly...).

-- 
						Greg A. Woods

H:+1 416 218-0098  W:+1 416 489-5852 x122  VE3TCP  RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>