port-i386: Re: 3.1_STABLE and SMP

Subject: Re: 3.1_STABLE and SMP
To: Stephen Borrill <netbsd@precedence.co.uk>
From: Andrew Doran <ad@netbsd.org>
List: port-i386
Date: 04/28/2007 12:17:15

On Fri, Apr 27, 2007 at 02:29:02PM +0100, Stephen Borrill wrote:

> On Fri, 20 Apr 2007, Andrew Doran wrote:
> >On Fri, Apr 20, 2007 at 12:46:53PM +0100, Stephen Borrill wrote:
> >
> >>>Do you have a DDB stack backtrace, assuming you can get into DDB from
> >>>the hung state?
> >>
> >>I got emailed a screenshot of one:
> >>http://projects.precedence.co.uk/netbsd/ddb1.jpg
> >
> >I have seen reports of something similar. In addition to what Greg
> >mentioned, it's possible that:
> >
> >- this CPU holds the kernel lock and is spin waiting on the pmap lock
> >- another cpu holds the pmap lock, has taken an interrupt, and is spin
> > waiting on the kernel lock
> >
> >That should not happen unless there is a bug somewhere. How many CPUs does
> >the machine have? If it happens again, could you ask the customer to run:
> >
> >mach cpu 0
> >tr
> >mach cpu 1
> >tr
> >mach cpu ...
> >tr
> >
> >A dump of the held simplelocks would be good to get, too.
> 
> And to follow up my last mail which linked to:
> 
> http://projects.precedence.co.uk/netbsd/ddb2.jpg
> 
> I had another crash (handwritten backtrace):
> 
> CPU 0:
> _kernel_lock
> intr_biglock_wrapper
> Xintr_ioapic_level9
> --- interrupt ---
> Xspllower
> _kernel_lock
> syscall_plain 202
> 
> CPU 1:
> _simple_lock
> pmap_destroy
> uvmspace_free
> exit1
> sys_execve
> syscall_plain 59
> 
> In common with the first panic, this seems to be on process exit in 
> pmap_destroy.
> 
> Hope this helps,

Hrm. I don't think there is a lock ordering problem now - I think that
something is forgetting to release the pmap lock. Is this system a Pentium
Pro by any chance?
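
To make the distinction between the two failure modes concrete, here is a
minimal sketch of how one might classify a snapshot of per-CPU lock state.
It is not NetBSD code: the lock identifiers, struct cpu_state, owner_of()
and classify() are all hypothetical names invented for illustration, and it
assumes the snapshot comes from an already hung system in which we know
which locks each CPU holds and which one it is spinning on. A cycle in the
wait-for chain corresponds to the lock ordering scenario from the earlier
mail; a CPU spinning on a lock that no CPU in the snapshot owns corresponds
to a lock that was never released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock identifiers for this sketch; not NetBSD's real types. */
enum { LOCK_NONE = -1, LOCK_KERNEL = 0, LOCK_PMAP = 1, NLOCKS = 2 };

enum verdict { NO_STALL, ORDER_CYCLE, LEAKED_LOCK };

/* Assumed per-CPU snapshot, e.g. reconstructed by hand from backtraces. */
struct cpu_state {
    unsigned held;   /* bitmask of locks this CPU currently holds */
    int waiting;     /* lock it is spinning on, or LOCK_NONE */
};

/* Return the index of the CPU holding `lock`, or -1 if no CPU holds it. */
static int owner_of(const struct cpu_state *cpus, size_t n, int lock)
{
    assert(lock >= 0 && lock < NLOCKS);
    for (size_t i = 0; i < n; i++)
        if (cpus[i].held & (1u << lock))
            return (int)i;
    return -1;
}

/*
 * Follow the wait-for chain from each CPU.  If the chain leads back to
 * the starting CPU, the CPUs are deadlocked on each other (lock order
 * inversion).  If a CPU waits on a lock with no owner in the snapshot,
 * the lock was presumably left held by a path that has since moved on.
 */
static enum verdict classify(const struct cpu_state *cpus, size_t n)
{
    enum verdict result = NO_STALL;

    for (size_t start = 0; start < n; start++) {
        int cur = (int)start;
        /* At most n + 1 hops, so a cycle elsewhere cannot loop forever. */
        for (size_t hops = 0; hops <= n; hops++) {
            int want = cpus[cur].waiting;
            if (want == LOCK_NONE)
                break;                  /* chain ends at a running CPU */
            int own = owner_of(cpus, n, want);
            if (own < 0) {
                result = LEAKED_LOCK;   /* nobody visible holds it */
                break;
            }
            if (own == (int)start)
                return ORDER_CYCLE;     /* chain closed on itself */
            cur = own;
        }
    }
    return result;
}
```

Under one reading of the handwritten backtrace, CPU 1 holds the kernel lock
(taken on syscall entry) and spins on the pmap lock, while CPU 0 holds
nothing and spins on the kernel lock. Feeding that state to classify()
yields LEAKED_LOCK rather than ORDER_CYCLE, which matches the suspicion
above. Whether CPU 1 really holds the kernel lock at that point is an
assumption of the sketch, not something the trace proves.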

Andrew