port-i386: Re: 3.1_STABLE and SMP

Subject: Re: 3.1_STABLE and SMP
To: Andrew Doran <ad@netbsd.org>
From: Stephen Borrill <netbsd@precedence.co.uk>
List: port-i386
Date: 04/27/2007 14:29:02

On Fri, 20 Apr 2007, Andrew Doran wrote:
> On Fri, Apr 20, 2007 at 12:46:53PM +0100, Stephen Borrill wrote:
>
>>> Do you have a DDB stack backtrace, assuming you can get into DDB from
>>> the hung state?
>>
>> I got emailed a screenshot of one:
>> http://projects.precedence.co.uk/netbsd/ddb1.jpg
>
> I have seen reports of something similar. In addition to what Greg
> mentioned, it's possible that:
>
> - this CPU holds the kernel lock and is spin waiting on the pmap lock
> - another cpu holds the pmap lock, has taken an interrupt, and is spin
>  waiting on the kernel lock
>
> That should not happen unless there is a bug somewhere. How many CPUs does
> the machine have? If it happens again, could you ask the customer to run:
>
> mach cpu 0
> tr
> mach cpu 1
> tr
> mach cpu ...
> tr
>
> A dump of the held simplelocks would be good to get, too.

To follow up on my last mail, which linked to:

http://projects.precedence.co.uk/netbsd/ddb2.jpg

I had another crash (handwritten backtrace):

CPU 0:
_kernel_lock
intr_biglock_wrapper
Xintr_ioapic_level9
--- interrupt ---
Xspllower
_kernel_lock
syscall_plain 202

CPU 1:
_simple_lock
pmap_destroy
uvmspace_free
exit1
sys_execve
syscall_plain 59

As with the first panic, this appears to happen on process exit, in
pmap_destroy.

Hope this helps,

-- 
Stephen