Current-Users archive

Re: hang on revivesa with dual core CPU



On Mon, Oct 06, 2008 at 12:35:33PM -0700, Bill Stouder-Studenmund wrote:
> On Thu, Oct 02, 2008 at 12:19:07PM +0100, David Brownlee wrote:
> >     I've been testing a current revivesa kernel (built from
> >     sources updated today) on several NetBSD 4.0_STABLE installs,
> >     and everything seems to be happy on the single core boxes
> >     (firefox3, gnash, pidgin, ktorrent3, apache, etc), but I've
> >     seen several hard lockups when running on an MP box (Thinkpad
> >     T60p Intel Core 2 T7400).
> 
> Have you tried this with the -current libpthread and update recipe?
> 
> A lot of stuff changed in -current, and the MP hangs could be unrelated to 
> SA specifically.

Or it could be some of the code not in kern_sa.c. If I can't reproduce it
locally, we may need to do some printf debugging. You can also turn on SA
kernel debugging and see if anything pops up before the hang.

So I think I've made some progress on what all is wrong with the UP case.

I commented earlier that the problem seemed to be a double-unblock. I no
longer think that's it. Looking at the logs:

   476      2 firefox-bin SAU   unblocked, event=[<ctx=0xb87ffcf8, id=2, cpu=0>], intr=[<ctx=0xbfbfdcc0, id=4, cpu=0>]
   476      2 firefox-bin RET   sa_yield JUSTRETURN
   476      2 firefox-bin CALL  getcontext(0xb87ff918)
   476      2 firefox-bin RET   getcontext 0
   476      3 firefox-bin SAU   blocked, event=[<ctx=0xb87ffcf8, id=3, cpu=0>]
   476      4 firefox-bin SAU   blocked, event=[<ctx=0xb89ffcf8, id=4, cpu=0>]
   476      4 firefox-bin CALL  sa_yield
   476      4 firefox-bin SAU   unblocked, event=[<ctx=0xb89ffcf8, id=4, cpu=0>, <ctx=0xbfbfdc80, id=2, cpu=0>], intr=[<ctx=0xb87ff9b0, id=3, cpu=0>]

There was a double-block event. That's really not supposed to happen, but
it did.
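
For anyone who wants to check their own traces for the same pattern, here is
a rough sketch of a filter. It assumes that "double-block" means two SAU
"blocked" records for the same process with no other SAU record between them,
and that the field layout (pid, LWP id, command, record type) matches the
excerpt above. The function name find_double_blocks is made up for
illustration; this is not part of any NetBSD tool.

```python
import re

# Matches trace lines such as:
#   476      3 firefox-bin SAU   blocked, event=[...]
# The column layout is assumed from the log excerpt, not from a format spec.
SAU_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+SAU\s+(\w+),")


def find_double_blocks(lines):
    """Return (first_lwp, second_lwp) pairs for back-to-back 'blocked' upcalls.

    Assumption: a double-block is two consecutive SAU 'blocked' records in
    the same process with no 'unblocked' (or other SAU) record in between.
    """
    last = {}   # pid -> (kind, lwp) of the most recent SAU record
    hits = []
    for line in lines:
        m = SAU_RE.match(line)
        if not m:
            # CALL/RET records and wrapped continuation lines are skipped.
            continue
        pid, lwp, _cmd, kind = m.groups()
        prev = last.get(pid)
        if kind == "blocked" and prev is not None and prev[0] == "blocked":
            hits.append((int(prev[1]), int(lwp)))
        last[pid] = (kind, lwp)
    return hits
```

Feeding it the lines of a saved trace (for example, the lines of a text file
read with readlines()) returns one pair of LWP ids per suspected double-block.
It only looks at the order of SAU records, so it says nothing about why the
second block happened.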

I think what originally looked like a double-unblock problem was really that
the contexts, at least as shown above, are in the wrong order. It looks like
the second event context in the double unblock is the interrupted context
from the unblock further up, and the interrupted context in the double
unblock is the second blocked event.

Looking at the ktrace you sent me off-list, the problem there also was a 
double-block.

So the thing now is to figure out what's causing it and what can easily be 
done about it.

Take care,

Bill



