Subject: Re: Recent macppc kernels hang under load
To: None <port-macppc@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: port-macppc
Date: 09/14/2003 16:00:55
hi,

On Sat, Sep 13, 2003 at 01:44:14PM -0500, Dave Huang wrote:
> On Sun, Aug 31, 2003 at 03:06:12AM -0500, Dave Huang wrote:
> > On Sat, Aug 30, 2003 at 03:58:13PM -0700, Chuck Silvers wrote:
> > > hi,
> > > 
> > > I take it that this used to work before 3 or 4 weeks ago...
> > > if so, you could binary search for the check-in that broke it.
> > 
> > Well, I'd get random SIGILLs, but the kernel never died... I did the
> > binary search, and I guess I was misremembering when I said I first
> > noticed the problem 3 or 4 weeks ago. Looks like it started around Aug
> > 12... perhaps it was Matt Thomas's "cleanup/rework cpu_switch*,
> > switch_exit, Idle routine" in sys/arch/powerpc/powerpc?
> 
> I don't mean to be _overly_ pushy (just slightly pushy :), but does
> anyone have any ideas about this? Unfortunately, the bug is probably
> too deep in the powerpc locore stuff for me to debug myself.

I've been meaning to look into it, but I've been really busy lately.
I started working on it yesterday, and I got my dual g4 to crash once
after 4 hours.  it was a different symptom than you reported though,
this one was some kind of recursive DSI trap during the idle loop:

panic: lockmgr: no context
Stopped at      netbsd:cpu_Debugger+0x10:       lwz     r0, r1, 0x14
db{0}> t
0x00330dd0: at panic+18c
0x00330e90: at lockmgr+f4
0x00330ec0: at uvm_fault+e0
0x00330ff0: at trap+2c4
0x00331020: kernel DSI read trap @ 0 by ofb_console_dc+11a7f0d0: srr1=0xd567a000
            r1=0x3310e0 cr=0x9032 xer=0 ctr=0x1a855c dsisr=0x1
0x003310e0: at trap+22c
0x00331110: kernel DSI read trap @ 0xd567a018 by trap+230: srr1=0x9032
            r1=0x3311d0 cr=0x20009032 xer=0 ctr=0x1a855c dsisr=0x40000000
0x003311d0: at trap+22c
0x00331200: kernel DSI read trap @ 0xd567a018 by trap+230: srr1=0x9032
            r1=0x3312c0 cr=0x20009032 xer=0 ctr=0x1a855c dsisr=0x40000000
0x003312c0: at trap+22c
0x003312f0: kernel DSI read trap @ 0xd567a018 by trap+230: srr1=0x9032
            r1=0x3313b0 cr=0x20009032 xer=0 ctr=0x1a855c dsisr=0x40000000
0x003313b0: at trap+22c
0x003313e0: kernel DSI read trap @ 0xd567a018 by trap+230: srr1=0x9032
            r1=0x3314a0 cr=0x20009032 xer=0 ctr=0x1a855c dsisr=0x40000000
0x003314a0: at trap+22c

...

0x00335df0: kernel DSI read trap @ 0xd567a018 by trap+230: srr1=0x9032
            r1=0x335eb0 cr=0x20000030 xer=0 ctr=0x1a855c dsisr=0x40000000
0x00335eb0: at trap+22c
0x00335ee0: kernel DSI read trap @ 0xd567a000 by trapstart+900: srr1=0x30
            r1=0x335fa0 cr=0x40209034 xer=0 ctr=0 dsisr=0x40000000
0x00335fa0: at trapstart+8a0
0x00339ff0: at Idle+18
db{0}>    
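
as a rough check on how deep the recursion goes, the repeated
"DSI read trap @ 0xd567a018" frames in the trace above sit 0xf0 bytes
apart, so the number hidden behind the "..." can be estimated from the
first and last addresses.  this is only a sketch: the function and
constant names are made up for illustration, and it assumes the stride
stays constant through the part of the trace that was cut.

```python
# Frame addresses copied from the backtrace above. The constant stride across
# the elided "..." portion is an assumption; the trace itself does not show it.
FIRST_REPEAT = 0x00331110   # first "DSI read trap @ 0xd567a018" frame
SECOND_REPEAT = 0x00331200  # next such frame, used only to measure the stride
LAST_REPEAT = 0x00335df0    # last such frame, after the "..."


def count_repeated_frames(first, second, last):
    """Return (stride, frame_count) for equally spaced frames first..last."""
    stride = second - first
    span = last - first
    if stride <= 0 or span % stride != 0:
        raise ValueError("frames are not evenly spaced")
    # Both endpoints are frames, hence the +1.
    return stride, span // stride + 1


stride, frames = count_repeated_frames(FIRST_REPEAT, SECOND_REPEAT, LAST_REPEAT)
print(hex(stride), frames)
```

under that assumption this gives 83 nested faults on 0xd567a018, of
which 5 are shown in the trace.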



that's all I've had time for so far.

have you had any problems with a non-MP kernel?

-Chuck