Subject: Re: MP?
To: Havard Eidnes <he@netbsd.org>
From: Johnny Billquist <bqt@update.uu.se>
List: port-alpha
Date: 01/22/2004 18:20:53
On Thu, 22 Jan 2004, Havard Eidnes wrote:

> > >  o the kernel panic you get is in ltsleep(), and seems to indicate
> > >    that a sleep is done outside of a process context, i.e. curlwp is
> > >    NULL.  It would be interesting to see a stack backtrace to see
> > >    where this happens.  I'm not sure if this is actually related to
> > >    the machine running with multiple physical CPUs (but failed to
> > >    initialize the secondary CPUs).
> >
> > panic: spinlock_switchcheck: CPU 1 has 1 spin locks
> > Stopped in pid 5.1 (ioflush) at netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
>
> Hm, that's a different panic than the one you reported earlier,
> which was:
>
> panic: kernel diagnostic assertion "p != NULL" failed: file "/usr/src/sys/kern/kern_synch.c", line 413
> Stopped at      netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
> db{1}>

Oops. Sorry. Hmm. That was the result from a reboot, since I'm running the
machine to play with other things in between...

I'll see if I can recreate the assertion failure again.

Interesting to get different failures at the same point in the boot
sequence, though.

> > db{1}> bt
> > cpu_Debugger() at netbsd:cpu_Debugger+0x4
> > panic() at netbsd:panic+0x1f8
> > spinlock_switchcheck() at netbsd:spinlock_switchcheck+0xa4
> > prologue botch: displacement 16
> > frame size botch: adjust register offsets?
> > mi_switch() at netbsd:mi_switch+0x58
> > mi_switch() at netbsd:mi_switch+0x58
> > db{1}>
> >
> > Not really pretty, I'd say.
>
> I agree.  Not sure how useful that is.
>
> I wonder, does it somehow think that the slave CPUs have started?  If
> so, there may be something wrong with the error handling in the case
> where they don't spin up.  ...and, indeed, cpu_boot_secondary() does
> not have a return value, so if something goes wrong there, the rest of
> the kernel is never told, and only the user is informed via the
> console output.
>
> The root problem, I suspect, is that your secondary CPUs don't spin
> up.  Could you try with just two identical CPUs in the chassis and see
> what happens?

Sadly I can't, since I don't currently have two identical CPUs.
CPU0 and CPU1 are both 21064A-2, but they run at different frequencies.
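
On the point about cpu_boot_secondary() having no return value: a rough
userland sketch of what reporting the failure back might look like. This
is not the actual NetBSD/alpha code. try_spin_up(),
boot_secondary_checked(), cpu_is_running() and cpus_running are all
made-up names for illustration, and the "hardware" here just pretends
that CPU 1 never starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS 8

/* Hypothetical bitmask of CPUs that actually came up; bit 0 is the boot CPU. */
static unsigned long cpus_running = 1UL << 0;

/* Stand-in for the hardware handshake; in this sketch only CPU 1 fails. */
static bool
try_spin_up(unsigned int cpu_id)
{
	return cpu_id != 1;
}

/*
 * Hypothetical variant of a secondary-CPU boot routine that returns a
 * status instead of void, and marks the CPU as running only on success.
 */
static int
boot_secondary_checked(unsigned int cpu_id)
{
	assert(cpu_id < MAX_CPUS);
	if (!try_spin_up(cpu_id)) {
		/* The user still sees the message, but the caller is told too. */
		printf("cpu%u: failed to spin up\n", cpu_id);
		return -1;
	}
	cpus_running |= 1UL << cpu_id;
	return 0;
}

/* Callers consult the mask instead of assuming every attached CPU runs. */
static bool
cpu_is_running(unsigned int cpu_id)
{
	return cpu_id < MAX_CPUS && (cpus_running & (1UL << cpu_id)) != 0;
}
```

The idea is only that the rest of the kernel would check something like
cpu_is_running() before treating a CPU as live, rather than relying on
the console message.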

	Johnny

Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt@update.uu.se           ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol