Subject: Re: reproducible kernel panic w/ 2.0RC4MP
To: Tim Kelly <hockey@dialectronics.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: port-macppc
Date: 12/10/2004 12:29:28
--xHFwDpU9dbj6ez1V
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Dec 03, 2004 at 12:40:31PM -0500, Tim Kelly wrote:
> Ok, I've made significant progress on this and I have a patch that I
> believe will at least reduce the problem or fix it outright. I say this
> because I was able to identify the problem to the point that I could
> reproduce it at will, and I've also identified conclusively one of the two
> original premises as to why CPU0 was not acknowledging the IPI. It's PSL_=
EE
> was turned off and was not receiving external interrupts.

I'm coming into this thread late, I realize. Some of what I'm about to say=
=20
may already have been covered. Sorry.

First off, thank you for digging into this.

> After I went over the code extensively, and after not once seeing a lock
> panic (LOCKDEBUG) or a re-entrancy into do_pending_intr after setting the
> panic semaphore, I had to conclude the problem was not a lock or race and
> was elsewhere. Along the way, however, I changed some code in extintr.c to
> bring it in line with simple lock requirements, which is that the irq
> should not be enabled until after the simple lock is acquired and release=
d.
> For the non-openpic Macs, the original code was re-enabling the irq before
> acquiring the lock. I also added another layer of IPL to the MULTIPROCESS=
OR
> case, to also bring it in line with the kern_lock.c notes. This places an
> IPL_IPI in between IPL_HIGH and IPL_SERIAL. Putting IPL_IPI at the same
> level as IPI_SERIAL led to some problems with the serial console. Putting
> IPL_IPI above IPI_HIGH ensures that even during a simple lock any deferred
> IPI interrupts get processed, but nothing else will.

I agree with Pavel and think that we need to be careful about having
IPL_IPI above IPI_HIGH. Well, specifically above SPL_LOCK. And actually I
really think we shouldn't. I realize that you may have changed your mind
on this (the last patch I've seen is just turning interrupts on a bit
earlier) and thus the comment is just accademic. :-) I think it would be=20
difficult in the long run to ensure that IPIs never need to grab locks=20
(especially if they can on other platforms), and so we can get into a=20
mess.

>  #ifdef MULTIPROCESSOR
> +	tmsr =3D emsr;
>  	if (ci->ci_cpuid =3D=3D 0) {

I also don't like that line of the patch. We're supposed to support SMP,=20
which is _Symmetric_. Hard-coding cases for one CPU or another strikes me=
=20
as wrong. I realize we need to special-case startup, but other than that I=
=20
think we should be able to deal with things on any processor. I realize we=
=20
may route certain interrupts to certain CPUs, but we should look at things=
=20
in terms of "the CPU X is bound to" not a magic number.

Just my thoughts for now. I also freely accept that our SMP may need work=
=20
and thus we have to live with hard-coding for now. :-)

Take care,

Bill

--xHFwDpU9dbj6ez1V
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (NetBSD)

iD8DBQFBugcnWz+3JHUci9cRAordAJ9ZPPXpOKCEo1uhWBj5dqICloH4lgCghDvO
/4PZ8LWxIp+SnPFeBByGY2A=
=GK4y
-----END PGP SIGNATURE-----

--xHFwDpU9dbj6ez1V--