Subject: Re: reproducible kernel panic w/ 2.0RC4MP
To: Bill Studenmund <wrstuden@netbsd.org>
From: Tim Kelly <hockey@dialectronics.com>
List: port-macppc
Date: 11/11/2004 18:50:13
Hi Bill,

> How about telling us what the other CPU is doing? Ok, the one CPU is 
> waiting for the other one. So what is it doing?

I'm not sure. I'm basing my theory, and it is a theory, on the
incredible difference adding 128M of RAM to the existing 128M of RAM
made to the stability of -current MP. I looked at the code in cpu.c and
machdep.c and it appears to me that the kernel panic is forced after the
count/wait exceeds a certain level. Since the wait condition is for
memory to be filled in asynchronously, a miscommunication occurs if
either CPU thinks it already has the most current version of that memory
in its (L1?) cache. The CPU's cache reflects either no message
received or no response received. That's why my first attempt at a patch
involved ensuring a sync operation after each memory access. 
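
To make the theory concrete, here is a rough sketch of the kind of
bounded wait I'm describing. This is not the actual code from cpu.c or
machdep.c: the names mailbox_ready, mailbox_post, mailbox_wait and
SPIN_LIMIT are made up for illustration, and C11 acquire/release atomics
stand in for the explicit sync operation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin limit; the real kernel value is not shown here. */
#define SPIN_LIMIT 1000000L

/* Hypothetical flag that one CPU fills in asynchronously for the other. */
static atomic_int mailbox_ready;

/* Responding CPU: publish the flag with release ordering, so every
 * store made before this call is visible before the flag itself is. */
static void mailbox_post(void)
{
    atomic_store_explicit(&mailbox_ready, 1, memory_order_release);
}

/* Waiting CPU: poll the flag at most `limit` times with acquire
 * ordering. Returns true if the flag was seen, false if the count ran
 * out, which is the point where the kernel would force the panic. */
static bool mailbox_wait(long limit)
{
    for (long i = 0; i < limit; i++) {
        if (atomic_load_explicit(&mailbox_ready, memory_order_acquire))
            return true;
    }
    return false;
}
```

The waiting side would call mailbox_wait(SPIN_LIMIT) and panic on a
false return. If the ordering on either side were missing, the waiter
could keep seeing a stale value and run out the count even though the
other CPU had responded, which is the failure I suspect.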

Now, it seems to me that this shouldn't be affected by the need to page
memory, so my hope had been to reproduce this on as many systems as
possible to determine whether the panic consistently occurs when active
memory requirements exceed the physical memory present.

I'm fairly handy with MacsBug, the Motorola debugger for Macs, so if you
have some specific commands that can do bt's on the other CPU, please
pass them on. I can reproduce this kernel panic in less than an hour.
Also, so that I can identify potential avenues quicker, what _should_
the other CPU be doing?

tim