Subject: Re: reproducible kernel panic w/ 2.0RC4MP
To: Bill Studenmund <wrstuden@netbsd.org>
From: Tim Kelly <hockey@dialectronics.com>
List: port-macppc
Date: 11/11/2004 20:45:25
Part of my theory comes from reading the MPC604EUM (PowerPC 604e
User's Manual Supplement):

3.5.2 Weak Consistency between Multiple Processors
The PowerPC architecture requires only weak consistency among
processors--that is, memory accesses between processors need not be
sequentially consistent and memory accesses among processors can occur
in any order. The ability to order memory accesses weakly provides
opportunities for more efficient use of the system bus. Unless a
dependency exists, the 604e allows read operations to precede store
operations. 

Note that strong ordering of memory accesses with respect to
the bus (and therefore, as observed by other processors and other bus
participants) can be accomplished by following instructions that access
memory with the SYNC instruction.

3.6.6 Coherency Paradoxes in Multiple-Processor Systems
It is possible to create a coherency paradox across multiple processors.
Such paradoxes are particularly difficult to handle since some scenarios
could result in the purging of modified data, and others may lead to
unforeseen bus deadlocks. 

Most of these paradoxes center around the interprocessor coherency of
the memory coherency bit (or the M bit). Improper use of this bit can
lead to multiple processors accepting a cache block into their caches
and marking the data as exclusive. In turn, this can lead to a state
where the same cache block is modified in multiple processor caches.

(end)

The direction to follow memory accesses with sync is why I tried to
apply an override to pio_subr.S. I couldn't tell whether DBGSYNC was
absolutely declared, so as a quick and dirty hack I just added an #ifdef
MULTIPROCESSOR to make DBGSYNC be sync. The odd thing for me is that it
seems more logical to do a sync _before_ a read, while the docs say
after.
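
A minimal sketch of the idea, written in C rather than the real
assembly. The names DBGSYNC_SKETCH and in8_sketch are hypothetical and
are not what pio_subr.S actually contains; MULTIPROCESSOR is defined
locally only so the barrier path is exercised, and the non-PowerPC
fallback (GCC's __sync_synchronize builtin) is an assumption so the
sketch builds on other hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for the kernel config option of the same name. */
#define MULTIPROCESSOR 1

#ifdef MULTIPROCESSOR
# if defined(__powerpc__) || defined(__ppc__)
/* On PowerPC, emit a real sync instruction. */
#  define DBGSYNC_SKETCH() __asm__ __volatile__ ("sync" ::: "memory")
# else
/* Elsewhere, fall back to the compiler's full memory barrier. */
#  define DBGSYNC_SKETCH() __sync_synchronize()
# endif
#else
/* Uniprocessor build: the macro expands to nothing. */
# define DBGSYNC_SKETCH() do { } while (0)
#endif

/* Hypothetical 8-bit read helper: access memory, then sync. */
static uint8_t
in8_sketch(const volatile uint8_t *addr)
{
	uint8_t v;

	assert(addr != NULL);
	v = *addr;		/* the memory access itself */
	DBGSYNC_SKETCH();	/* barrier after the access, per the manual */
	return v;
}
```

Moving DBGSYNC_SKETCH() above the load would give the sync-before-read
ordering that seems more intuitive to me; the sketch follows the manual
and puts it after.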

While I'm sure that coherency paradoxes are well known to you and Matt
Thomas, it seems to me that this could explain what is happening,
especially since it sort of (Dave Huang notwithstanding) seems to
involve paging into or out of memory.

tim