Subject: Re: The pv_unlink0 saga continues
To: None <erik@mediator.uni-c.dk, port-sparc@netbsd.org>
From: Chris Torek <torek@BSDI.COM>
List: port-sparc
Date: 01/17/1999 13:11:47
>... Then I made the more radical change shown here:

  [snipped out, but note that it inserts nops just before the
   call to panic("stack overflow") -- i.e., if this code is ever
   reached at all, the system is about to halt]

>With this patch, the machine has [stayed up] ...
>possible feedback about why it happens. 

Since those no-ops sit in code that is never executed, the answer to
"why it now stays up" must lie in some side effect of adding them,
such as shifting the cache-line positions of nearby code.  I note
also the comment:

>+       nop; nop; nop; /* SS 20 panic? */ \

The SS20 uses the TI "Viking" CPU chip.  (Some models use a "Voyager"
instead; I think the following applies only to the Viking.)  This
chip has a level-1 D-cache that can participate in I/O.  When the
chip is in write-through mode (as on any dual-processor SS20, or
any machine with the MXCC and Ecache) all is okay, but when it is
in write-back mode (as on a single-processor box with no Ecache), the
D-cache must interact with I/O transactions.  Early versions of
the Viking have bugs in this hardware.  (They have a bunch of other
bugs too, but this is the nastiest I know about, as it winds up
corrupting I/O transactions and possibly putting bad data into the
cache.)

You can identify the CPU chip by the iu_impl and iu_version fields
of the PSR combined with the mmu_impl and mmu_version fields of the
PCR (the SRMMU processor control register).  The table I have (which
I believe is quite incomplete) is:

/* table order matters */
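struct cpu4m_subtype {
	u_char  iu_impl;	/* PSR implementation field */
	u_char  iu_vers;	/* PSR version field */
	u_char  mmuimpl;	/* PCR implementation field */
	u_char  mmuvers;	/* PCR version field */

	initfn  *init;		/* per-family setup and bug workarounds */
	char    *verstr;	/* printable name, NULL if unknown */
	int     iarg;		/* passed to init to select a family member */
};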

struct cpu4m_subtype cpu4m_subtypes[] = {
    /* iuimpl iuver mmui mmuver */

	/*
	 * Ross HyperSPARC
	 */
	{ 0x1, 0xf, 0x1, ANY,   cpu4m_hs_init,  "RT620", 620 },
	{ 0x1, 0xe, 0x1, ANY,   cpu4m_hs_init,  "RT626", 626 },
	{ 0x1, ANY, 0x1, ANY,   cpu4m_hs_init,  NULL, 0 },

	/*
	 * SuperSPARC (Viking / Voyager).
	 *
	 * I do not really know if I need the iu_vers==0-or-1 for each
	 * of the SuperSPARC family here; I am just duplicating in the
	 * table what used to be done by the C code.
	 */
	{ 0x4, 0x0, 0x0, 0x0,   cpu4m_ss_init,  "1020N / Viking 1.2", 1 },
	{ 0x4, 0x1, 0x0, 0x0,   cpu4m_ss_init,  "1020N / Viking 2.x", 1 },
	{ 0x4, 0x0, 0x0, 0x1,   cpu4m_ss_init,  "Viking 3.x", 2 /*???*/ },
	{ 0x4, 0x1, 0x0, 0x1,   cpu4m_ss_init,  "Viking 3.x", 2 /*???*/ },
	{ 0x4, 0x0, 0x0, 0x4,   cpu4m_ss_init,  "Viking 3.5", 2 /*???*/ },
	{ 0x4, 0x1, 0x0, 0x4,   cpu4m_ss_init,  "Viking 3.5", 2 /*???*/ },
	{ 0x4, 0x0, 0x0, 0x3,   cpu4m_ss_init,  "1020 / Viking 5.x", 1 },
	{ 0x4, 0x1, 0x0, 0x3,   cpu4m_ss_init,  "1020 / Viking 5.x", 1 },
	{ 0x4, 0x0, 0x0, 0x8,   cpu4m_ss_init,  "1021 / Voyager 1.x", 2 },
	{ 0x4, 0x1, 0x0, 0x8,   cpu4m_ss_init,  "1021 / Voyager 1.x", 2 },
	{ 0x4, 0x0, 0x0, 0x9,   cpu4m_ss_init,  "1021 / Voyager 2.x", 2 },
	{ 0x4, 0x1, 0x0, 0x9,   cpu4m_ss_init,  "1021 / Voyager 2.x", 2 },
	{ 0x4, 0x0, 0x0, 0xc,   cpu4m_ss_init,  "Voyager", 2 }, /*???*/

	/* 
	 * MicroSPARCs -- need more types?
	 */
	{ 0x0, 0x4, 0x0, ANY,   cpu4m_ms_init,  "MicroSPARC-II", 2 },
	{ 0x0, 0x4, 0x4, ANY,   cpu4m_ms_init,  "MicroSPARC-I", 1 },
	{ 0x0, 0x4, ANY, ANY,   cpu4m_ms_init,  NULL, 0 },

	{ ANY, ANY, ANY, ANY,   cpu4m_unknown,  NULL, 0 } /* must be last */
};
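
For anyone wanting to check their own machine against the table, here
is a minimal sketch of pulling the four key fields out of the raw
register values.  It assumes the standard SPARC V8 layout, where both
the PSR and the SRMMU control register keep a 4-bit implementation
number in bits 31:28 and a 4-bit version in bits 27:24.  The names
cpu4m_id and cpu4m_id_decode are made up for illustration and are not
from my kernel; actually reading the two registers takes privileged
instructions and is not shown.

```c
#include <stdint.h>

/* Hypothetical container for the four ID fields used as table keys. */
struct cpu4m_id {
	uint8_t iu_impl;
	uint8_t iu_vers;
	uint8_t mmu_impl;
	uint8_t mmu_vers;
};

/*
 * Decode already-read register values.  Both registers are assumed
 * to follow the SPARC V8 layout: impl in bits 31:28, ver in 27:24.
 */
static struct cpu4m_id
cpu4m_id_decode(uint32_t psr, uint32_t pcr)
{
	struct cpu4m_id id;

	id.iu_impl  = (uint8_t)((psr >> 28) & 0xf);
	id.iu_vers  = (uint8_t)((psr >> 24) & 0xf);
	id.mmu_impl = (uint8_t)((pcr >> 28) & 0xf);
	id.mmu_vers = (uint8_t)((pcr >> 24) & 0xf);
	return id;
}
```

The four resulting values are what the first four columns of each
table row are compared against.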

(My "initfn" gets called with the cpu-data, cache-info, PROM node, and 
the integer argument "iarg", which lets one function work for a family
of CPUs.  It needs to insert whatever bug-workarounds are required,
and/or set flags in the cpu-data structure so that other routines like
the pmap code can do their own bug-workarounds.)
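
The "table order matters" remark comes down to a first-match scan
with wildcards: specific rows have to precede the per-family fallback
rows, and the all-ANY row has to come last.  A rough sketch of such a
lookup follows.  It is not the actual kernel routine: the struct is
trimmed (no init pointer), the names subtype, field_ok and
subtype_lookup are hypothetical, and the value chosen for ANY is an
assumption (anything outside the 4-bit range 0x0..0xf would do).

```c
#include <stddef.h>

#define ANY 0xff	/* assumed wildcard; real fields are only 4 bits */

/* Trimmed, hypothetical version of the table row (init omitted). */
struct subtype {
	unsigned char iu_impl, iu_vers, mmuimpl, mmuvers;
	const char *verstr;
	int iarg;
};

/* A few rows copied from the table above, in the same relative order. */
static const struct subtype subtypes[] = {
	{ 0x1, 0xe, 0x1, ANY, "RT626", 626 },
	{ 0x1, ANY, 0x1, ANY, NULL, 0 },	/* HyperSPARC fallback */
	{ 0x4, 0x0, 0x0, 0x4, "Viking 3.5", 2 },
	{ ANY, ANY, ANY, ANY, NULL, 0 },	/* must be last */
};

/* A table field matches if it is the wildcard or equals the probe. */
static int
field_ok(unsigned char want, unsigned char got)
{
	return want == ANY || want == got;
}

/* Return the first row matching all four fields, or NULL if none. */
static const struct subtype *
subtype_lookup(const struct subtype *tab, size_t n,
    unsigned char iu_impl, unsigned char iu_vers,
    unsigned char mmuimpl, unsigned char mmuvers)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (field_ok(tab[i].iu_impl, iu_impl) &&
		    field_ok(tab[i].iu_vers, iu_vers) &&
		    field_ok(tab[i].mmuimpl, mmuimpl) &&
		    field_ok(tab[i].mmuvers, mmuvers))
			return &tab[i];
	}
	return NULL;
}
```

With the catch-all row present the scan always finds something, so the
NULL return only matters for a table that lacks one.  In the full
version the caller would then invoke the matched row's init function
with that row's iarg.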

Actual workarounds for the various processor bugs can be found (with
great difficulty :-) ) in the Linux code -- this is where I got a lot
of what ended up in the above table.

Chris