Subject: Re: Isolating NMI/memory problem with old SPARCserver 20
To: None <port-sparc@NetBSD.org>
From: Greg Earle <earle@isolar.DynDNS.ORG>
List: port-sparc
Date: 08/06/2005 16:21:29
On Aug 6, 2005, at 2:26 PM, Greg Earle wrote:
> On Aug 6, 2005, at 1:25 PM, Havard Eidnes wrote:
>>> (Update: I just saw an old post to port-sparc from May 9th from
>>> Malte Dehling; he reported a similar error, but his log also
>>> shows a "module location: " identifier?  Mine doesn't - is this
>>> a new reporting feature in NetBSD 2.0 or something?)
>>
>> Yes.  The code was added in revision 1.8 of memecc.c on 22 Mar 2004.
>>
>> Index: memecc.c
>> ===================================================================
>> RCS file: /u/nb/src/sys/arch/sparc/sparc/memecc.c,v
>> retrieving revision 1.7
>> retrieving revision 1.8
>> diff -u -r1.7 -r1.8
>> --- memecc.c    15 Jul 2003 00:05:06 -0000      1.7
>> +++ memecc.c    22 Mar 2004 12:37:43 -0000      1.8
>> @@ -142,6 +142,8 @@
>>         printf("\tMBus transaction: %s\n",
>>                 bitmask_snprintf(efar0, ECC_AFR_BITS, bits, sizeof(bits)));
>>         printf("\taddress: 0x%x%x\n", efar0 & ECC_AFR_PAH, efar1);
>> +       printf("\tmodule location: %s\n",
>> +               prom_pa_location(efar1, efar0 & ECC_AFR_PAH));
>>
>>         /* Unlock registers and clear interrupt */
>>         bus_space_write_4(memecc_sc->sc_bt, bh, ECC_FSR_REG, efsr);
>>
>> However, that came with another set of changes, the
>> source-changes message was:
>>
>> Module Name:    src
>> Committed By:   pk
>> Date:           Mon Mar 22 12:37:43 UTC 2004
>>
>> Modified Files:
>>         src/sys/arch/sparc/include: promlib.h
>>         src/sys/arch/sparc/sparc: memecc.c memreg.c promlib.c
>>
>> Log Message:
>> Leverage the PROM's ability to identify the on-board location of a
>> physical memory address.
>>
>> To generate a diff of this commit:
>> cvs rdiff -r1.18 -r1.19 src/sys/arch/sparc/include/promlib.h
>> cvs rdiff -r1.7 -r1.8 src/sys/arch/sparc/sparc/memecc.c
>> cvs rdiff -r1.37 -r1.38 src/sys/arch/sparc/sparc/memreg.c
>> cvs rdiff -r1.31 -r1.32 src/sys/arch/sparc/sparc/promlib.c
>>
>> For a quick try, you could perhaps try to add those changes to
>> your local source tree and run that kernel?  (It's not a given
>> that this doesn't depend on some other change, but it's worth a
>> try.)  That is, if your machine stays up long enough for you to
>> patch and compile a new kernel...
>
> It stays up; these aren't fatal.  And they're only occasional.
>
> More of a problem is the fact that my versions of these 4 files
> are ancient compared to the ones you've mentioned:
>
> ==> src/sys/arch/sparc/include/promlib.h <==
> /*      $NetBSD: promlib.h,v 1.4 2001/09/26 20:53:07 eeh Exp $ */
>
> ==> src/sys/arch/sparc/sparc/memecc.c <==
> /*      $NetBSD: memecc.c,v 1.3 2002/03/11 16:27:04 pk Exp $    */
>
> ==> src/sys/arch/sparc/sparc/memreg.c <==
> /*      $NetBSD: memreg.c,v 1.32 2002/03/11 16:27:04 pk Exp $ */
>
> ==> src/sys/arch/sparc/sparc/promlib.c <==
> /*      $NetBSD: promlib.c,v 1.13 2001/12/07 11:00:39 hannken Exp $ */
>
> So I'm a bit afraid that these 4 diffs won't just drop right in ...
> (I suppose I can try it and see, though)

Answered my own question - it looks like my ancient promlib.h
and the rest are missing vital functions.  And there are some
other changes I'm worried about - the change from
"KERNEL_PROC_UNLOCK(curproc)" to "KERNEL_PROC_UNLOCK(curlwp)",
for example.  Do I risk patching anyway, or is it likely to fail?

And I just noticed something else - the patch to "memreg.c" is
in the routine memerr4_4c(); the very last line of this routine
is a "panic("memory error");" - but my machine isn't panicking.
So I assume I don't really need that particular file's fixes ...
just the one for memecc.c::memecc_error().  (That said, there's
another change to memecc.c that adds a CFATTACH_DECL(); I
can't seem to find that in my source tree ... is it safe to
ignore that particular change?  It was introduced in memecc.c
version 1.5).
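
A side note on reading the "address:" line that memecc_error() prints
in the diff quoted above: the format string is "0x%x%x", so the low
word is not zero-padded, and two different physical addresses can
come out as the same text.  Below is a minimal sketch of that
effect, not code from the tree.  The value used for ECC_AFR_PAH
(low four bits of the first register) is an assumption, and the
helper names ecc_fault_pa(), format_as_logged() and format_padded()
are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: mask for the high physical-address bits in the first
 * fault-address register; check the real value in your eccreg.h. */
#define ECC_AFR_PAH 0x0000000fU

/* Hypothetical helper: combine both registers into one 36-bit address. */
static uint64_t
ecc_fault_pa(uint32_t efar0, uint32_t efar1)
{
	return ((uint64_t)(efar0 & ECC_AFR_PAH) << 32) | efar1;
}

/* Hypothetical helper: format the address as the quoted printf() does. */
static int
format_as_logged(char *buf, size_t len, uint32_t efar0, uint32_t efar1)
{
	return snprintf(buf, len, "0x%x%x",
	    (unsigned)(efar0 & ECC_AFR_PAH), (unsigned)efar1);
}

/* Hypothetical helper: same, but pad the low word to eight hex digits. */
static int
format_padded(char *buf, size_t len, uint32_t efar0, uint32_t efar1)
{
	return snprintf(buf, len, "0x%x%08x",
	    (unsigned)(efar0 & ECC_AFR_PAH), (unsigned)efar1);
}
```

With a high part of 0x1 and a low word of 0x00001000, the first
formatter produces "0x11000" while the padded one produces
"0x100001000", so a logged address with fewer than nine hex digits
after the "0x" should be read with some care.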

	- Greg

P.S. Timo, the Ultra 60 runs Solaris 9.  I need to match my
      work environment for one; SMP support in NetBSD/SPARC64
      for another; and last but not least, I need SunPCi II
      card support ...