Subject: Re: VAX 6420: KA64A EEPROM and serial number mismatch
To: Gunther Schadow <port-vax@netbsd.org>
From: Geoff Roberts <geoffrob@stmarks.pp.catholic.edu.au>
List: port-vax
Date: 05/28/2001 12:01:15
----- Original Message -----
From: "Gunther Schadow" <gunther@aurora.regenstrief.org>
To: <port-vax@netbsd.org>
Sent: Sunday, May 27, 2001 3:48 PM
Subject: VAX 6420: KA64A EEPROM and serial number mismatch


> Hi,
>
> I put the other machine online and that turned out to be a major
> effort. Since I still have my power cord screwed directly to my
> circuit breaker panel, I figured it would be safer and quicker to
> swap the power input box between the two machines. Did that
> and power came right up. However, the blowers wouldn't move, not
> one of them! Heck, since both blowers were stuck I thought it
> had to be a defect upstream. So I swapped the power and logic
> box too. Still no blower would move. So I screwed out the blowers,
> which is a major piece in VAX mechanics (I am a pro now in
> *screwing* with VAX 6000s :-) And indeed, the other pair of
> blowers worked fine.
>
> Question: can one repair blowers?

Never had one fail.  Probably seized due to water penetration.  Can you
turn them by hand?  If so, then you need to check for PSU issues.

> I am just lucky that they didn't dispose of the empty corpse
> yet at my work place. So I'll go get the blowers out of there
> too. But it's annoying to think that your VAX goes down because
> the blowers fail! (And it does go down, because the heat
> and airflow sensors will shut down power if the blowers don't
> work.)

It has to go down, or it will overheat and cook.  All machines of this
calibre had such features.  Even PCs are getting this way, with the BIOS
monitoring the health of the CPU fan and the CPU temperature and shutting
the machine down if it gets too hot.

> Then I started swapping cards around. I found out that the bus
> backplanes are a little picky about getting good contact to
> the cards. Hard to say whether a card is not working well or
> the slot in which it sits isn't quite holding it right.

You will develop a feel for this eventually, but removing and reinserting
the cards a few times if they have been in place for a while will help.
Cleaning the contact face with isopropyl may help too.

> I wanted to build a 6 processor machine, but the problem I
> ran into is that the processors are quite picky about with
> whom they want to play. Even a variation in minor EEPROM
> revisions 2.03/4.00 vs. 2.03/4.02 and most of all the
> serial number mismatch would not allow me to do the swapping.

Well, yes and no.  It will bitch about the mismatch, but it will still
boot OK.  Ours here has been like that for years, since its CPUs came from
two different machines, but it still works and does SMP on all processors
quite OK.

> Question: how can one update EEPROM revisions and the
> "branding" of the serial numbers on the KA64A processor?

Details are in the manual.  You will need a TK70 cartridge and some
patience; this will let you put one S/N onto all the CPUs.

> I figure if people trade the processor boards in used parts
> dealerships, buying those would never be useful because
> you can't mix and match them! But there has to be a trick
> other than to "call your field service representative",
> I hope!!

RTFM on the 6000-400.  Section 5.16 on SAVE EEPROM and
Section 5.23 on UPDATE.  Basically you save the latest version of the
EEPROM data (whichever CPU that is) and its S/N to TK70, then
UPDATE the EEPROMs on the others with that info.  Presto, all
have the same EEPROM data.

> I also found the matching of memory boards tricky. In
> particular the selftest would not only indicate an
> error when something was wrong with memory, but it will
> throw an exception and enter a really weird state where
> the whole machine can hang outright, requiring reset
> or power cycling. What's the magic with the memory?
> Can one use SET MEMORY/INTERLEAVE ... to make it take
> the memory it gets? I really would like to try maxing
> out one machine with all the spare boards I have, but
> it doesn't let me do that.
>
> Here is the memory error I get:
>
> #123456789 0123456789 0123456789 01234567#
>
> F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0   NODE #
>     A   A   .   M   M   M   M   M   M   .   P   P   P   P       TYP
>     o   o   .   +   +   +   +   +   +   .   +   +   +   +       STF
>     .   .   .   .   .   .   .   .   .   .   E   E   E   B       BPD
>     .   .   .   .   .   .   .   .   .   .   +   +   +   +       ETF
>     .   .   .   .   .   .   .   .   .   .   E   E   E   B       BPD
>
> .   .   .   .   .   .   .   .   .   +   +   .   -   +   +   .   XBI D +
> .   .   .   .   .   .   .   .   .   +   .   +   .   .   +   .   XBI E +
> Machine Check Stack Frame:
> 80000011
> 00020014
> 2004A618
> 00000000
> 061AF00E
> 0000001F
> 2004A602
> 041F0009
>
> PCSTS:          000008C8
> PCERR:          3C0020F0
> PCTAG:          40000000
> BCSTS:          01800000
> BCERR:          2004DE90
> BCBTS:          2000003C
> BCP1TS:         2001F804
> BCP2TS:         2001F804
> IPORT:          800003F1
> XBER:           90021041
> XFADR:          C0020000
> RCSR:           09200011
>
> ?29 Machine check accessing memory 80000011 00020014 2004A602
> Console halting after unexpected machine check or exception.
> ?06 Halt instruction executed in kernel mode.
>     PC     = 200C0E68
>     SAVPSL = 041F0600
>     ISP    = 20140408
>
>
> Any ideas where this comes from? The thing is, I *know* that the
> boards are good and that the bus slots are good, since a different
> configuration involving those boards and slots will work. But
> my memory is of two different types (128 MB and 32 MB).

Limits in size.  4 x 128 MB is 512 MB, and that's the hard limit for a 6K.

> Adding 2 or 3 x 32 MB to 4 x 128 MB has never worked; why?

It just can't address that much.
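
To put numbers on it, here is a quick Python sketch of the arithmetic.
The 512 MB figure is the limit stated above; the function names are made
up for illustration, and this only adds up board sizes.  It says nothing
about how the console actually sizes or interleaves memory.

```python
# Hard memory limit for the machine as stated in this thread, in MB.
MAX_MEMORY_MB = 512


def total_memory_mb(boards):
    """Sum capacity over a list of (count, size_mb) board groups."""
    return sum(count * size_mb for count, size_mb in boards)


def fits(boards, limit_mb=MAX_MEMORY_MB):
    """Return True if the combined board capacity is within the limit."""
    return total_memory_mb(boards) <= limit_mb


# The configurations discussed in the thread.
configs = {
    "4 x 128": [(4, 128)],
    "4 x 128 + 2 x 32": [(4, 128), (2, 32)],
    "4 x 128 + 3 x 32": [(4, 128), (3, 32)],
}

for name, boards in configs.items():
    status = "ok" if fits(boards) else "over limit"
    print(f"{name}: {total_memory_mb(boards)} MB, {status}")
```

4 x 128 MB lands exactly on the limit, so any 32 MB board added on top of
that pushes the total past 512 MB.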

Cheers

Geoff in Oz