port-vax: Re: VAXstation 4000/90 Success!

Subject: Re: VAXstation 4000/90 Success!
To: Michael L. Hitch <mhitch@lightning.msu.montana.edu>
From: Hugh Graham <hugh@openbsd.org>
List: port-vax
Date: 07/20/2002 01:33:30
On Fri, Jul 19, 2002 at 11:23:30PM -0600, Michael L. Hitch wrote:
> 
>   I checked on my 4000/90, and the flash memory is 4 PLCC chips soldered
> to the board.  Unless a DEC utility to reprogram the flash can be found,
> somebody's going to have to be very brave and take the risk of losing the
> entire flash testing a new program.
> 
>   I was looking at the documentation for the AMD Am28F010 flash memory
> (it's an older 128KB flash memory), and if that's anything similar to what
> the 4000/90 is using, the entire memory has to be erased and re-written to
> correct the corruption.

Nothing ever seems to be easy. You did inspire me to open up my 4k60
and find AM27C1024s (EPROM, not flashable) in socketed DIPs. Too bad
DEC decided to save a few bucks and go for directly soldered PLCC for
the 4k90's FEPROMS.

>   I was trying to be very careful not to boot an unpatched 1.5.x kernel,
> but goofed once and clobbered the prom on mine.  It's a 1.4 version, and
> it shows only the one byte zeroed in the same location as the other
> corrupted image you have.

At least this means there are no other problems lurking.

> 
>   Also, in looking at the dz probe routine and comparing it with the AMD
> Am28F010 programming, it matches up with programming the one byte that is
> modified.  The probe does a write of 0x4020 to the dz csr (0x200a0000),
> followed by a write of 0x0001 to the dz tcr (0x200a0008).  Based on the
> Am28F010 information, the 0x40 byte in the first write is the Program
> Setup code, and the second write with 0x0001 would program the 0x00 byte
> into the byte at location 0x200a0009.  Normally there's a 10 microsecond
> delay after the program write command, followed by a program verify command
> to stop the programming operation.  I'm not sure what happens if that
> program verify command doesn't get issued.
> 
Thank you, this explains how the dz probe (working on 16-bit words)
managed to zero only 8 bits of memory at the nominal tcr address.
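
To convince myself of the byte-lane argument, here is a small Python
sketch. It is a simulation only and touches no hardware. It assumes the
VAX's little-endian byte order, and it assumes (unconfirmed) that the
four chips each sit on one byte lane of a 32-bit bus. The function names
are made up for illustration, and the model of programming as an AND
into the old contents reflects that programming can clear bits but not
set them.

```python
DZ_CSR = 0x200A0000   # register addresses as quoted from Michael's analysis
DZ_TCR = 0x200A0008
PROGRAM_SETUP = 0x40  # Am28F010 Program Setup code, per Michael's reading


def split_word_write(addr, value):
    """Split a 16-bit little-endian write into (byte_address, byte) pairs."""
    return [(addr, value & 0xFF), (addr + 1, (value >> 8) & 0xFF)]


def simulate_probe(old_byte):
    """Return (address, new_value) for the byte the misprobe would program.

    Assumption: byte lane = address & 3, one flash chip per lane.
    Returns None if no chip was armed by the first write.
    """
    armed_lane = None
    for addr, byte in split_word_write(DZ_CSR, 0x4020):
        if byte == PROGRAM_SETUP:
            armed_lane = addr & 3
    for addr, byte in split_word_write(DZ_TCR, 0x0001):
        if armed_lane is not None and (addr & 3) == armed_lane:
            # Programming can only clear bits, so the result is an AND.
            return addr, old_byte & byte
    return None


if __name__ == "__main__":
    addr, value = simulate_probe(0xFF)
    print(hex(addr), hex(value))
```

Under those assumptions only the chip on lane 1 sees the 0x40, and the
byte it then programs is the 0x00 high half of the tcr write, landing at
0x200a0009 as Michael describes.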

>   When I had the machine apart, I tried to take a look at what the flash
> memory was.  They all had stickers marked 1.4, which I was able to pull
> up, but underneath them was another sticker with some other number printed
> (which I would guess is probably a DEC part number).  That sticker looked
> like it was more securely attached to the chip, so I didn't try pulling it
> up.

The way the observed effects and operation of the dz probe correspond
with your description of how this part is programmed makes a pretty
convincing case for Am28F010. It seems to have been bad luck that the
chip got stimulated in just the right way to warm up its electron
injector and zap a few bits.

> 
>   So it appears to me like we might be able to write a program that could
> possibly reprogram the flash, but it could be a little risky.  I don't
> know of any way to recover from erasing the prom and failing to program it
> afterwards.

It might be possible to take multiple stabs at it by working from
multiuser mode via a kernel module: check the results, modify the
module if necessary, and repeat. That assumes the kernel doesn't need
any facility located within the firmware to continue operation, and
that one manages to do it without crashes or power failures.
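
The "program, check, repeat" idea can be sketched against a toy model
before anyone risks real hardware. The Python below is only that: a
simulated flash byte, not a driver. The class, the function, and the
pulse limit of 25 are all placeholders of mine rather than anything
taken from DEC or from a datasheet I have in front of me.

```python
class SimFlashByte:
    """Toy model of one flash byte: programming ANDs data into the cell."""

    def __init__(self, value=0xFF, weak_pulses=0):
        self.value = value
        # Number of initial pulses that fail to take (simulated flakiness).
        self.weak_pulses = weak_pulses

    def program_pulse(self, data):
        if self.weak_pulses > 0:
            self.weak_pulses -= 1
            return
        self.value &= data  # programming can only clear bits

    def read(self):
        return self.value


def program_with_verify(cell, data, max_pulses=25):
    """Pulse until the byte reads back as data; return the pulse count.

    max_pulses is an arbitrary placeholder, not a datasheet figure.
    """
    if cell.read() & data != data:
        raise ValueError("target needs bits set; an erase would be required")
    for pulse in range(1, max_pulses + 1):
        cell.program_pulse(data)
        if cell.read() == data:
            return pulse
    raise RuntimeError("byte did not verify within max_pulses")


if __name__ == "__main__":
    cell = SimFlashByte(0xFF, weak_pulses=2)
    print(program_with_verify(cell, 0x3C), hex(cell.read()))
```

The up-front check matters: if the target value needs a bit set that is
already cleared, no amount of retrying helps and the only way forward is
an erase.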

Saying all that makes finding DEC's programmer sound a whole lot safer.
Maybe one of the DECUS derivatives can scare up the necessary resources
if asked nicely.
> 
>   Hmm, if I could figure out how the checksum works, perhaps I could
> modify other data bytes to make the checksum come out correct.  That might
> at least stop the errors.

Hmm, since you have more information on how this chip may work, maybe
you can try some new combinations of the dz misprobe. When I had Chuck
Cranor boot my attempts at setting things right, I never had the 0x40
write setup in there. Perhaps this part can restore some bits without
requiring a full erase first after all.

If not, tracking down the firmware's checksum routine may be useful,
in case it has a special case for an all-zeros checksum, or can be
zapped to always return success.
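
As an illustration of the "modify other bytes" idea, here is a Python
sketch under a loud assumption: it pretends the checksum is a plain
8-bit additive sum, which is a guess and quite possibly not what the
firmware does. It searches a list of candidate offsets for a
replacement value that only clears bits (so no erase would be needed)
and brings the sum back to the wanted value. All names are made up.

```python
def additive_checksum(data):
    """Hypothetical checksum: sum of all bytes modulo 256."""
    return sum(data) & 0xFF


def find_patch(image, want, offsets):
    """Find (offset, new_value) that restores the checksum by clearing bits.

    Only values reachable without an erase are tried, i.e. new_value must
    be a bit-subset of the current byte. Returns None if nothing fits.
    """
    for off in offsets:
        cur = image[off]
        for new in range(cur):
            if new & cur != new:
                continue  # would need a 0 -> 1 transition
            trial = bytearray(image)
            trial[off] = new
            if additive_checksum(trial) == want:
                return off, new
    return None


if __name__ == "__main__":
    original = bytes([0x10, 0x80, 0xFF, 0xF0])
    want = additive_checksum(original)
    corrupted = bytes([0x10, 0x00, 0xFF, 0xF0])  # one byte zeroed
    print(find_patch(corrupted, want, offsets=[2, 3]))
```

Whether any byte in the real image is both expendable and has enough
set bits left to absorb the difference is a separate question.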

/Hugh

> 
> --
> Michael L. Hitch			mhitch@montana.edu
> Computer Consultant
> Information Technology Center
> Montana State University	Bozeman, MT	USA
>