Subject: re: 1.6 woes (pmap vs. UBC?)
To: NetBSD/sparc Discussion List <port-sparc@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-sparc
Date: 08/08/2002 14:44:46
[ On Thursday, August 8, 2002 at 10:20:25 (-0700), Brian Buhrow wrote: ]
> Subject: re: 1.6 woes (pmap vs. UBC?)
>
> 	I used to work in a shop with a fleet of 40 Sparc 1, 2, SLC, ELC, IPC 
> and IPX machines.  We had about 10 Sparc 2 machines running at once in the
> same room, all running identical software images.  A couple of the Sparc 2
> machines would often have programs that would randomly drop core,
> spontaneously reboot, or just act strangely.  One of these machines refused
> to run Solaris when it had more than 48MB of memory installed.  SunOS4 ran
> fine, no problem.  When we sent the machine back to Sun for investigation,
> they told us that they couldn't really diagnose the problem because our
> machine was a revision 2 model, and the oldest they had in their test lab
> was a revision 6 model.  

Very interesting.

I wonder how the model revision is identified?

I have a spare SS1 board (501162900716) with a "17rev50" sticker.  It
has a little daughter board over by the CPU (between the two serial
ports, sitting on top of J1's pins).  I have another spare (501138213975)
that's labeled "12rev50", and without this "patch board".  I have a
third in a machine downstairs which IIRC has yet another rev sticker.

I've only got one SS2 at the moment, and I don't know what board it has
in it for sure.

My assumption in the past has been that most of the revisioning was due
to problems in peripheral devices, or perhaps to deal with having to use
different chips as suppliers changed or whatever.

> 	What I took away from that was that the Sparc 1 and 2 machines
> went through a lot of revisioning, silently from the customer's
> perspective, and given many customers' habits of buying machines piecemeal,
> the likelihood that a customer would have two identical machines in his
> shop seems extremely low.  Thus, I could believe that some folks are running
> Sparc 1 and 1+ machines with no problem, while others are having endless
> unexplained failures.

It would explain a lot.

> 	It makes me wonder how long NetBSD should try to continue
> supporting the Sun4c line.  Many of those machines are well over 10 years
> old now, and I wonder, realistically, how many are still in use by
> developers and testers.  I know of some in production use, but they're
> definitely on the retirement track.  How hard is it for developers to
> maintain backward compatibility effectively when there's no testing base to
> prove that it still works?

Sun4c with 'hw cache flush' seems to be a lot more reliable (though I
can't say for sure my SS2 is rock solid -- there are occasional core
dumps which may not be due to application bugs).

However these dinky ones with 'sw cache flush' are definitely a lot less
reliable than they were under 1.3.x or SunOS4.  It sure would be nice to
gain back at least the former reliability without the pain of flushing
the whole cache every time you turn around.  My Xserver used to crash
only about once every other week or so.

FYI, with the pmap patch my machine and the Xserver have been running
flawlessly for almost three full days....

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>