Subject: Re: 3100 woes [was Re: libdl?]
To: Jonathan Stone <jonathan@dsg.stanford.edu>
From: Steven E Lumos <slumos@freddy.CS.UNLV.EDU>
List: port-pmax
Date: 04/24/1997 23:18:59
>Will Ferry says that the 1997-03-16 snapshot kernel works fine on his
>3100 but that newer kernels don't. I'm stuck with that one: the sii
>driver itself has not changed since October 1996.  I'm no longer sure
>what I should be looking at.

The ones we tried called themselves 1.2C and 1.2D; both fail to work.
I also compiled one from the 1.2.1 sources (since we had a local copy
anyway), and it works, but I think this PMAX stuff in 1.2.1 is
actually behind 1.2C, right?

>Since I don't have a 3100, I cannot reproduce this myself. I really
>don't know what's going on with the sii driver, and hacking at random
>isn't likely to be useful.

No, but we would really like to help diagnose it.

>What would be really useful to know is:
>	* what disks people with problems are using

rz0 at sii0 drive 0 slave 0 DEC RZ23   (C) DEC rev 0A18, 204864 512 byte blocks
rz1 at sii0 drive 1 slave 0 CDC 94181-15 rev 2197, 1173930 512 byte blocks

>	* whether they're using SCSI-2 cables and correct termination

The first one is internal, the second one is in a DEC enclosure
connected with a SCSI-2 cable and terminated correctly.

>	* what the last version of the kernel was which worked on
>	  their 3100, if any

If my assumption about the newness of kernels is correct, the one in
1.2.1 is the last that worked.  However, we were running 1.2 before
that, and not -current.  I'm willing to compile the -currents after
each commit if you think it'll help and are willing to extract them
for me.

>	* When the machine first started wedging

Right after we had upgraded to the snapshot, while untarring the
X11R6.3 binaries.  That was the very first time it happened.

>	* whether there's any indication that this is _really_
>	  a disk-access problem, or something else.

Always hangs with the disk light on, so presumably during a disk
access.  Benson Chow reported that it only happens during large disk
accesses, but that's not what we are getting.  I originally thought
that might be it, or it might be something intermittent, so I put up
the 1.2D kernel.  By the time I had walked across the building and
pinged it, it was already hung.  As far as I know, it was practically
idle during that time.

>I *could* try adding a hook to get an LK-201 key (DO, maybe?) to panic
>the kernel and/or print a stack backtrace.  Or someone could try
>frobbing the code at sys/arch/pmax/dev/sii.c around line 727, to make
>it do something useful without kadb.

I feel like I'm asking stupid questions here, but what is kadb? Can we
do something useful WITH kadb?

>I guess that means being able to build a kernel, and if your machine
>doesn't stay up long enough to do that, doing *anything* is pretty hard.

Is there any reason that we need a certain kernel version in order to
build some other kernel version? We are running 1.2 very happily now,
except it would be really, really nice to have shared libs, since we
don't really have enough free disk for statically linked X binaries.
We had no trouble going back to 1.2 once we noticed that 1.2C and 1.2D
were both death.  The 1.2C and 1.2D are the pre-compiled ones from the
ftp site, BTW.

We're at a university, so there is a bunch of equipment that we can
probably get on an extremely temporary basis.  NetBSD is doing real
work here on PMAX, SPARC, and i386, so I think there is definitely
interest in getting this fixed.

Steve