port-vax: RZ56 - A Resolution!

Subject: RZ56 - A Resolution!
To: None <port-vax@netbsd.org>
From: Jon W Grubbs <jgrubbs@megsinet.net>
List: port-vax
Date: 02/24/2000 21:08:52
Good Day,

The following is a bit long, but I believe the information should be useful
to at least one or two other people :-)

Like many others on the list, I have had a couple of RZ56 drive units 
around here that I have wanted to use with my VS3100.  I finally got some 
time to understand the whole story and found a few issues along with how to 
fix them.

1) The MI code for the NCR5380 has a small bug in it.  The RZ56 wants to
negotiate for Synchronous Data Transfers, and when it does, the current code
looks like it will send a REJECT message, as the spec calls for when we can't
handle the request.  However, a couple of lines of code are missing in the
ncr5380_msg_in routine for the PARITY message and for anything that
wants to send a REJECT message.  The ATN setup that is done in
ncr_sched_msgout is immediately negated by the final byte ACK sequence in
this routine, which writes back a copy of icmd that was read before ATN was
raised.  This causes bad things to happen to an RZ56.

The patch looks like this:

--- ncr5380sbc.c.orig	Fri Oct  1 06:18:59 1999
+++ ncr5380sbc.c	Thu Feb 24 20:07:03 2000
@@ -1675,8 +1675,10 @@
  	case MSG_PARITY_ERROR:
  		NCR_TRACE("msg_in: PARITY_ERROR\n", 0);
  		/* Resend the last message. */
  		ncr_sched_msgout(sc, sc->sc_msgout);
+		/* Reset icmd after scheduling the REJECT cmd - jwg */
+		icmd = NCR5380_READ(sc, sci_icmd) & SCI_ICMD_RMASK;
  		break;

  	case MSG_MESSAGE_REJECT:
  		/* The target rejects the last message we sent. */
@@ -1736,8 +1738,10 @@
  		NCR_BREAK();
  		/* fallthrough */
  	reject:
  		ncr_sched_msgout(sc, SEND_REJECT);
+		/* Reset icmd after scheduling the REJECT cmd - jwg */
+		icmd = NCR5380_READ(sc, sci_icmd) & SCI_ICMD_RMASK;
  		break;

  	abort:
  		sc->sc_state |= NCR_ABORTING;

I chose to add the code in two places rather than in the single "obvious"
place, to keep the most-used path as streamlined as possible.

It turns out that some SCSI drives will gracefully abort a Synchronous Data
Transfer Request when the code does what it is doing today.  The RZ56
hopelessly sits with BSY held high while the code is hung waiting for REQ
from the drive.  This is probably why it has not been an obvious
problem on a more widespread basis.  Add to this that the first error
message is, by design, 5 minutes away when it hangs, and when you get it, it
makes little or no sense out of context.  I feel like I know the NCR5380 MI
code inside and out at this point ;-)
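
To make the failure in item 1 concrete, here is a small stand-alone
simulation of the stale-register problem.  It is only a sketch: the sim_
names and the single-variable "register" are made up for illustration and
are not the driver's code.  Only the ATN and ACK bit positions follow the
NCR5380 initiator command register layout.

```c
#include <stdint.h>

/* Bit positions in the NCR5380 initiator command register. */
#define SIM_ICMD_ATN 0x02	/* assert ATN */
#define SIM_ICMD_ACK 0x10	/* assert ACK */

/* Hypothetical stand-in for the chip's icmd register. */
static uint8_t sim_icmd_reg;

/* Like ncr_sched_msgout in spirit: raise ATN directly in the register. */
static void
sim_sched_msgout(void)
{
	sim_icmd_reg |= SIM_ICMD_ATN;
}

/*
 * Model of the last-byte handshake in a msg_in style routine.
 * If reload is zero, the local copy taken on entry is written back
 * unchanged; if nonzero, it is re-read after scheduling the message.
 * Returns nonzero if ATN is still asserted afterwards.
 */
static int
sim_msg_in_reject(int reload)
{
	uint8_t icmd;

	sim_icmd_reg = 0;
	icmd = sim_icmd_reg;		/* snapshot taken before ATN is set */

	sim_sched_msgout();		/* ATN now set in the register only */
	if (reload)
		icmd = sim_icmd_reg;	/* the fix: refresh the snapshot */

	sim_icmd_reg = icmd | SIM_ICMD_ACK;	/* assert ACK for final byte */
	sim_icmd_reg = icmd;			/* release ACK */

	return (sim_icmd_reg & SIM_ICMD_ATN) != 0;
}
```

Calling sim_msg_in_reject(0) models the current behaviour and leaves ATN
cleared; sim_msg_in_reject(1) models the patched behaviour and leaves ATN
asserted, so the target sees the request for a message-out phase.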

2) A small nit, but the MD code for the NCR5380 assumes that the
controllers are at unit address 7.  At least on my VS3100/M48, they are
both at unit 6.  The only effect this apparently has is that devices at
unit 7 are not probed, when it is unit 6 that should be skipped.

This patch looks like this:

--- ncr.c.orig	Mon Jan 24 06:33:29 2000
+++ ncr.c	Tue Feb 22 15:32:18 2000
@@ -205,9 +205,9 @@
  	 * Fill in the prototype scsi_link.
  	 */
  	ncr_sc->sc_link.scsipi_scsi.channel = SCSI_CHANNEL_ONLY_ONE;
  	ncr_sc->sc_link.adapter_softc = sc;
-	ncr_sc->sc_link.scsipi_scsi.adapter_target = 7;
+	ncr_sc->sc_link.scsipi_scsi.adapter_target = 6;
  	ncr_sc->sc_link.adapter = &ncr_sc->sc_adapter;
  	ncr_sc->sc_link.device = &si_dev;
  	ncr_sc->sc_link.type = BUS_SCSI;

If this is specific to the VS3100 or even the M48 then an appropriate 
DEFINE would have to be added here.
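
One way such a define might look is sketched below.  NCR_HOST_ID and
sim_probe_mask are hypothetical names invented for this example, not
existing driver symbols, and the default of 7 simply mirrors the current
hard-coded value.  The helper shows the practical effect described above:
the ID the adapter claims for itself is the one that never gets probed.

```c
#include <stdint.h>

/* Hypothetical build-time override, e.g. -DNCR_HOST_ID=6 for a VS3100/M48. */
#ifndef NCR_HOST_ID
#define NCR_HOST_ID 7
#endif

/*
 * Return a bitmask of the SCSI IDs (0-7) that would be probed when the
 * adapter itself sits at host_id: every ID except the adapter's own.
 */
static uint8_t
sim_probe_mask(int host_id)
{
	return (uint8_t)(0xFF & ~(1u << host_id));
}
```

With the adapter at 7 the mask is 0x7F (ID 7 skipped); with it at 6 the mask
is 0xBF, so a drive jumpered to ID 7 becomes visible.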

3) Lastly, there has been a lot of talk about how to get the RZ56 to spin up
on power-on.  This is not required for NetBSD/VAX but may be required for
other purposes.  The TEST_UNIT_READY command used by the existing SCSI code
is enough to get its attention.  I did, however, find an ancient DECUServe
thread on the matter.  Basically it involves the following steps.  My thanks
to William Jackson for the original post to the DECUServe list in 1994 and
thanks to DECUS for keeping the archives around so long!  I have
consolidated the steps below.

WARNING:  I take no responsibility for what happens if you try this!!!!  It 
worked for me on my VS3100/M48 and I have done it both ways several times 
without incident.  However, a mistyped character could leave the unit in an 
unknown and possibly unrecoverable state.

First you need to get the VS3100 in diagnostic mode.  This is accomplished 
by inserting an MMJ into the SERIAL port (not the printer port) with all 
pairs looped (1 to 6, 2 to 5, 3 to 4).  Power on the unit and enter

 >>> test 73

You will be greeted with "KA42 TPmker" and asked which bus the "Tape Unit" 
is on.

VStmk_QUE_port (A,B) ? B

Answer as appropriate for your RZ56.  Mine was on Bus B.

The next prompt is asking for the target unit:

VStmk_QUE_id (0,1,2,3,4,5,6,7) ? 2

Answer as appropriate for your RZ56.  Again, mine was unit 2.

Lastly, you will be prompted:

vstmk_que_rusure (1/0) ?

Contrary to the prompt, enter "42000001" if you want the drive to spin-up 
on power-on or "41000001" to get it to not spin on power-on.

Next you will see the actual SCSI commands and responses issued to the unit.

I have captured the data that changes the mode page and will, when I have 
time, write a small program to do this from the OS.  For now, this other 
hack might be useful to others so I thought I'd share.

The other interesting thing about the PROMs in the VS3100 is that they use
a bare-minimum SCSI set to talk to the drives, not even the IDENTIFY message
that SCSI-2 now makes mandatory.  They truly use only as much of SCSI-1 as
is needed to make things work.  Hence you never see the SDTR message as you
do when using NetBSD.
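
For reference, the messages mentioned here are only a few bytes each.  The
sketch below builds them using values from the SCSI-2 message tables.  The
sim_ helper names are made up for this example, and the period and offset a
target actually requests will differ from anything shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_MSG_EXTENDED	0x01	/* EXTENDED MESSAGE */
#define SIM_MSG_REJECT		0x07	/* MESSAGE REJECT */
#define SIM_MSG_IDENTIFY	0x80	/* IDENTIFY base; low 3 bits = LUN */
#define SIM_MSG_EXT_SDTR	0x01	/* extended message code: SDTR */

/* IDENTIFY for a given LUN; bit 6 grants disconnect privilege. */
static uint8_t
sim_identify(int lun, int allow_disconnect)
{
	return (uint8_t)(SIM_MSG_IDENTIFY |
	    (allow_disconnect ? 0x40 : 0) | (lun & 0x07));
}

/* Fill buf (at least 5 bytes) with an SDTR message; returns its length. */
static size_t
sim_build_sdtr(uint8_t *buf, uint8_t period, uint8_t offset)
{
	buf[0] = SIM_MSG_EXTENDED;
	buf[1] = 3;		/* bytes that follow the length byte */
	buf[2] = SIM_MSG_EXT_SDTR;
	buf[3] = period;	/* transfer period, in units of 4 ns */
	buf[4] = offset;	/* REQ/ACK offset; 0 means asynchronous */
	return 5;
}

/* Nonzero if msg (len bytes) is a complete SDTR message. */
static int
sim_is_sdtr(const uint8_t *msg, size_t len)
{
	return len == 5 && msg[0] == SIM_MSG_EXTENDED &&
	    msg[1] == 3 && msg[2] == SIM_MSG_EXT_SDTR;
}
```

An initiator that cannot do synchronous transfers answers an SDTR with the
single byte 0x07 (MESSAGE REJECT), which is what the reject path in item 1
is trying to schedule.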

I hope this information is useful to some of you with RZ56 drives.  I know
in my case I have 1.2 GB worth of paperweights that are now back in service.

Happy VAXing,

--
Jon Grubbs
jgrubbs@megsinet.net