port-sparc: Re: ESP SCSI controller errors?

Subject: Re: ESP SCSI controller errors?
To: Andrey Petrov <and@genesyslab.com>
From: john heasley <heas@shrubbery.net>
List: port-sparc
Date: 01/24/2002 16:47:33
Thu, Jan 24, 2002 at 01:21:52PM -0800, Andrey Petrov:
> On Wed, Jan 23, 2002 at 09:36:25PM -0800, john heasley wrote:
> > same here (with either the sbus hme/fas (sparc or ultra) or built-in ultra
> > FAS366 packages).  as far as i can tell, the controller starts off doing
> > tagged-queuing, this error occurs, tagged-queuing is disabled and it never
> > recurs.
> 
> Hey John,
> 
> Why do you think that tagged-queuing didn't recur? It's interesting because

2 or 3 months ago, i had followed the process to the best of my ability
using gdb and printfs.  if my long-term memory serves, after the reset
it just never tried to re-negotiate tagging.

this was while i was trying to figure out why it absolutely refused to
work with my ibm drives.

sd0 at scsibus0 target 0 lun 0: <IBM, DCAS-34330W, S65A> SCSI2 0/direct fixed

we exchanged some mail about that issue before.  again, iirc, it would
just continuously reset when talking to these because the drives were
trying to be ultra-wide (16-bit) but the controller spoke narrow (8-bit):
the negotiation happened in the wrong order, so the widths never matched.
so, setting the width and queuing the command to negotiate ahead
of <something, tagging, i think> did the trick.

hmm, hope that's an accurate description.  i could dig up more info on that
if you want.
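
for reference, a minimal standalone sketch of the width negotiation message
itself, not the driver code.  the byte values are the SCSI-2 extended message
codes for WDTR (wide data transfer request); wdtr_build() and wdtr_bits() are
hypothetical helper names made up for this example.  the last byte is an
exponent, so 0 means 8-bit, 1 means 16-bit and 2 means 32-bit transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Message codes from the SCSI-2 message system. */
#define MSG_EXTENDED		0x01	/* first byte of any extended message */
#define MSG_EXT_WDTR_LEN	0x02	/* bytes following the length byte */
#define MSG_EXT_WDTR		0x03	/* wide data transfer request */

/*
 * Hypothetical helper (not in ncr53c9x.c): fill buf with a WDTR message
 * asking for the given width exponent.  Returns the message length.
 */
size_t
wdtr_build(uint8_t *buf, uint8_t exponent)
{
	assert(buf != NULL);
	buf[0] = MSG_EXTENDED;
	buf[1] = MSG_EXT_WDTR_LEN;
	buf[2] = MSG_EXT_WDTR;
	buf[3] = exponent;
	return 4;
}

/*
 * Hypothetical helper: bus width in bits for a WDTR exponent
 * (0 -> 8, 1 -> 16, 2 -> 32).
 */
unsigned
wdtr_bits(uint8_t exponent)
{
	return 8u << exponent;
}
```

so a drive that wants 16-bit transfers sends exponent 1, which is the value
the patched printf below reports as "16 bit mode".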

i fixed that particular issue as appended (which is why the extra printfs
are there).  i probably didn't do it right, but it works.  i noticed
some changes were made to the driver not too long ago, but i haven't actually
tried removing my changes to see if the width negotiation was fixed.
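
the core of the change, boiled down to a standalone sketch under some
assumptions: struct tinfo here is a cut-down stand-in for the driver's
per-target state, the flag values are made up, and the calls to
ncr53c9x_setsync() and ncr53c9x_sched_msgout() from the real patch are left
out.  the point is only that a WDTR reply is applied when we have a request
outstanding (T_WIDE set), and that the flag is cleared either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T_WIDE		0x01	/* assumed value: WDTR negotiation outstanding */
#define CFG3_EWIDE	0x02	/* assumed value: stand-in for NCRFASCFG3_EWIDE */

/* Cut-down, hypothetical stand-in for the driver's per-target info. */
struct tinfo {
	uint8_t	flags;	/* negotiation state bits */
	uint8_t	width;	/* agreed WDTR exponent */
	uint8_t	cfg3;	/* shadow of the chip's config 3 register */
};

/*
 * Handle a WDTR reply carrying the given exponent.
 * Returns 1 if it answered a request of ours, 0 otherwise.
 */
int
wdtr_reply(struct tinfo *ti, uint8_t exponent)
{
	int ours;

	assert(ti != NULL);
	ours = (ti->flags & T_WIDE) != 0;
	if (ours) {
		/* Record what the target agreed to. */
		ti->width = exponent;
		/* Any non-zero exponent means wider than 8 bits. */
		if (exponent != 0)
			ti->cfg3 |= CFG3_EWIDE;
	}
	/* Negotiation is over whether or not the reply was ours. */
	ti->flags &= (uint8_t)~T_WIDE;
	return ours;
}
```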

hope that helps.  i've been meaning to look at this more, but have had
too much work and i'm still very much in learning mode on this level.
if there is anything else i can provide, let me know.

> after reset I'd expect it goes full circle again and it should try.
> Well, I definitely need a fresh look at the driver, it's been a while.
> So it seems that if it doesn't do negotiation after reset it should be
> fixed and then the problem should be seen more reliably.
> 
> I'll check that, thanks for the hint.
> 
> > 
> > i haven't been able to figure out what is happening in the driver, but
> > it seems possible to reproduce it by having two drives on u2 scsibus0,
> > reboot, then cp /netbsd /sd1; as in the two cases below.
> > 
> 
> Regards,
> 	Andrey

Index: ncr53c9x.c
===================================================================
RCS file: /cvsroot/syssrc/sys/dev/ic/ncr53c9x.c,v
retrieving revision 1.89
diff -u -r1.89 ncr53c9x.c
--- ncr53c9x.c	2002/01/12 16:03:12	1.89
+++ ncr53c9x.c	2002/01/25 00:35:55
@@ -227,7 +227,8 @@
 		return;
 	}
 
-	printf(": %s, %dMHz, SCSI ID %d\n",
+	printf("%s: %s, %dMHz, SCSI ID %d\n",
+	    sc->sc_dev.dv_xname,
 	    ncr53c9x_variant_names[sc->sc_rev], sc->sc_freq, sc->sc_id);
 
 	sc->sc_ccf = FREQTOCCF(sc->sc_freq);
@@ -452,6 +453,7 @@
 		}
 		/* Cancel outstanding disconnected commands on each LUN */
 		for (r = 0; r < 8; r++) {
+/* HEAS: 8 should be sc->sc_channel->chan_ntargets */
 			LIST_FOREACH(li, &sc->sc_tinfo[r].luns, link) {
 				if ((ecb = li->untagged) != NULL) {
 					li->untagged = NULL;
@@ -466,6 +468,7 @@
 					ncr53c9x_done(sc, ecb);
 				}
 				for (i = 0; i < 256; i++)
+/* HEAS: 256 here should be the max queued cmds macro def or the value from *sc */
 					if ((ecb = li->queued[i])) {
 						li->queued[i] = NULL;
 						ecb->xs->error = XS_TIMEOUT;
@@ -483,6 +486,8 @@
 
 	sc->sc_phase = sc->sc_prevphase = INVALID_PHASE;
 	for (r = 0; r < 8; r++) {
+/* HEAS: again, 8 should be the number of targets per *sc */
+
 		struct ncr53c9x_tinfo *ti = &sc->sc_tinfo[r];
 /* XXX - config flags per target: low bits: no reselect; high bits: no synch */
 
@@ -514,7 +519,7 @@
  * NCR_INTR - so make sure it is the last read.
  *
  * I think that (from reading the docs) most bits in these registers
- * only make sense when he DMA CSR has an interrupt showing. Call only
+ * only make sense when the DMA CSR has an interrupt showing. Call only
  * if an interrupt is pending.
  */
 __inline__ void
@@ -1788,13 +1793,20 @@
 				break;
 
 			case MSG_EXT_WDTR:
-				printf("%s: wide mode %d\n",
-				       sc->sc_dev.dv_xname, sc->sc_imess[3]);
-				if (sc->sc_imess[3] == 1) {
-					ti->cfg3 |= NCRFASCFG3_EWIDE;
+				printf("%s: %d bit mode\n",
+					sc->sc_dev.dv_xname,
+					sc->sc_imess[3] == 1 ? 16 :
+					sc->sc_imess[3] == 2 ? 32 : 8);
+				printf("%s: ti->flags & T_WIDE = %d, ti->width = %d\n",
+					sc->sc_dev.dv_xname,
+					ti->flags & T_WIDE, ti->width);
+				if (ti->flags & T_WIDE) {
+					ti->width = sc->sc_imess[3];
+					if (sc->sc_imess[3] != 0)
+						ti->cfg3 |= NCRFASCFG3_EWIDE;
 					ncr53c9x_setsync(sc, ti);
-				} else
-					ti->width = 0;
+					ncr53c9x_sched_msgout(SEND_WDTR);
+				}
 				ti->flags &= ~T_WIDE;
 				break;
 			default:
Index: ncr53c9xvar.h
===================================================================
RCS file: /cvsroot/syssrc/sys/dev/ic/ncr53c9xvar.h,v
retrieving revision 1.35
diff -u -r1.35 ncr53c9xvar.h
--- ncr53c9xvar.h	2001/12/03 23:27:32	1.35
+++ ncr53c9xvar.h	2002/01/25 00:35:55
@@ -281,10 +281,10 @@
 	/* register defaults */
 	u_char	sc_cfg1;			/* Config 1 */
 	u_char	sc_cfg2;			/* Config 2, not ESP100 */
-	u_char	sc_cfg3;			/* Config 3, only ESP200 */
+	u_char	sc_cfg3;			/* Config 3, ESP200 and FAS */
 	u_char	sc_cfg3_fscsi;			/* Chip-specific FSCSI bit */
-	u_char	sc_cfg4;			/* Config 3, only ESP200 */
-	u_char	sc_cfg5;			/* Config 3, only ESP200 */
+	u_char	sc_cfg4;			/* Config 4, only ESP200 */
+	u_char	sc_cfg5;			/* Config 5, only ESP200 */
 	u_char	sc_ccf;				/* Clock Conversion */
 	u_char	sc_timeout;