Subject: Re: -current on Ultra 5+ - now it's major siop0 lossage
To: None <port-sparc@NetBSD.ORG>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-sparc
Date: 01/26/2001 22:57:36
--VbJkn9YxBvnuCH5J
Content-Type: text/plain; charset=us-ascii

On Thu, Jan 25, 2001 at 06:31:16PM -0800, Greg Earle wrote:
> > Tags were added after 1.5.  The problems you're seeing are
> > probably not directly due to tags but side effects of code
> > changes needed to add tags.  For that reason Manuel needs
> > to be aware of this so he can fix the problem for 1.6.
> 
> Well, if it's any help, the problem's gotten *far* worse since the Jan. 14th
> snapshot, from my Ultra 5+'s standpoint.

Worse than -current before this date, or worse than 1.5?

> 
> > 	NetBSD 1.5 (NETBSD4ME) #0: Thu Jan 25 06:51:41 PST 2001
> > 	    root@netbsd4me:/usr/src/1.5/sys/arch/sparc64/compile/NETBSD4ME
> > 	total memory = 128 MB
> > 	avail memory = 86880 KB
> > 	using 4075 buffers containing 32600 KB of memory
> > 	bootpath: /pci@1f,0/pci@1,0/scsi@1,0/disk@0,0
> > 	mainbus0 (root): SUNW,Ultra-5_10
> > 	cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 333 MHz, version 0 FPU
> > 	cpu0: physical 4K instruction (32 b/l), 4K data (32 b/l), 2048K external (64 b/l)
> > 
> > If this really is an Ultra 5, it has an UltraSPARC-IIi
> > and uses the PCI controller on the CPU.  That controller does
> > not have a streaming cache so disabling it should have no effect.
> 
> It really is an Ultra 5+, honest  :-)
> 
> > 	psycho0 at mainbus0 addr 0xfffc4000
> > 	sabre: bus range 0 to 2; simba b, PCI bus 1; simba a, PCI bus 2
> > 	DVMA map: c0002000 to ffffe000
> > 	pci0 at psycho0
> > 
> > This is strange.  Is it a psycho or a sabre?  psycho has a streaming
> > cache, but sabre is the on-board controller and does not.  You may as
> > well try disabling it to see if that's the problem.  Go into iommu.c and
> > look for `BUS_DMA_COHERENT'.  That is the code to turn off the streaming
> > cache.  Do something like this:
> > 
> > 	tte = MAKEIOTTE(pa, !(flags&BUS_DMA_NOWRITE), !(flags&BUS_DMA_NOCACHE),
> > -			!(flags&BUS_DMA_COHERENT));
> > +			0);
> 
> Made this change; no impact whatsoever.   Same problem persists.
> 
> >> siop0 at pci1 dev 1 function 0: Symbios Logic 53c875 (ultra-wide scsi)
> >> siop0: using on-board RAM
> >> siop0: interrupting at vector 16
> >> scsibus0 at siop0: 16 targets, 8 luns per target
> >> [...]
> >> scsibus0: waiting 2 seconds for devices to settle...
> >> sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
> >> siop0: target 0 now synchronous at 20.0Mhz, offset 16
> >> sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
> >> root on sd0a dumps on sd0b
> 
> 1.5Q messages:
> 
> scsibus0: waiting 2 seconds for devices to settle...
> siop0: alloc newcdb at PHY addr 0xc006c000
> siop0: target 0 using tagged queuing
> 
> [ARRGH - how do I turn this off???]
> 
> sd0 at scsibus0 target 0 lun 0: <QUANTUM, XP39100S, LYK8> SCSI2 0/direct fixed
> sd0: 8682 MB, 5899 cyl, 20 head, 150 sec, 512 bytes/sect x 17781520 sectors
> DMA IRQ: bus fault dma fifo empty, DSP=0xc0069fec DSA=0xc006df0:
> last msg_in=0x0 status=0xff
> siop0: unhandled scsi interrupt, sist=0x400 sstat1=0xf DSA=0xc006df00 DSP=0xc0069fec
> IPsec: Initialized Security Association Processing.
> root device: sd0a
> [...]

I found a table overflow, by 4 bytes.
I don't know why it didn't cause problems on my PCs: maybe different
alignment constraints. Anyway, the last byte of the table is written but never
used, so it would overwrite the next siop_cmd, which is free at this time in
most cases. It shouldn't cause problems at the first I/O.
Or maybe bus_dmamap_sync isn't a no-op here.
Ah yes, there are a few missing bus_dmamap_sync() calls too.
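
To make the 4-byte overflow concrete, here is a minimal sketch of the
arithmetic. It is not the driver code: the 25-word length of load_dsa is
an assumption inferred from the patch (24 -> 25), and overflow_bytes(),
RESEL_HARDCODED and RESEL_DERIVED are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the load_dsa script fragment.
 * ASSUMPTION: it is 25 32-bit words long; the real length comes from
 * the generated script, not from this sketch.
 */
const uint32_t load_dsa[25] = {0};

/* Table size that was hard-coded before the patch. */
#define RESEL_HARDCODED 24

/* Size derived from the script itself, as in the commented-out line. */
#define RESEL_DERIVED (sizeof(load_dsa) / sizeof(load_dsa[0]))

/*
 * Number of bytes by which a full copy of load_dsa would run past the
 * end of a table holding table_words 32-bit entries (0 if it fits).
 */
size_t overflow_bytes(size_t table_words)
{
	size_t need = sizeof(load_dsa);
	size_t have = table_words * sizeof(uint32_t);

	return need > have ? need - have : 0;
}
```

Calling overflow_bytes(RESEL_HARDCODED) gives the overrun for the old
24-entry table, and overflow_bytes(RESEL_DERIVED) gives it for a table
sized from the script. Deriving the size would keep the table in step
with the script if it changes again, at the cost of needing load_dsa
visible in the header.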

Could you try the two attached patches, please?

--
Manuel Bouyer <bouyer@antioche.eu.org>
--

--VbJkn9YxBvnuCH5J
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff

Index: siop.c
===================================================================
RCS file: /cvsroot/syssrc/sys/dev/ic/siop.c,v
retrieving revision 1.37
diff -u -r1.37 siop.c
--- siop.c	2000/11/14 18:21:02	1.37
+++ siop.c	2001/01/26 21:54:09
@@ -697,6 +697,7 @@
 			bus_space_write_4(sc->sc_rt, sc->sc_rh, SIOP_DSP,
 			    siop_cmd->dsa + sizeof(struct siop_xfer_common) +
 			    Ent_ldsa_reload_dsa);
+			siop_table_sync(siop_cmd, BUS_DMASYNC_PREWRITE);
 			return 1;
 		case A_int_reseltag:
 			printf("%s: reselect with invalid tag\n",
@@ -1122,6 +1123,7 @@
 		    siop_lun->siop_tag[0].reseloff + 1,
 		    siop_cmd->dsa + sizeof(struct siop_xfer_common) +
 		    Ent_ldsa_reload_dsa);
+		siop_table_sync(siop_cmd, BUS_DMASYNC_PREWRITE);
 	}
 	return 0;
 }
@@ -1490,6 +1492,7 @@
 		siop_cmd->siop_xfer->resel[E_ldsa_abs_slot_Used[0]] = 
 		   htole32(sc->sc_scriptaddr + Ent_script_sched_slot0 +
 		   slot * 8);
+		siop_table_sync(siop_cmd, BUS_DMASYNC_PREWRITE);
 		/* scheduler slot: JUMP ldsa_select */
 		siop_script_write(sc,
 		    (Ent_script_sched_slot0 / 4) + slot * 2 + 1,
Index: siopvar_common.h
===================================================================
RCS file: /cvsroot/syssrc/sys/dev/ic/siopvar_common.h,v
retrieving revision 1.9
diff -u -r1.9 siopvar_common.h
--- siopvar_common.h	2000/11/14 18:21:02	1.9
+++ siopvar_common.h	2001/01/26 21:54:10
@@ -76,7 +76,7 @@
 struct siop_xfer {
 	struct siop_xfer_common tables;
 	/* u_int32_t resel[sizeof(load_dsa) / sizeof(load_dsa[0])]; */
-	u_int32_t resel[24];
+	u_int32_t resel[25];
 } __attribute__((__packed__));
 
 /*

--VbJkn9YxBvnuCH5J--