Subject: port-sparc64/13654: problems with iommu_dvmamap_load_raw()
To: None <gnats-bugs@gnats.netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: netbsd-bugs
Date: 08/08/2001 18:26:37
>Number:         13654
>Category:       port-sparc64
>Synopsis:       problems with iommu_dvmamap_load_raw()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 08 09:23:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     
>Release:        -current as of half an hour ago (from main CVS)
>Organization:

LIP6, Universite Paris VI.

>Environment:
	
System: NetBSD java 1.5X NetBSD 1.5X (JAVA) #0: Wed Aug 8 17:22:04 MEST 2001 bouyer@java:/home/cvs.netbsd.org/src/sys/arch/sparc64/compile/JAVA sparc64
Machine: Ultra5 400MHz


>Description:
	I believe there are still problems with iommu_dvmamap_load_raw().
	First, the code that computes sgsize (passed to extent_alloc) doesn't
	seem to take the offset within the page into account: if a segment
	has a small len but crosses a page boundary (this can happen with
	mbufs), we will account for one page instead of 2 (or maybe callers
	of iommu_dvmamap_load_raw() already split this into 2 segments? I
	didn't check).
	Second, I'm almost sure there are problems with out-of-order segments
	(again, I'm sure this can happen with mbuf chains): if we have 3
	segments, seg[0] in page X, seg[1] in page Y != X and seg[2] in
	page X, we'll account for 3 pages instead of 2, and we'll have 2
	entries in the IOMMU for page X.
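
	A minimal user-space sketch of the first point (not the actual
	iommu.c code): it compares rounding only the length with rounding
	the span from the start of the first page to the end of the last
	page. The 8 KB page size and all sk_* names are assumptions made
	for illustration only.

```c
#include <stdint.h>

/* Assumed 8 KB page size; hypothetical names, not kernel macros. */
#define SK_PAGE_SIZE 8192u
#define SK_PAGE_MASK ((uint64_t)SK_PAGE_SIZE - 1)

/* Hypothetical stand-in for a DMA segment (address and length only). */
struct sk_seg {
	uint64_t addr;
	uint64_t len;
};

static uint64_t sk_trunc_page(uint64_t a)
{
	return a & ~SK_PAGE_MASK;
}

static uint64_t sk_round_page(uint64_t a)
{
	return (a + SK_PAGE_MASK) & ~SK_PAGE_MASK;
}

/* Rounds the length only: ignores where the segment starts in its page. */
static uint64_t sk_span_len_only(const struct sk_seg *s)
{
	return sk_round_page(s->len);
}

/* Page-aligned bytes actually touched, taking the in-page offset into account. */
static uint64_t sk_span_with_offset(const struct sk_seg *s)
{
	if (s->len == 0)
		return 0;
	return sk_round_page(s->addr + s->len) - sk_trunc_page(s->addr);
}
```

	For a 100-byte segment starting 50 bytes before a page boundary,
	sk_span_len_only() gives 8192 while sk_span_with_offset() gives
	16384, i.e. one page versus two.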

	While testing the tl driver on a U5 I get problems under load
	(like dd if=/dev/zero of=file bs=64k on an NFS filesystem): the system
	panics almost immediately, either with "psycho0: uncorrectable DMA
	error" or in extent_free with "region not found".
	I added code to check that the size passed to extent_alloc and
	extent_free is the same:

Index: include/bus.h
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sparc64/include/bus.h,v
retrieving revision 1.28
diff -u -r1.28 bus.h
--- include/bus.h	2001/07/19 15:32:19	1.28
+++ include/bus.h	2001/08/08 16:06:11
@@ -1514,6 +1514,7 @@
 	void		*_dm_source;	/* source mbuf, uio, etc. needed for unload */
 
 	void		*_dm_cookie;	/* cookie for bus-specific functions */
+	bus_size_t	_dm_sgsize;	/* size of extent */
 
 	/*
 	 * PUBLIC MEMBERS: these are used by machine-independent code.
Index: dev/iommu.c
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sparc64/dev/iommu.c,v
retrieving revision 1.37
diff -u -r1.37 iommu.c
--- dev/iommu.c	2001/08/06 22:02:58	1.37
+++ dev/iommu.c	2001/08/08 16:06:11
@@ -501,6 +501,7 @@
 	err = extent_alloc(is->is_dvmamap, sgsize, align,
 	    boundary, EX_NOWAIT|EX_BOUNDZERO, (u_long *)&dvmaddr);
 	splx(s);
+	map->_dm_sgsize = sgsize;
 
 #ifdef DEBUG
 	if (err || (dvmaddr == (bus_addr_t)-1))	
@@ -599,6 +600,12 @@
 		pa = addr + offset + len;
 
 	}
+	if (sgsize != map->_dm_sgsize) {
+		printf("iommu_dvmamap_unload: sgsize %ld different from %ld\n",
+			(u_long)sgsize, (u_long)map->_dm_sgsize);
+		/* panic("iommu_dvmamap_unload"); */
+		sgsize = map->_dm_sgsize;
+	}
 	/* Flush the caches */
 	bus_dmamap_unload(t->_parent, map);
 
@@ -656,6 +663,7 @@
 		pa = segs[i].ds_addr + segs[i].ds_len;
 	}
 	sgsize = round_page(sgsize);
+	map->_dm_sgsize = sgsize;
 
 	/*
 	 * A boundary presented to bus_dmamem_alloc() takes precedence

	With this code, I get:
iommu_dvmamap_unload: sgsize 16384 different from 24576
iommu_dvmamap_unload: sgsize 16384 different from 24576
panic: psycho0: uncorrectable DMA error AFAR 1097e150 AFSR 410000ff40800000

	I tried to solve the first bug (offset not used to compute the number
	of pages) by using code cut'n'pasted from iommu_dvmamap_unload().
	Now the machine doesn't panic any more, but I get many more
	"iommu_dvmamap_unload: sgsize s1 different from s2" messages,
	with s1 being one page larger or smaller than s2, and I get
	very weird behavior from the adapter: a tcpdump on the NFS server
	shows that I get the last segment *twice*:
18:12:21.054581 java.369053902 > disco-bu.nfs: 1472 write fh 16,20/1931 8192 bytes @ 0 (frag 4368:1480@0+)
18:12:21.054582 java > disco-bu: (frag 4368:920@7400)
18:12:21.054583 java > disco-bu: (frag 4368:1480@1480+)
18:12:21.054585 java > disco-bu: (frag 4368:1480@2960+)
18:12:21.054586 java > disco-bu: (frag 4368:1480@4440+)
18:12:21.054587 java > disco-bu: (frag 4368:1480@5920+)
18:12:21.054588 java > disco-bu: (frag 4368:920@7400)
	Yes, the last fragment is inserted between the first and second, and
	repeated at the end. I can't explain this other than by the adapter
	reading corrupted data via DMA (it DMAs the transmit list too). I
	checked at the driver level, and the list isn't corrupted after
	transmit.

	Now, why I believe the problem is in bus_dma and not in the tl driver:
	I get the exact same behavior with a tlp (21041) adapter and with an
	epic (SMC etherpowerII).
	The HME driver doesn't have this problem because it uses statically
	allocated buffers to/from which it copies mbufs, and so doesn't
	use bus_dmamap_load_mbuf.

>How-To-Repeat:
	Try to use a tl, tlp or epic (or probably any driver which uses
	bus_dmamap_load_mbuf) in a sparc64 machine (Ultra5 in my case).
>Fix:
	I don't know at this point. Getting the algorithm to handle
	out-of-order segments in an efficient way isn't that easy, I guess.
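
	To illustrate the accounting problem only (this is not a proposed
	fix), here is a user-space sketch that counts distinct pages across
	segments in any order by collecting page numbers, sorting them and
	skipping duplicates. The 8 KB page shift, the dseg type and the
	function names are assumptions; kernel code could not simply call
	malloc() and qsort() like this, and an O(n log n) sort per load may
	be too slow for a real DMA path.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed 8 KB pages (shift of 13); hypothetical name. */
#define DSEG_PAGE_SHIFT 13

/* Hypothetical stand-in for a DMA segment. */
struct dseg {
	uint64_t addr;
	uint64_t len;
};

static int dseg_cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Count the distinct pages touched by all segments, regardless of order.
 * Returns -1 if the scratch array cannot be allocated.
 */
static long dseg_distinct_pages(const struct dseg *segs, size_t nsegs)
{
	size_t total = 0, n = 0, i;
	uint64_t *pages, p, first, last;
	long distinct = 0;

	/* First pass: upper bound on the number of page references. */
	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;
		first = segs[i].addr >> DSEG_PAGE_SHIFT;
		last = (segs[i].addr + segs[i].len - 1) >> DSEG_PAGE_SHIFT;
		total += (size_t)(last - first + 1);
	}
	if (total == 0)
		return 0;

	pages = malloc(total * sizeof(*pages));
	if (pages == NULL)
		return -1;

	/* Second pass: record every page number each segment touches. */
	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;
		first = segs[i].addr >> DSEG_PAGE_SHIFT;
		last = (segs[i].addr + segs[i].len - 1) >> DSEG_PAGE_SHIFT;
		for (p = first; p <= last; p++)
			pages[n++] = p;
	}

	/* Sort, then count each page number once. */
	qsort(pages, n, sizeof(*pages), dseg_cmp_u64);
	for (i = 0; i < n; i++) {
		if (i == 0 || pages[i] != pages[i - 1])
			distinct++;
	}
	free(pages);
	return distinct;
}
```

	With seg[0] and seg[2] in the same page and seg[1] in a different
	one, this returns 2, whereas summing per-segment page counts gives 3.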
>Release-Note:
>Audit-Trail:
>Unformatted: