Subject: port-sparc64/13654: problems with iommu_dvmamap_load_raw()
To: None <gnats-bugs@gnats.netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: netbsd-bugs
Date: 08/08/2001 18:26:37
>Number: 13654
>Category: port-sparc64
>Synopsis: problems with iommu_dvmamap_load_raw()
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-sparc64-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Aug 08 09:23:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:
>Release: -current as of half an hour ago (from main CVS)
>Organization:
LIP6, Universite Paris VI.
>Environment:
System: NetBSD java 1.5X NetBSD 1.5X (JAVA) #0: Wed Aug 8 17:22:04 MEST 2001 bouyer@java:/home/cvs.netbsd.org/src/sys/arch/sparc64/compile/JAVA sparc64
Machine: Ultra5 400MHz
>Description:
I believe there are still problems with iommu_dvmamap_load_raw().
First, the code that computes sgsize (passed to extent_alloc) doesn't
seem to take the offset within the page into account: if a segment has
a small len but crosses a page boundary (this can happen with mbufs),
we will account for one page instead of 2 (or maybe callers of
iommu_dvmamap_load_raw() already split this into 2 segments? I didn't
check).
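To make the first point concrete, here is a minimal sketch of the
difference between rounding up the length alone and counting the pages
a segment really touches. It is not the iommu.c code: the 8 KB page
size is an assumption (sparc64), and every sketch_* name is made up for
illustration.

```c
#include <stdint.h>

/* Assumption: 8 KB pages, as on sparc64. All names are hypothetical. */
#define SKETCH_PAGE_SIZE 8192UL
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/* Page count obtained by rounding up the length only (offset ignored). */
unsigned long
sketch_pages_by_len(uint64_t len)
{
	return (unsigned long)((len + SKETCH_PAGE_MASK) / SKETCH_PAGE_SIZE);
}

/* Pages actually touched by the byte range [addr, addr + len). */
unsigned long
sketch_pages_spanned(uint64_t addr, uint64_t len)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = addr / SKETCH_PAGE_SIZE;		/* page of first byte */
	last = (addr + len - 1) / SKETCH_PAGE_SIZE;	/* page of last byte */
	return (unsigned long)(last - first + 1);
}
```

For a 64-byte segment starting at 0x10001ff0 (16 bytes before a page
boundary), sketch_pages_by_len(0x40) gives 1 while
sketch_pages_spanned(0x10001ff0, 0x40) gives 2.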
Second, I'm almost sure there are problems with out-of-order segments
(again, I'm sure this can happen with mbuf chains): if we have 3 segments,
seg[0] in page X, seg[1] in page Y != X and seg[2] in page X,
we'll account for 3 pages instead of 2, and we'll have 2 entries in
the IOMMU for page X.
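The three-segment case can be modelled as follows. This is only a
sketch of the accounting problem as described here, not a copy of
iommu_dvmamap_load_raw(): struct sketch_seg is a hypothetical stand-in
for bus_dma_segment_t, the 8 KB page size is an assumption, and the
"sequential" function is my model of accounting that only compares a
segment with the one just before it.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 8192UL	/* assumption: 8 KB pages (sparc64) */

/* Hypothetical stand-in for bus_dma_segment_t. */
struct sketch_seg {
	uint64_t addr;
	uint64_t len;
};

/* Count pages, merging a segment only with the one right before it. */
size_t
sketch_count_sequential(const struct sketch_seg *segs, size_t nsegs)
{
	size_t i, n = 0;
	int have_prev = 0;
	uint64_t first, last, prev_last = 0;

	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;
		first = segs[i].addr / SKETCH_PAGE_SIZE;
		last = (segs[i].addr + segs[i].len - 1) / SKETCH_PAGE_SIZE;
		n += (size_t)(last - first + 1);
		if (have_prev && first == prev_last)
			n--;	/* starts in the page the previous one ended in */
		prev_last = last;
		have_prev = 1;
	}
	return n;
}

/*
 * Count distinct pages. seen[] is caller-provided scratch space.
 * Returns (size_t)-1 if seen[] is too small. Quadratic; illustration only.
 */
size_t
sketch_count_distinct(const struct sketch_seg *segs, size_t nsegs,
    uint64_t *seen, size_t max_seen)
{
	size_t i, j, n = 0;
	uint64_t p, first, last;

	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;
		first = segs[i].addr / SKETCH_PAGE_SIZE;
		last = (segs[i].addr + segs[i].len - 1) / SKETCH_PAGE_SIZE;
		for (p = first; p <= last; p++) {
			for (j = 0; j < n; j++)
				if (seen[j] == p)
					break;
			if (j < n)
				continue;	/* page already counted */
			if (n == max_seen)
				return (size_t)-1;
			seen[n++] = p;
		}
	}
	return n;
}
```

With seg[0] and seg[2] in page 5 and seg[1] in page 9, the sequential
count is 3 and the distinct count is 2.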
While testing the tl driver on a U5 I get problems under load
(like dd if=/dev/zero of=file bs=64k on an NFS filesystem): the system
panics almost immediately, either with "psycho0: uncorrectable DMA
error" or in extent_free with "region not found".
I added code to check that the size passed to extent_alloc and
extent_free is the same:
Index: include/bus.h
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sparc64/include/bus.h,v
retrieving revision 1.28
diff -u -r1.28 bus.h
--- include/bus.h 2001/07/19 15:32:19 1.28
+++ include/bus.h 2001/08/08 16:06:11
@@ -1514,6 +1514,7 @@
void *_dm_source; /* source mbuf, uio, etc. needed for unload */
void *_dm_cookie; /* cookie for bus-specific functions */
+ bus_size_t _dm_sgsize; /* size of extent */
/*
* PUBLIC MEMBERS: these are used by machine-independent code.
Index: dev/iommu.c
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sparc64/dev/iommu.c,v
retrieving revision 1.37
diff -u -r1.37 iommu.c
--- dev/iommu.c 2001/08/06 22:02:58 1.37
+++ dev/iommu.c 2001/08/08 16:06:11
@@ -501,6 +501,7 @@
err = extent_alloc(is->is_dvmamap, sgsize, align,
boundary, EX_NOWAIT|EX_BOUNDZERO, (u_long *)&dvmaddr);
splx(s);
+ map->_dm_sgsize = sgsize;
#ifdef DEBUG
if (err || (dvmaddr == (bus_addr_t)-1))
@@ -599,6 +600,12 @@
pa = addr + offset + len;
}
+ if (sgsize != map->_dm_sgsize) {
+ printf("iommu_dvmamap_unload: sgsize %lu different from %lu\n",
+ (u_long)sgsize, (u_long)map->_dm_sgsize);
+ /* panic("iommu_dvmamap_unload"); */
+ sgsize = map->_dm_sgsize;
+ }
/* Flush the caches */
bus_dmamap_unload(t->_parent, map);
@@ -656,6 +663,7 @@
pa = segs[i].ds_addr + segs[i].ds_len;
}
sgsize = round_page(sgsize);
+ map->_dm_sgsize = sgsize;
/*
* A boundary presented to bus_dmamem_alloc() takes precedence
With this code, I get:
iommu_dvmamap_unload: sgsize 16384 different from 24576
iommu_dvmamap_unload: sgsize 16384 different from 24576
panic: psycho0: uncorrectable DMA error AFAR 1097e150 AFSR 410000ff40800000
I tried to solve the first bug (offset not used to compute the number
of pages) by using code cut'n'pasted from iommu_dvmamap_unload().
Now the machine doesn't panic any more, but I get many more
"iommu_dvmamap_unload: sgsize s1 different from s2"
messages, with s1 being one page larger or smaller than s2; and I get
very weird behavior from the adapter: a tcpdump on the NFS server
shows that I get the last fragment *twice*:
18:12:21.054581 java.369053902 > disco-bu.nfs: 1472 write fh 16,20/1931 8192 bytes @ 0 (frag 4368:1480@0+)
18:12:21.054582 java > disco-bu: (frag 4368:920@7400)
18:12:21.054583 java > disco-bu: (frag 4368:1480@1480+)
18:12:21.054585 java > disco-bu: (frag 4368:1480@2960+)
18:12:21.054586 java > disco-bu: (frag 4368:1480@4440+)
18:12:21.054587 java > disco-bu: (frag 4368:1480@5920+)
18:12:21.054588 java > disco-bu: (frag 4368:920@7400)
Yes, the last fragment is inserted between the first and second, and
repeated at the end. The only explanation I can see is that the adapter
read corrupted data via DMA (it DMAs the transmit list too). I checked
at the driver level, and the list isn't corrupted after transmit.
Now, why I believe the problem is in bus_dma and not in the tl driver:
I get the exact same behavior with a tlp (21041) adapter, and with an
epic (SMC EtherPower II).
The HME driver doesn't have this problem because it uses statically
allocated buffers to/from which it copies mbufs, and so doesn't
use bus_dmamap_load_mbuf.
>How-To-Repeat:
Try to use a tl, tlp or epic (or probably any driver which uses
bus_dmamap_load_mbuf) in a sparc64 (Ultra5 in my case).
>Fix:
I don't know at this point. Getting the algorithm to handle
out-of-order segments in an efficient way isn't that easy, I guess.
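One possible direction, given purely as a sketch and not as a tested
or proposed patch: collect the page numbers of all segments, sort and
de-duplicate them, and let each distinct page own one DVMA slot.
Sorting keeps physically adjacent pages in adjacent slots, so a segment
crossing a page boundary stays contiguous in DVMA space. All names, the
8 KB page size, the caller-provided pages[] scratch array and the flat
dvma_base model are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 8192UL	/* assumption: 8 KB pages (sparc64) */

/* Hypothetical stand-in for bus_dma_segment_t. */
struct sketch_seg {
	uint64_t addr;
	uint64_t len;
};

static int
sketch_cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Fill pages[] with the sorted, de-duplicated page numbers touched by
 * segs[]. pages[] must have room for every page reference, duplicates
 * included. Returns the number of distinct pages, or 0 if there are
 * none or pages[] is too small.
 */
size_t
sketch_collect_pages(const struct sketch_seg *segs, size_t nsegs,
    uint64_t *pages, size_t max_pages)
{
	size_t i, n = 0, out = 0;
	uint64_t p, first, last;

	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;
		first = segs[i].addr / SKETCH_PAGE_SIZE;
		last = (segs[i].addr + segs[i].len - 1) / SKETCH_PAGE_SIZE;
		for (p = first; p <= last; p++) {
			if (n == max_pages)
				return 0;
			pages[n++] = p;
		}
	}
	qsort(pages, n, sizeof(pages[0]), sketch_cmp_u64);
	for (i = 0; i < n; i++)		/* drop duplicates in place */
		if (out == 0 || pages[out - 1] != pages[i])
			pages[out++] = pages[i];
	return out;
}

/*
 * DVMA address for physical address pa: slot index of its page times
 * the page size, plus the offset within the page. Returns UINT64_MAX
 * if the page is not in the table.
 */
uint64_t
sketch_dvma_of(uint64_t pa, const uint64_t *pages, size_t npages,
    uint64_t dvma_base)
{
	uint64_t key = pa / SKETCH_PAGE_SIZE;
	const uint64_t *hit;

	hit = bsearch(&key, pages, npages, sizeof(pages[0]), sketch_cmp_u64);
	if (hit == NULL)
		return UINT64_MAX;
	return dvma_base + (uint64_t)(hit - pages) * SKETCH_PAGE_SIZE +
	    pa % SKETCH_PAGE_SIZE;
}
```

The cost is one sort per load, O(n log n) in the number of page
references, plus a binary search per segment. Whether that is
acceptable in the transmit path is a separate question.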
>Release-Note:
>Audit-Trail:
>Unformatted: