Port-sgimips archive


mec(4) TX improvement and mbuf statistics



I've been working on the mec(4) driver to improve its TX path
by reducing possibly unnecessary mbuf copies before TX DMA.
My new TX routine seems to be working fine, but I'd like to see
how it behaves in different environments before committing it.


The mec(4) TX hardware can handle the following DMA buffers
in each DMA descriptor:

(1) a ~120 byte static buffer
(2) three "concatenation pointers", each of which can transfer an
    8 byte aligned, contiguous buffer by direct DMA

The TX strategy of the mec(4) driver in -current is:

(1) If the TX packet is smaller than 60 bytes (i.e. we have to pad it),
    copy the whole packet into the static buffer and pad it.
(2) For packets of 60 bytes or more, prepare an nsegs=1 dmamap for each
    descriptor and try bus_dmamap_load_mbuf(9) first.
(2a) If the TX packet fits the dmamap (i.e. it's contiguous and has
     no fragments), use only the first concatenation pointer to DMA it;
     the unaligned part is copied into the static buffer.
(2b) If the TX packet doesn't fit the dmamap (i.e. it has more than one
     fragment), allocate a new (contiguous) mbuf and copy the whole
     packet into it.
 (See if_mec.c for more details.)
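
For reference, the buffer sizes quoted in this mail (~120 bytes, and the
~88 bytes mentioned further down) follow from the descriptor macros in the
attached patch. Here is a minimal sketch of that arithmetic. It mirrors the
patched layout (pointers and buffer sharing a union), not the -current one,
and the names ending in _sketch and the helper functions are made up for
illustration; they are not in if_mec.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEC_TXDESCSIZE	128	/* one TX descriptor, as in the patch */
#define MEC_NTXPTR	3	/* concatenation pointers per descriptor */
#define MEC_TXD_ALIGN	8	/* alignment required for pointer DMA */

/*
 * Illustrative layout only: an 8 byte command/status word followed by
 * 120 bytes that are either all static buffer, or three pointers
 * followed by the remaining buffer bytes.
 */
struct txdesc_sketch {
	uint64_t cmd;
	union {
		uint64_t ptr[MEC_NTXPTR];
		uint8_t buf[MEC_TXDESCSIZE - sizeof(uint64_t)];
	} data;
};

/* Static buffer size when no pointer is used (MEC_TXD_BUFSIZE). */
static size_t
txd_bufsize(void)
{
	return MEC_TXDESCSIZE - sizeof(uint64_t);
}

/* Static buffer size left when the pointers are in use (MEC_TXD_BUFSIZE1). */
static size_t
txd_bufsize_with_ptrs(void)
{
	return MEC_TXDESCSIZE -
	    (sizeof(uint64_t) + sizeof(uint64_t) * MEC_NTXPTR);
}

/* Header bytes that may be copied while keeping room for alignment. */
static size_t
txd_hdr_copy_limit(void)
{
	return txd_bufsize_with_ptrs() - MEC_TXD_ALIGN;
}

/* Data is placed at the tail of the buffer (MEC_TXD_BUFSTART). */
static size_t
txd_bufstart(size_t len)
{
	assert(len <= txd_bufsize());
	return txd_bufsize() - len;
}
```

So the whole buffer is 120 bytes, 96 bytes remain once the three pointers
are used, and 88 of those are available for copied header fragments.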

I guessed that many TX packets were fragmented to some degree (especially
in their headers), so most of them were being copied into newly allocated
mbufs.

To improve this situation, I've rewritten the mec_start() function
with the following strategy:

(1) If the TX packet is smaller than or equal to the TX static buffer,
    copy the whole packet into the buffer.
(2) If the TX packet is larger than the static buffer, try to copy
    the fragments in the first ~88 bytes of the packet into the buffer.
(3) If the rest of the packet is not 8 byte aligned, also copy the
    unaligned part into the static buffer.
(4) If three or fewer fragments remain
    and all of them are 8 byte aligned,
    use the concatenation pointers to handle them.
(5) Otherwise, allocate a new mbuf and copy the packet into it.
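
The steps above can be sketched as a small stand-alone classifier.
This is a simplified model, not the driver code: struct seg stands in for
the DMA segments that bus_dmamap_load_mbuf(9) would return, and the names
tx_classify, enum tx_path and the TXD_* constants are made up for
illustration. It assumes the segment lengths add up to the packet length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXD_BUFSIZE	120	/* whole static buffer, no pointers used */
#define TXD_HDR_LIMIT	88	/* header bytes copied when pointers are used */
#define TXD_ALIGN	8	/* alignment required by the pointers */
#define TXD_NPTR	3	/* concatenation pointers per descriptor */

/* Hypothetical stand-in for one DMA segment of the TX mbuf chain. */
struct seg {
	uint64_t addr;
	size_t len;
};

enum tx_path {
	TX_COPY_TO_TXDBUF,	/* step (1) */
	TX_USE_CONCAT_PTRS,	/* steps (2) to (4) */
	TX_COPY_TO_NEW_MBUF	/* step (5) */
};

static enum tx_path
tx_classify(const struct seg *segs, int nsegs, size_t pktlen)
{
	size_t buflen = 0, resid;
	int pseg, i;

	assert(nsegs > 0);

	/* (1) small packets are copied into the static buffer as a whole */
	if (pktlen <= TXD_BUFSIZE)
		return TX_COPY_TO_TXDBUF;

	/* (2) leading fragments within the first ~88 bytes are copied */
	for (pseg = 0; pseg < nsegs; pseg++) {
		if (buflen + segs[pseg].len > TXD_HDR_LIMIT)
			break;
		buflen += segs[pseg].len;
	}
	if (pseg == nsegs)	/* lengths inconsistent with pktlen */
		return TX_COPY_TO_NEW_MBUF;

	/* (3) the unaligned head of the next segment is copied as well */
	resid = segs[pseg].addr & (TXD_ALIGN - 1);
	if (resid != 0) {
		resid = TXD_ALIGN - resid;
		for (; pseg < nsegs; pseg++) {
			if (segs[pseg].len > resid)
				break;
			resid -= segs[pseg].len;
		}
	}

	/* (4) at most three segments may remain for the pointers ... */
	if (nsegs - pseg > TXD_NPTR)
		return TX_COPY_TO_NEW_MBUF;
	/* ... and every segment after the first one must be aligned */
	for (i = pseg + 1; i < nsegs; i++) {
		if ((segs[i].addr & (TXD_ALIGN - 1)) != 0)
			return TX_COPY_TO_NEW_MBUF;	/* (5) */
	}
	return TX_USE_CONCAT_PTRS;
}
```

For example, a 54 byte unaligned header fragment followed by one aligned
payload fragment is handled with a single pointer, while a chain whose
second payload fragment starts at an unaligned address falls back to the
new mbuf copy.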

I also added a bunch of evcnt(9) counters to get statistics on TX packet
mbufs, and got an interesting result:

---
mec0 TX pkts queued total                  20000009      132 misc
mec0 TX pkts padded in txdesc buf               986        0 misc
mec0 TX pkts copied to txdesc buf           1233978        8 misc
mec0 TX pkts using concat ptr1             16417689      109 misc
mec0 TX pkts  w/ptr1  ~160bytes              536752        3 misc
mec0 TX pkts  w/ptr1  ~256bytes            12570200       83 misc
mec0 TX pkts  w/ptr1  ~512bytes              942166        6 misc
mec0 TX pkts  w/ptr1 ~1024bytes              152728        1 misc
mec0 TX pkts  w/ptr1 >1024bytes             2215843       14 misc
mec0 TX pkts using concat ptr1,2            2247801       14 misc
mec0 TX pkts  w/ptr2  ~160bytes                 478        0 misc
mec0 TX pkts  w/ptr2  ~256bytes                5027        0 misc
mec0 TX pkts  w/ptr2  ~512bytes               65308        0 misc
mec0 TX pkts  w/ptr2 ~1024bytes               94842        0 misc
mec0 TX pkts  w/ptr2 >1024bytes             2082146       13 misc
mec0 TX pkts using concat ptr1,2,3            54427        0 misc
mec0 TX pkts  w/ptr3  ~160bytes                   0        0 misc
mec0 TX pkts  w/ptr3  ~256bytes                5495        0 misc
mec0 TX pkts  w/ptr3  ~512bytes               25107        0 misc
mec0 TX pkts  w/ptr3 ~1024bytes                3518        0 misc
mec0 TX pkts  w/ptr3 >1024bytes               20307        0 misc
mec0 TX pkts copied to new mbufs              45128        0 misc
mec0 TX pkts  w/mbuf  ~160bytes                   0        0 misc
mec0 TX pkts  w/mbuf  ~256bytes                   0        0 misc
mec0 TX pkts  w/mbuf  ~512bytes                  27        0 misc
mec0 TX pkts  w/mbuf ~1024bytes                 329        0 misc
mec0 TX pkts  w/mbuf >1024bytes               44772        0 misc
mec0 TX pkts using ptrs total              18719917      124 misc
mec0 TX pkts  w/ptrs no hdr chain          10130135       67 misc
mec0 TX pkts  w/ptrs  1 hdr chain           8586208       57 misc
mec0 TX pkts  w/ptrs  2 hdr chains             3574        0 misc
mec0 TX pkts  w/ptrs  3 hdr chains                0        0 misc
mec0 TX pkts  w/ptrs  4 hdr chains                0        0 misc
mec0 TX pkts  w/ptrs  5 hdr chains                0        0 misc
mec0 TX pkts  w/ptrs >5 hdr chains                0        0 misc
mec0 TX pkts  w/ptrs  ~8bytes hdr          10130135       67 misc
mec0 TX pkts  w/ptrs ~16bytes hdr             26186        0 misc
mec0 TX pkts  w/ptrs ~32bytes hdr                 0        0 misc
mec0 TX pkts  w/ptrs ~64bytes hdr           2236999       14 misc
mec0 TX pkts  w/ptrs ~80bytes hdr           6324800       42 misc
mec0 TX pkts  w/ptrs ~96bytes hdr              1797        0 misc

note:
- "mec0 TX pkts padded in txdesc buf" are packets smaller than 60 bytes
- "mec0 TX pkts copied to txdesc buf" are packets of 60~120 bytes
- "mec0 TX pkts using concat ptr1" are packets sent using
  one concatenation pointer
- "mec0 TX pkts using concat ptr1,2(,3)" are packets sent using
  two (or three) concatenation pointers
- "mec0 TX pkts copied to new mbufs" are packets which can't be handled
  by the concatenation pointers (due to fragmentation or misalignment)
- "mec0 TX pkts  w/ptrs N hdr chain" are packets which have N fragments
  in the first ~88 bytes
- "mec0 TX pkts  w/ptrs ~NNbytes hdr" are packets which have NN bytes
  of fragments in the first 96 (88+8) bytes
- No IPv6 packets in this test

---

This means:
- more than 90% of the packets are transferred via the concatenation pointers
- ~6% of the packets are 120 bytes or smaller (and are copied to the static
  buffer)
- only ~0.2% of the packets require new mbufs due to the hardware DMA
  restrictions
- 54% of the packets sent via the pointers have no fragment, at least in
  the first 88 bytes (so very few bytes are copied into the static buffer)
- ~46% of the packets sent via the pointers have only one fragment in the
  first 88 bytes, and very few packets have two or more fragments.
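
The percentages above can be reproduced from the raw counters. Here is a
minimal sketch, with the counter values copied from the evcnt output
above; the helper name pct_x100 is made up for illustration. Note that the
54% and 46% figures are relative to the "using ptrs total" counter, not
to the total number of queued packets.

```c
#include <assert.h>
#include <stdint.h>

/* Counter values copied from the evcnt output in this mail. */
static const uint64_t ev_txpkts  = 20000009;	/* TX pkts queued total */
static const uint64_t ev_txdpad  = 986;		/* padded in txdesc buf */
static const uint64_t ev_txdbuf  = 1233978;	/* copied to txdesc buf */
static const uint64_t ev_txmbuf  = 45128;	/* copied to new mbufs */
static const uint64_t ev_txptrs  = 18719917;	/* using ptrs total */
static const uint64_t ev_txptrc0 = 10130135;	/* w/ptrs no hdr chain */
static const uint64_t ev_txptrc1 = 8586208;	/* w/ptrs 1 hdr chain */

/*
 * Share of "count" in "total" in units of 0.01%, rounded down,
 * e.g. 9359 means 93.59%. Integer math keeps the result exact.
 */
static uint64_t
pct_x100(uint64_t count, uint64_t total)
{
	assert(total != 0);
	return count * 10000 / total;
}
```

With these numbers, pct_x100(ev_txptrs, ev_txpkts) gives 9359 (93.59%),
pct_x100(ev_txdpad + ev_txdbuf, ev_txpkts) gives 617 (6.17%) and
pct_x100(ev_txmbuf, ev_txpkts) gives 22 (0.22%).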


Quick ttcp(1) results (both TCP and UDP) are here:

with -current driver:
---
# ttcp -ts -n 10000 192.168.20.1
ttcp-t: buflen=8192, nbuf=10000, align=16384/0, port=5001  tcp  -> 192.168.20.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 81920000 bytes in 9.01 real seconds = 8876.46 KB/sec +++
ttcp-t: 10000 I/O calls, msec/call = 0.92, calls/sec = 1109.56
ttcp-t: 0.0user 7.2sys 0:09real 80% 0i+0d 0maxrss 0+2pf -5+54csw
# ttcp -tsu -n 10000 192.168.20.1
ttcp-t: buflen=8192, nbuf=10000, align=16384/0, port=5001  udp  -> 192.168.20.1
ttcp-t: socket
ttcp-t: 81920000 bytes in 6.80 real seconds = 11766.68 KB/sec +++
ttcp-t: 10056 I/O calls, msec/call = 0.69, calls/sec = 1479.07
ttcp-t: 0.0user 5.4sys 0:06real 81% 0i+0d 0maxrss 0+2pf 46+14csw
#
---

with revised one:
---
# ttcp -ts -n 10000 192.168.20.1
ttcp-t: buflen=8192, nbuf=10000, align=16384/0, port=5001  tcp  -> 192.168.20.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 81920000 bytes in 8.03 real seconds = 9957.48 KB/sec +++
ttcp-t: 10000 I/O calls, msec/call = 0.82, calls/sec = 1244.69
ttcp-t: 0.0user 6.7sys 0:08real 84% 0i+0d 0maxrss 0+2pf -1+49csw
# ttcp -tsu -n 10000 192.168.20.1
ttcp-t: buflen=8192, nbuf=10000, align=16384/0, port=5001  udp  -> 192.168.20.1
ttcp-t: socket
ttcp-t: 81920000 bytes in 6.82 real seconds = 11736.20 KB/sec +++
ttcp-t: 10115 I/O calls, msec/call = 0.69, calls/sec = 1483.90
ttcp-t: 0.0user 4.0sys 0:06real 59% 0i+0d 0maxrss 0+2pf 109+10csw
# 
---
i.e. TCP is ~12% faster, and UDP puts less load on the CPU (at what looks
like saturated link speed?).
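
The ~12% figure comes straight from the two ttcp KB/sec values. A small
sketch of the arithmetic follows; the helper name gain_percent is made up
for illustration.

```c
#include <assert.h>

/* Relative change of "after" over "before", in percent. */
static double
gain_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

gain_percent(8876.46, 9957.48) is a bit over 12 for the TCP throughput,
and gain_percent(5.4, 4.0) is about -26 for the UDP system time
(5.4 sec down to 4.0 sec).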

The driver patch is attached.
Note: please comment out the #define MEC_EVENT_COUNTERS line
if you don't want to enable the evcnt(9) statistics.

Comments?

---
Index: if_mec.c
===================================================================
RCS file: /cvsroot/src/sys/arch/sgimips/mace/if_mec.c,v
retrieving revision 1.30
diff -u -r1.30 if_mec.c
--- if_mec.c    14 Aug 2008 03:48:43 -0000      1.30
+++ if_mec.c    15 Aug 2008 12:44:40 -0000
@@ -1,7 +1,7 @@
 /* $NetBSD: if_mec.c,v 1.30 2008/08/14 03:48:43 tsutsui Exp $ */
 
 /*-
- * Copyright (c) 2004 Izumi Tsutsui.  All rights reserved.
+ * Copyright (c) 2004, 2008 Izumi Tsutsui.  All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -113,12 +113,21 @@
 #define MEC_DEBUG_INTR         0x08
 #define MEC_DEBUG_RXINTR       0x10
 #define MEC_DEBUG_TXINTR       0x20
+#define MEC_DEBUG_TXSEGS       0x40
 uint32_t mec_debug = 0;
 #define DPRINTF(x, y)  if (mec_debug & (x)) printf y
 #else
 #define DPRINTF(x, y)  /* nothing */
 #endif
 
+#define MEC_EVENT_COUNTERS
+
+#ifdef MEC_EVENT_COUNTERS
+#define MEC_EVCNT_INCR(ev)     (ev)->ev_count++
+#else
+#define MEC_EVCNT_INCR(ev)     do {} while (/* CONSTCOND */ 0)
+#endif
+
 /*
  * Transmit descriptor list size
  */
@@ -136,8 +145,7 @@
        bus_dmamap_t txs_dmamap;        /* our DMA map */
        uint32_t txs_flags;
 #define MEC_TXS_BUFLEN_MASK    0x0000007f      /* data len in txd_buf */
-#define MEC_TXS_TXDBUF         0x00000080      /* txd_buf is used */
-#define MEC_TXS_TXDPTR1                0x00000100      /* txd_ptr[0] is used */
+#define MEC_TXS_TXDPTR         0x00000080      /* concat txd_ptr is used */
 };
 
 /*
@@ -145,13 +153,17 @@
  */
 #define MEC_TXDESCSIZE         128
 #define MEC_NTXPTR             3
-#define MEC_TXD_BUFOFFSET      \
-       (sizeof(uint64_t) + MEC_NTXPTR * sizeof(uint64_t))
+#define MEC_TXD_BUFOFFSET      sizeof(uint64_t)
+#define MEC_TXD_BUFOFFSET1     \
+       (sizeof(uint64_t) + sizeof(uint64_t) * MEC_NTXPTR)
 #define MEC_TXD_BUFSIZE                (MEC_TXDESCSIZE - MEC_TXD_BUFOFFSET)
+#define MEC_TXD_BUFSIZE1       (MEC_TXDESCSIZE - MEC_TXD_BUFOFFSET1)
 #define MEC_TXD_BUFSTART(len)  (MEC_TXD_BUFSIZE - (len))
 #define MEC_TXD_ALIGN          8
+#define MEC_TXD_ALIGNMASK      (MEC_TXD_ALIGN - 1)
 #define MEC_TXD_ROUNDUP(addr)  \
-       (((addr) + (MEC_TXD_ALIGN - 1)) & ~((uint64_t)MEC_TXD_ALIGN - 1))
+       (((addr) + MEC_TXD_ALIGNMASK) & ~(uint64_t)MEC_TXD_ALIGNMASK)
+#define MEC_NTXSEG             16
 
 struct mec_txdesc {
        volatile uint64_t txd_cmd;
@@ -181,14 +193,18 @@
 #define MEC_TXSTAT_UNUSED      0x7fffffffe0000000ULL   /* should be zero */
 #define MEC_TXSTAT_SENT                0x8000000000000000ULL   /* packet sent */
 
-       uint64_t txd_ptr[MEC_NTXPTR];
+       union {
+               uint64_t txptr[MEC_NTXPTR];
 #define MEC_TXPTR_UNUSED2      0x0000000000000007      /* should be zero */
 #define MEC_TXPTR_DMAADDR      0x00000000fffffff8      /* TX DMA address */
 #define MEC_TXPTR_LEN          0x0000ffff00000000ULL   /* buffer length */
 #define  TXPTR_LEN(x)          ((uint64_t)(x) << 32)
 #define MEC_TXPTR_UNUSED1      0xffff000000000000ULL   /* should be zero */
 
-       uint8_t txd_buf[MEC_TXD_BUFSIZE];
+               uint8_t txbuf[MEC_TXD_BUFSIZE];
+       } txd_data;
+#define txd_ptr                txd_data.txptr
+#define txd_buf                txd_data.txbuf
 };
 
 /*
@@ -300,6 +316,52 @@
 #if NRND > 0
        rndsource_element_t sc_rnd_source; /* random source */
 #endif
+#ifdef MEC_EVENT_COUNTERS
+       struct evcnt sc_ev_txpkts;      /* TX packets queued total */
+       struct evcnt sc_ev_txdpad;      /* TX packets padded in txdesc buf */
+       struct evcnt sc_ev_txdbuf;      /* TX packets copied to txdesc buf */
+       struct evcnt sc_ev_txptr1;      /* TX packets using concat ptr1 */
+       struct evcnt sc_ev_txptr1a;     /* TX packets  w/ptr1  ~160bytes */
+       struct evcnt sc_ev_txptr1b;     /* TX packets  w/ptr1  ~256bytes */
+       struct evcnt sc_ev_txptr1c;     /* TX packets  w/ptr1  ~512bytes */
+       struct evcnt sc_ev_txptr1d;     /* TX packets  w/ptr1 ~1024bytes */
+       struct evcnt sc_ev_txptr1e;     /* TX packets  w/ptr1 >1024bytes */
+       struct evcnt sc_ev_txptr2;      /* TX packets using concat ptr1,2 */
+       struct evcnt sc_ev_txptr2a;     /* TX packets  w/ptr2  ~160bytes */
+       struct evcnt sc_ev_txptr2b;     /* TX packets  w/ptr2  ~256bytes */
+       struct evcnt sc_ev_txptr2c;     /* TX packets  w/ptr2  ~512bytes */
+       struct evcnt sc_ev_txptr2d;     /* TX packets  w/ptr2 ~1024bytes */
+       struct evcnt sc_ev_txptr2e;     /* TX packets  w/ptr2 >1024bytes */
+       struct evcnt sc_ev_txptr3;      /* TX packets using concat ptr1,2,3 */
+       struct evcnt sc_ev_txptr3a;     /* TX packets  w/ptr3  ~160bytes */
+       struct evcnt sc_ev_txptr3b;     /* TX packets  w/ptr3  ~256bytes */
+       struct evcnt sc_ev_txptr3c;     /* TX packets  w/ptr3  ~512bytes */
+       struct evcnt sc_ev_txptr3d;     /* TX packets  w/ptr3 ~1024bytes */
+       struct evcnt sc_ev_txptr3e;     /* TX packets  w/ptr3 >1024bytes */
+       struct evcnt sc_ev_txmbuf;      /* TX packets copied to new mbufs */
+       struct evcnt sc_ev_txmbufa;     /* TX packets  w/mbuf  ~160bytes */
+       struct evcnt sc_ev_txmbufb;     /* TX packets  w/mbuf  ~256bytes */
+       struct evcnt sc_ev_txmbufc;     /* TX packets  w/mbuf  ~512bytes */
+       struct evcnt sc_ev_txmbufd;     /* TX packets  w/mbuf ~1024bytes */
+       struct evcnt sc_ev_txmbufe;     /* TX packets  w/mbuf >1024bytes */
+       struct evcnt sc_ev_txptrs;      /* TX packets using ptrs total */
+       struct evcnt sc_ev_txptrc0;     /* TX packets  w/ptrs no hdr chain */
+       struct evcnt sc_ev_txptrc1;     /* TX packets  w/ptrs  1 hdr chain */
+       struct evcnt sc_ev_txptrc2;     /* TX packets  w/ptrs  2 hdr chains */
+       struct evcnt sc_ev_txptrc3;     /* TX packets  w/ptrs  3 hdr chains */
+       struct evcnt sc_ev_txptrc4;     /* TX packets  w/ptrs  4 hdr chains */
+       struct evcnt sc_ev_txptrc5;     /* TX packets  w/ptrs  5 hdr chains */
+       struct evcnt sc_ev_txptrc6;     /* TX packets  w/ptrs >5 hdr chains */
+       struct evcnt sc_ev_txptrh0;     /* TX packets  w/ptrs  ~8bytes hdr */
+       struct evcnt sc_ev_txptrh1;     /* TX packets  w/ptrs ~16bytes hdr */
+       struct evcnt sc_ev_txptrh2;     /* TX packets  w/ptrs ~32bytes hdr */
+       struct evcnt sc_ev_txptrh3;     /* TX packets  w/ptrs ~64bytes hdr */
+       struct evcnt sc_ev_txptrh4;     /* TX packets  w/ptrs ~80bytes hdr */
+       struct evcnt sc_ev_txptrh5;     /* TX packets  w/ptrs ~96bytes hdr */
+       struct evcnt sc_ev_txdstall;    /* TX stalled due to no txdesc */
+       struct evcnt sc_ev_txempty;     /* TX empty interrupts */
+       struct evcnt sc_ev_txsent;      /* TX sent interrupts */
+#endif
 };
 
 #define MEC_CDTXADDR(sc, x)    ((sc)->sc_cddma + MEC_CDTXOFF(x))
@@ -433,7 +495,7 @@
        /* create TX buffer DMA maps */
        for (i = 0; i < MEC_NTXDESC; i++) {
                if ((err = bus_dmamap_create(sc->sc_dmat,
-                   MCLBYTES, 1, MCLBYTES, PAGE_SIZE, 0,
+                   MCLBYTES, MEC_NTXSEG, MCLBYTES, PAGE_SIZE, 0,
                    &sc->sc_txsoft[i].txs_dmamap)) != 0) {
                        aprint_error(": unable to create tx DMA map %d,"
                            " error = %d\n", i, err);
@@ -557,6 +619,97 @@
            RND_TYPE_NET, 0);
 #endif
 
+#ifdef MEC_EVENT_COUNTERS
+       evcnt_attach_dynamic(&sc->sc_ev_txpkts , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts queued total");
+       evcnt_attach_dynamic(&sc->sc_ev_txdpad , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts padded in txdesc buf");
+       evcnt_attach_dynamic(&sc->sc_ev_txdbuf , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts copied to txdesc buf");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr1 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts using concat ptr1");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr1a , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr1  ~160bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr1b , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr1  ~256bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr1c , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr1  ~512bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr1d , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr1 ~1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr1e , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr1 >1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr2 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts using concat ptr1,2");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr2a , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr2  ~160bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr2b , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr2  ~256bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr2c , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr2  ~512bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr2d , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr2 ~1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr2e , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr2 >1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr3 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts using concat ptr1,2,3");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr3a , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr3  ~160bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr3b , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr3  ~256bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr3c , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr3  ~512bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr3d , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr3 ~1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptr3e , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptr3 >1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txmbuf , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts copied to new mbufs");
+       evcnt_attach_dynamic(&sc->sc_ev_txmbufa , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/mbuf  ~160bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txmbufb , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/mbuf  ~256bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txmbufc , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/mbuf  ~512bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txmbufd , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/mbuf ~1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txmbufe , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/mbuf >1024bytes");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrs , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts using ptrs total");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc0 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs no hdr chain");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc1 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs  1 hdr chain");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc2 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs  2 hdr chains");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc3 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs  3 hdr chains");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc4 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs  4 hdr chains");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc5 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs  5 hdr chains");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrc6 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs >5 hdr chains");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrh0 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs  ~8bytes hdr");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrh1 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs ~16bytes hdr");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrh2 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs ~32bytes hdr");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrh3 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs ~64bytes hdr");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrh4 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs ~80bytes hdr");
+       evcnt_attach_dynamic(&sc->sc_ev_txptrh5 , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX pkts  w/ptrs ~96bytes hdr");
+       evcnt_attach_dynamic(&sc->sc_ev_txdstall , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX stalled due to no txdesc");
+       evcnt_attach_dynamic(&sc->sc_ev_txempty , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX empty interrupts");
+       evcnt_attach_dynamic(&sc->sc_ev_txsent , EVCNT_TYPE_MISC,
+           NULL, device_xname(self), "TX sent interrupts");
+#endif
+
        /* set shutdown hook to reset interface on powerdown */
        sc->sc_sdhook = shutdownhook_establish(mec_shutdown, sc);
 
@@ -815,9 +968,9 @@
        bus_dmamap_t dmamap;
        bus_space_tag_t st = sc->sc_st;
        bus_space_handle_t sh = sc->sc_sh;
-       uint64_t txdaddr;
        int error, firsttx, nexttx, opending;
-       int len, bufoff, buflen, unaligned, txdlen;
+       int len, bufoff, buflen, nsegs, align, resid, pseg, nptr, slen, i;
+       uint32_t txdcmd;
 
        if ((ifp->if_flags & (IFF_RUNNING|IFF_OACTIVE)) != IFF_RUNNING)
                return;
@@ -844,58 +997,153 @@
                nexttx = MEC_NEXTTX(sc->sc_txlast);
                txd = &sc->sc_txdesc[nexttx];
                txs = &sc->sc_txsoft[nexttx];
+               dmamap = txs->txs_dmamap;
+               txs->txs_flags = 0;
 
                buflen = 0;
                bufoff = 0;
-               txdaddr = 0; /* XXX gcc */
-               txdlen = 0; /* XXX gcc */
+               resid = 0;
+               nptr = 0;       /* XXX gcc */
+               pseg = 0;       /* XXX gcc */
 
                len = m0->m_pkthdr.len;
 
                DPRINTF(MEC_DEBUG_START,
-                   ("mec_start: len = %d, nexttx = %d\n", len, nexttx));
+                   ("mec_start: len = %d, nexttx = %d, txpending = %d\n",
+                   len, nexttx, sc->sc_txpending));
 
-               if (len < ETHER_PAD_LEN) {
+               if (len <= MEC_TXD_BUFSIZE) {
                        /*
-                        * I don't know if MEC chip does auto padding,
-                        * so if the packet is small enough,
-                        * just copy it to the buffer in txdesc.
-                        * Maybe this is the simple way.
+                        * If a TX packet will fit into small txdesc buffer,
+                        * just copy it into there. Maybe it's faster than
+                        * checking alignment and calling bus_dma(9) etc.
                         */
                        DPRINTF(MEC_DEBUG_START, ("mec_start: short packet\n"));
-
                        IFQ_DEQUEUE(&ifp->if_snd, m0);
-                       bufoff = MEC_TXD_BUFSTART(ETHER_PAD_LEN);
-                       m_copydata(m0, 0, m0->m_pkthdr.len,
-                           txd->txd_buf + bufoff);
-                       memset(txd->txd_buf + bufoff + len, 0,
-                           ETHER_PAD_LEN - len);
-                       len = buflen = ETHER_PAD_LEN;
 
-                       txs->txs_flags = MEC_TXS_TXDBUF | buflen;
+                       /*
+                        * I don't know if MEC chip does auto padding,
+                        * but do it manually for safety.
+                        */
+                       if (len < ETHER_PAD_LEN) {
+                               MEC_EVCNT_INCR(&sc->sc_ev_txdpad);
+                               bufoff = MEC_TXD_BUFSTART(ETHER_PAD_LEN);
+                               m_copydata(m0, 0, len, txd->txd_buf + bufoff);
+                               memset(txd->txd_buf + bufoff + len, 0,
+                                   ETHER_PAD_LEN - len);
+                               len = buflen = ETHER_PAD_LEN;
+                       } else {
+                               MEC_EVCNT_INCR(&sc->sc_ev_txdbuf);
+                               bufoff = MEC_TXD_BUFSTART(len);
+                               m_copydata(m0, 0, len, txd->txd_buf + bufoff);
+                               buflen = len;
+                       }
                } else {
                        /*
-                        * If the packet won't fit the buffer in txdesc,
-                        * we have to use concatenate pointer to handle it.
-                        * While MEC can handle up to three segments to
-                        * concatenate, MEC requires that both the second and
-                        * third segments have to be 8 byte aligned.
-                        * Since it's unlikely for mbuf clusters, we use
-                        * only the first concatenate pointer. If the packet
-                        * doesn't fit in one DMA segment, allocate new mbuf
-                        * and copy the packet to it.
-                        *
-                        * Besides, if the start address of the first segments
-                        * is not 8 byte aligned, such part have to be copied
-                        * to the txdesc buffer. (XXX see below comments)
-                        */
+                        * If the packet won't fit the static buffer in txdesc,
+                        * we have to use the concatenate pointers to handle it.
+                        */
                        DPRINTF(MEC_DEBUG_START, ("mec_start: long packet\n"));
+                       txs->txs_flags = MEC_TXS_TXDPTR;
 
-                       dmamap = txs->txs_dmamap;
-                       if (bus_dmamap_load_mbuf(sc->sc_dmat, dmamap, m0,
-                           BUS_DMA_WRITE | BUS_DMA_NOWAIT) != 0) {
-                               DPRINTF(MEC_DEBUG_START,
+                       /*
+                        * Call bus_dmamap_load_mbuf(9) first to see
+                        * how many chains the TX mbuf has.
+                        */
+                       error = bus_dmamap_load_mbuf(sc->sc_dmat, dmamap, m0,
+                           BUS_DMA_WRITE | BUS_DMA_NOWAIT);
+                       if (error == 0) {
+                               /*
+                                * Check chains which might contain headers.
+                                * They might be heavily fragmented and
+                                * it's better to copy them into txdesc buffer
+                                * since they would be small enough.
+                                */
+                               nsegs = dmamap->dm_nsegs;
+                               for (pseg = 0; pseg < nsegs; pseg++) {
+                                       slen = dmamap->dm_segs[pseg].ds_len;
+                                       if (buflen + slen >
+                                           MEC_TXD_BUFSIZE1 - MEC_TXD_ALIGN)
+                                               break;
+                                       buflen += slen;
+                               }
+                               /*
+                                * Check if the remaining chains fit into
+                                * the concatenation pointers.
+                                */
+                               align = dmamap->dm_segs[pseg].ds_addr &
+                                   MEC_TXD_ALIGNMASK;
+                               if (align > 0) {
+                                       /*
+                                        * If the first chain isn't uint64_t
+                                        * aligned, append the unaligned part
+                                        * into txdesc buffer too.
+                                        */
+                                       resid = MEC_TXD_ALIGN - align;
+                                       buflen += resid;
+                                       for (; pseg < nsegs; pseg++) {
+                                               slen =
+                                                 dmamap->dm_segs[pseg].ds_len;
+                                               if (slen > resid)
+                                                       break;
+                                               resid -= slen;
+                                       }
+                               } else if (pseg == 0) {
+                                       /*
+                                        * In this case, the first chain is
+                                        * uint64_t aligned but it's too long
+                                        * to put into txdesc buf.
+                                        * We have to put some data into
+                                        * txdesc buf even in this case,
+                                        * so put MEC_TXD_ALIGN bytes there.
+                                        */
+                                       buflen = resid = MEC_TXD_ALIGN;
+                               }
+                               nptr = nsegs - pseg;
+                               if (nptr <= MEC_NTXPTR) {
+                                       bufoff = MEC_TXD_BUFSTART(buflen);
+
+                                       /*
+                                        * Check if all the rest chains are
+                                        * uint64_t aligned.
+                                        */
+                                       align = 0;
+                                       for (i = pseg + 1; i < nsegs; i++)
+                                               align |=
+                                                   dmamap->dm_segs[i].ds_addr
+                                                   & MEC_TXD_ALIGNMASK;
+                                       if (align != 0) {
+                                               /* chains are not aligned */
+                                               error = -1;
+                                       }
+                               } else {
+                                       /* The TX mbuf chains don't fit. */
+                                       error = -1;
+                               }
+                               if (error == -1)
+                                       bus_dmamap_unload(sc->sc_dmat, dmamap);
+                       }
+                       if (error != 0) {
+                               /*
+                                * The TX mbuf chains can't be handled by
+                                * the concatenation pointers. In this case,
+                                * we have to allocate a new contiguous mbuf
+                                * and copy data into it.
+                                *
+                                * Even in this case, the Ethernet header in
+                                * the TX mbuf might be unaligned and trailing
+                                * data might be word aligned, so put 2 byte
+                                * (MEC_ETHER_ALIGN) padding at the top of the
+                                * allocated mbuf and copy TX packets.
+                                * 6 bytes (MEC_ALIGN_BYTES - MEC_ETHER_ALIGN)
+                                * at the top of the new mbuf won't be uint64_t
+                                * aligned, but we have to put some data into
+                                * txdesc buffer anyway even if the buffer
+                                * is uint64_t aligned.
+                                */ 
+                               DPRINTF(MEC_DEBUG_START|MEC_DEBUG_TXSEGS,
                                    ("mec_start: re-allocating mbuf\n"));
+
                                MGETHDR(m, M_DONTWAIT, MT_DATA);
                                if (m == NULL) {
                                        printf("%s: unable to allocate "
@@ -913,91 +1161,152 @@
                                                break;
                                        }
                                }
+                               m->m_data += MEC_ETHER_ALIGN;
+
                                /*
-                                * Each packet has the Ethernet header, so
-                                * in many case the header isn't 4-byte aligned
-                                * and data after the header is 4-byte aligned.
-                                * Thus adding 2-byte offset before copying to
-                                * new mbuf avoids unaligned copy and this may
-                                * improve some performance.
-                                * As noted above, unaligned part has to be
-                                * copied to txdesc buffer so this may cause
-                                * extra copy ops, but for now MEC always
-                                * requires some data in txdesc buffer,
-                                * so we always have to copy some data anyway.
+                                * Copy whole data (including unaligned part)
+                                * for following bpf_mtap().
                                 */
-                               m->m_data += MEC_ETHER_ALIGN;
                                m_copydata(m0, 0, len, mtod(m, void *));
                                m->m_pkthdr.len = m->m_len = len;
                                error = bus_dmamap_load_mbuf(sc->sc_dmat,
                                    dmamap, m, BUS_DMA_WRITE | BUS_DMA_NOWAIT);
-                               if (error) {
+                               if (dmamap->dm_nsegs > 1) {
+                                       /* should not happen, but for sanity */
+                                       bus_dmamap_unload(sc->sc_dmat, dmamap);
+                                       error = -1;
+                               }
+                               if (error != 0) {
                                        printf("%s: unable to load TX buffer, "
                                            "error = %d\n",
                                            device_xname(sc->sc_dev), error);
                                        m_freem(m);
                                        break;
                                }
-                       }
-                       IFQ_DEQUEUE(&ifp->if_snd, m0);
-                       if (m != NULL) {
-                               m_freem(m0);
-                               m0 = m;
-                       }
+                               /*
+                                * Only the first segment should be put into
+                                * the concatenate pointer in this case.
+                                */
+                               pseg = 0;
+                               nptr = 1;
 
-                       /* handle unaligned part */
-                       txdaddr = MEC_TXD_ROUNDUP(dmamap->dm_segs[0].ds_addr);
-                       txs->txs_flags = MEC_TXS_TXDPTR1;
-                       unaligned =
-                           dmamap->dm_segs[0].ds_addr & (MEC_TXD_ALIGN - 1);
-                       DPRINTF(MEC_DEBUG_START,
-                           ("mec_start: ds_addr = 0x%08x, unaligned = %d\n",
-                           (u_int)dmamap->dm_segs[0].ds_addr, unaligned));
-                       if (unaligned != 0) {
-                               buflen = MEC_TXD_ALIGN - unaligned;
-                               bufoff = MEC_TXD_BUFSTART(buflen);
-                               DPRINTF(MEC_DEBUG_START,
-                                   ("mec_start: unaligned, "
-                                   "buflen = %d, bufoff = %d\n",
-                                   buflen, bufoff));
-                               memcpy(txd->txd_buf + bufoff,
-                                   mtod(m0, void *), buflen);
-                               txs->txs_flags |= MEC_TXS_TXDBUF | buflen;
-                       }
-#if 1
-                       else {
                                /*
-                                * XXX needs hardware info XXX
-                                * It seems MEC always requires some data
-                                * in txd_buf[] even if buffer is
-                                * 8-byte aligned otherwise DMA abort error
-                                * occurs later...
+                                * Set length of unaligned part which will be
+                                * copied into txdesc buffer.
                                 */
-                               buflen = MEC_TXD_ALIGN;
+                               buflen = MEC_TXD_ALIGN - MEC_ETHER_ALIGN;
                                bufoff = MEC_TXD_BUFSTART(buflen);
-                               memcpy(txd->txd_buf + bufoff,
-                                   mtod(m0, void *), buflen);
-                               DPRINTF(MEC_DEBUG_START,
-                                   ("mec_start: aligned, "
-                                   "buflen = %d, bufoff = %d\n",
-                                   buflen, bufoff));
-                               txs->txs_flags |= MEC_TXS_TXDBUF | buflen;
-                               txdaddr += MEC_TXD_ALIGN;
+                               resid = buflen;
+#ifdef MEC_EVENT_COUNTERS
+                               MEC_EVCNT_INCR(&sc->sc_ev_txmbuf);
+                               if (len <= 160)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txmbufa);
+                               else if (len <= 256)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txmbufb);
+                               else if (len <= 512)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txmbufc);
+                               else if (len <= 1024)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txmbufd);
+                               else
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txmbufe);
+#endif
+                       }
+#ifdef MEC_EVENT_COUNTERS
+                       else {
+                               MEC_EVCNT_INCR(&sc->sc_ev_txptrs);
+                               if (nptr == 1) {
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptr1);
+                                       if (len <= 160)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr1a);
+                                       else if (len <= 256)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr1b);
+                                       else if (len <= 512)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr1c);
+                                       else if (len <= 1024)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr1d);
+                                       else
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr1e);
+                               } else if (nptr == 2) {
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptr2);
+                                       if (len <= 160)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr2a);
+                                       else if (len <= 256)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr2b);
+                                       else if (len <= 512)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr2c);
+                                       else if (len <= 1024)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr2d);
+                                       else
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr2e);
+                               } else if (nptr == 3) {
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptr3);
+                                       if (len <= 160)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr3a);
+                                       else if (len <= 256)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr3b);
+                                       else if (len <= 512)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr3c);
+                                       else if (len <= 1024)
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr3d);
+                                       else
+                                               MEC_EVCNT_INCR(
+                                                   &sc->sc_ev_txptr3e);
+                               }
+                               if (pseg == 0)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc0);
+                               else if (pseg == 1)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc1);
+                               else if (pseg == 2)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc2);
+                               else if (pseg == 3)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc3);
+                               else if (pseg == 4)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc4);
+                               else if (pseg == 5)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc5);
+                               else
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrc6);
+                               if (buflen <= 8)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrh0);
+                               else if (buflen <= 16)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrh1);
+                               else if (buflen <= 32)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrh2);
+                               else if (buflen <= 64)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrh3);
+                               else if (buflen <= 80)
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrh4);
+                               else
+                                       MEC_EVCNT_INCR(&sc->sc_ev_txptrh5);
                        }
 #endif
-                       txdlen  = len - buflen;
-                       DPRINTF(MEC_DEBUG_START,
-                           ("mec_start: txdaddr = 0x%08llx, txdlen = %d\n",
-                           txdaddr, txdlen));
+                       m_copydata(m0, 0, buflen, txd->txd_buf + bufoff);
+
+                       IFQ_DEQUEUE(&ifp->if_snd, m0);
+                       if (m != NULL) {
+                               m_freem(m0);
+                               m0 = m;
+                       }
 
                        /*
                         * sync the DMA map for TX mbuf
-                        *
-                        * XXX unaligned part doesn't have to be sync'ed,
-                        *     but it's harmless...
                         */
-                       bus_dmamap_sync(sc->sc_dmat, dmamap, 0,
-                           dmamap->dm_mapsize, BUS_DMASYNC_PREWRITE);
+                       bus_dmamap_sync(sc->sc_dmat, dmamap, buflen,
+                           len - buflen, BUS_DMASYNC_PREWRITE);
                }
 
 #if NBPFILTER > 0
@@ -1007,11 +1316,12 @@
                if (ifp->if_bpf)
                        bpf_mtap(ifp->if_bpf, m0);
 #endif
+               MEC_EVCNT_INCR(&sc->sc_ev_txpkts);
 
                /*
                 * setup the transmit descriptor.
                 */
-               txd->txd_cmd = (len - 1);
+               txdcmd = TXCMD_BUFSTART(MEC_TXDESCSIZE - buflen) | (len - 1);
 
                /*
                 * Set MEC_TXCMD_TXINT every MEC_NTXDESC_INTR packets
@@ -1021,30 +1331,65 @@
                 */
                if (sc->sc_txpending > (MEC_NTXDESC / 2) &&
                    (nexttx & (MEC_NTXDESC_INTR - 1)) == 0)
-                       txd->txd_cmd |= MEC_TXCMD_TXINT;
+                       txdcmd |= MEC_TXCMD_TXINT;
+
+               if ((txs->txs_flags & MEC_TXS_TXDPTR) != 0) {
+                       bus_dma_segment_t *segs = dmamap->dm_segs;
 
-               if (txs->txs_flags & MEC_TXS_TXDBUF)
-                       txd->txd_cmd |= TXCMD_BUFSTART(MEC_TXDESCSIZE - buflen);
-               if (txs->txs_flags & MEC_TXS_TXDPTR1) {
-                       txd->txd_cmd |= MEC_TXCMD_PTR1;
-                       txd->txd_ptr[0] = TXPTR_LEN(txdlen - 1) | txdaddr;
+                       DPRINTF(MEC_DEBUG_TXSEGS,
+                           ("mec_start: nsegs = %d, pseg = %d, nptr = %d\n",
+                           dmamap->dm_nsegs, pseg, nptr));
+
+                       switch (nptr) {
+                       case 3:
+                               KASSERT((segs[pseg + 2].ds_addr &
+                                   MEC_TXD_ALIGNMASK) == 0);
+                               txdcmd |= MEC_TXCMD_PTR3;
+                               txd->txd_ptr[2] =
+                                   TXPTR_LEN(segs[pseg + 2].ds_len - 1) |
+                                   segs[pseg + 2].ds_addr;
+                               /* FALLTHROUGH */
+                       case 2:
+                               KASSERT((segs[pseg + 1].ds_addr &
+                                   MEC_TXD_ALIGNMASK) == 0);
+                               txdcmd |= MEC_TXCMD_PTR2;
+                               txd->txd_ptr[1] =
+                                   TXPTR_LEN(segs[pseg + 1].ds_len - 1) |
+                                   segs[pseg + 1].ds_addr;
+                               /* FALLTHROUGH */
+                       case 1:
+                               txdcmd |= MEC_TXCMD_PTR1;
+                               txd->txd_ptr[0] =
+                                   TXPTR_LEN(segs[pseg].ds_len - resid - 1) |
+                                   (segs[pseg].ds_addr + resid);
+                               break;
+                       default:
+                               panic("%s: impossible nptr in %s",
+                                   device_xname(sc->sc_dev), __func__);
+                               /* NOTREACHED */
+                       }
                        /*
                         * Store a pointer to the packet so we can
                         * free it later.
                         */
                        txs->txs_mbuf = m0;
                } else {
-                       txd->txd_ptr[0] = 0;
                        /*
                         * In this case all data are copied to buffer in txdesc,
                         * we can free TX mbuf here.
                         */
                        m_freem(m0);
                }
+               txd->txd_cmd = txdcmd;
 
                DPRINTF(MEC_DEBUG_START,
-                   ("mec_start: txd_cmd = 0x%016llx, txd_ptr = 0x%016llx\n",
-                   txd->txd_cmd, txd->txd_ptr[0]));
+                   ("mec_start: txd_cmd    = 0x%016llx\n", txd->txd_cmd));
+               DPRINTF(MEC_DEBUG_START,
+                   ("mec_start: txd_ptr[0] = 0x%016llx\n", txd->txd_ptr[0]));
+               DPRINTF(MEC_DEBUG_START,
+                   ("mec_start: txd_ptr[1] = 0x%016llx\n", txd->txd_ptr[1]));
+               DPRINTF(MEC_DEBUG_START,
+                   ("mec_start: txd_ptr[2] = 0x%016llx\n", txd->txd_ptr[2]));
                DPRINTF(MEC_DEBUG_START,
                    ("mec_start: len = %d (0x%04x), buflen = %d (0x%02x)\n",
                    len, len, buflen, buflen));
@@ -1063,6 +1408,7 @@
 
        if (sc->sc_txpending == MEC_NTXDESC - 1) {
                /* No more slots; notify upper layer. */
+               MEC_EVCNT_INCR(&sc->sc_ev_txdstall);
                ifp->if_flags |= IFF_OACTIVE;
        }
 
@@ -1100,7 +1446,7 @@
        /* release any TX buffers */
        for (i = 0; i < MEC_NTXDESC; i++) {
                txs = &sc->sc_txsoft[i];
-               if ((txs->txs_flags & MEC_TXS_TXDPTR1) != 0) {
+               if ((txs->txs_flags & MEC_TXS_TXDPTR) != 0) {
                        bus_dmamap_unload(sc->sc_dmat, txs->txs_dmamap);
                        m_freem(txs->txs_mbuf);
                        txs->txs_mbuf = NULL;
@@ -1261,6 +1607,12 @@
                                DPRINTF(MEC_DEBUG_INTR,
                                    ("mec_intr: disable TX_INT\n"));
                        }
+#ifdef MEC_EVENT_COUNTERS
+                       if ((statack & MEC_INT_TX_EMPTY) != 0)
+                               MEC_EVCNT_INCR(&sc->sc_ev_txempty);
+                       if ((statack & MEC_INT_TX_PACKET_SENT) != 0)
+                               MEC_EVCNT_INCR(&sc->sc_ev_txsent);
+#endif
                }
 
                if (statack &
@@ -1427,7 +1779,7 @@
            i = MEC_NEXTTX(i), sc->sc_txpending--) {
                txd = &sc->sc_txdesc[i];
 
-               MEC_TXDESCSYNC(sc, i,
+               MEC_TXCMDSYNC(sc, i,
                    BUS_DMASYNC_POSTREAD|BUS_DMASYNC_POSTWRITE);
 
                txstat = txd->txd_stat;
@@ -1440,7 +1792,7 @@
                }
 
                txs = &sc->sc_txsoft[i];
-               if ((txs->txs_flags & MEC_TXS_TXDPTR1) != 0) {
+               if ((txs->txs_flags & MEC_TXS_TXDPTR) != 0) {
                        dmamap = txs->txs_dmamap;
                        bus_dmamap_sync(sc->sc_dmat, dmamap, 0,
                            dmamap->dm_mapsize, BUS_DMASYNC_POSTWRITE);
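
For anyone reading the switch (nptr) part of mec_start() above, the
following is a minimal standalone sketch of the same idea: check the
8 byte alignment of each segment, skip the first resid bytes of the
first segment (they are copied into the txdesc buffer instead), and
fill the pointers with a fall-through switch. It is not driver code.
struct sk_seg, sk_ptr_word() and sk_fill_ptrs() are hypothetical
names, the (length - 1) << 32 encoding is only an assumed stand-in for
TXPTR_LEN(), and the alignment check on the first pointer is an
addition of this sketch (the patch only asserts it for pointers 2
and 3).

```c
#include <stdint.h>

/* Assumed value: the concatenate pointers need 8 byte aligned addresses. */
#define SK_ALIGN	8u
#define SK_ALIGNMASK	(SK_ALIGN - 1u)

/* Hypothetical stand-in for bus_dma_segment_t. */
struct sk_seg {
	uint64_t addr;	/* bus address of the segment */
	uint32_t len;	/* length in bytes, assumed non-zero */
};

/* Assumed encoding: (length - 1) in the upper 32 bits, address below. */
static uint64_t
sk_ptr_word(uint64_t addr, uint32_t len)
{
	return ((uint64_t)(len - 1) << 32) | addr;
}

/*
 * Fill up to three pointer words from segs[pseg] .. segs[pseg + nptr - 1].
 * The first resid bytes of segs[pseg] are assumed to be sent from the
 * descriptor's static buffer, so pointer 0 starts after them.
 * Returns a bitmask of the pointers written (bit 0 = ptr[0] and so on),
 * or -1 without touching ptr[] if the arguments cannot be used.
 */
static int
sk_fill_ptrs(const struct sk_seg *segs, int pseg, int nptr, uint32_t resid,
    uint64_t ptr[3])
{
	int i, used = 0;

	if (nptr < 1 || nptr > 3 || resid >= segs[pseg].len)
		return -1;
	/* every DMA start address has to be 8 byte aligned */
	if (((segs[pseg].addr + resid) & SK_ALIGNMASK) != 0)
		return -1;
	for (i = 1; i < nptr; i++)
		if ((segs[pseg + i].addr & SK_ALIGNMASK) != 0)
			return -1;

	switch (nptr) {
	case 3:
		ptr[2] = sk_ptr_word(segs[pseg + 2].addr, segs[pseg + 2].len);
		used |= 4;
		/* FALLTHROUGH */
	case 2:
		ptr[1] = sk_ptr_word(segs[pseg + 1].addr, segs[pseg + 1].len);
		used |= 2;
		/* FALLTHROUGH */
	case 1:
		ptr[0] = sk_ptr_word(segs[pseg].addr + resid,
		    segs[pseg].len - resid);
		used |= 1;
		break;
	}
	return used;
}
```

With a first segment at 0x1002 of 100 bytes and resid = 6 (the
MEC_TXD_ALIGN - MEC_ETHER_ALIGN case of the re-allocated mbuf path),
pointer 0 starts at 0x1008 and covers the remaining 94 bytes.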

---
Izumi Tsutsui



