Subject: Re: 80Mbps routing with Micrel KS8695
To: Jason Thorpe <thorpej@shagadelic.org>
From: Jesse Off <joff@embeddedARM.com>
List: port-arm
Date: 01/16/2005 19:27:40
Okay, I got the epe driver working marginally faster.  It now receives via
ttcp at:

# ttcp -r -s
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp
ttcp-r: socket
ttcp-r: accept from 10.0.0.1
ttcp-r: 67108864 bytes in 17.58 real seconds = 3728.55 KB/sec +++
ttcp-r: 8208 I/O calls, msec/call = 2.19, calls/sec = 466.98
ttcp-r: 0.2user 8.5sys 0:17real 50% 0i+0d 0maxrss 0+2pf 19+225csw

and transmits at:
# ttcp -t -s -n8192 10.0.0.1
ttcp-t: buflen=8192, nbuf=8192, align=16384/0, port=5001  tcp  -> 10.0.0.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 67108864 bytes in 20.51 real seconds = 3195.07 KB/sec +++
ttcp-t: 8192 I/O calls, msec/call = 2.56, calls/sec = 399.38
ttcp-t: 0.3user 18.1sys 0:20real 90% 0i+0d 0maxrss 0+16386pf 21+265csw

I notice the involuntary context switches are quite high at 265 in 20
seconds.  IIRC, context switches are very expensive on ARM because the
virtually-indexed L1 cache has to be flushed.  Is the pagedaemon kernel
thread the one needing to run this often and therefore competing with ttcp?
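
For reference, the rates above can be rechecked from the raw ttcp figures.
This is only a small sketch: the helper names are made up for illustration,
and it assumes ttcp's KB/sec is simply bytes / 1024 / elapsed seconds.

```c
/* Hypothetical helpers for rechecking the ttcp numbers quoted above. */

/* Throughput in KB/sec, assuming KB means 1024 bytes. */
static double
kb_per_sec(double bytes, double secs)
{
	return bytes / 1024.0 / secs;
}

/* Average rate of some counted event (e.g. context switches) per second. */
static double
events_per_sec(double count, double secs)
{
	return count / secs;
}
```

With the transmit run's numbers, kb_per_sec(67108864, 20.51) comes out at
roughly 3195 KB/sec, and events_per_sec(265, 20.51) at roughly 12.9
involuntary context switches per second.  The small difference from ttcp's
own figure is presumably because ttcp divides by the unrounded elapsed time.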

Changes I made:
  * bypass bus_space() and go straight to hardware using volatile *s
  * "shot-gun" TX approach.. no TX intrs being used
  * bypass most dmamap_sync() and use DMA_COHERENT mappings
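
To illustrate the first item, here is a minimal sketch of register access
through a volatile pointer instead of bus_space().  It is not the actual epe
code: the register offsets and helper names are hypothetical, and it assumes
the registers are already mapped at a fixed virtual address with the same
endianness as the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
#define DEMO_REG_CTRL	0x00u
#define DEMO_REG_STATUS	0x04u

/*
 * Read a 32-bit register at byte offset `off` from the mapped base.
 * The volatile qualifier keeps the compiler from caching or eliding
 * the access.
 */
static inline uint32_t
reg_read(volatile uint32_t *base, size_t off)
{
	assert(off % sizeof(uint32_t) == 0);	/* word-aligned offsets only */
	return base[off / sizeof(uint32_t)];
}

/* Write a 32-bit register at byte offset `off` from the mapped base. */
static inline void
reg_write(volatile uint32_t *base, size_t off, uint32_t val)
{
	assert(off % sizeof(uint32_t) == 0);	/* word-aligned offsets only */
	base[off / sizeof(uint32_t)] = val;
}
```

This gives up the portability that bus_space() provides (no byte swapping,
no bus-specific access methods), which is only reasonable because the MAC is
part of the SoC.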

Needless to say, I was a little disappointed to gain only about 350KB/s
(Linux 2.4 is almost 2x faster than NetBSD on this benchmark and gets
6600KB/s), so I ran a profiling kernel (should have done this in the first
place).  I put the profile output at
http://www.embeddedx86.com/~joff/gmon.txt for a single kttcp -t run.
Interestingly, splx() accounted for the most time.  I'm not sure whether
this is an artifact of pending IRQs being charged to splx(), though.

The profile output seems to confirm that the TCP/IP stack and pool
subsystem are the bottlenecks on this 200MHz ARM and that little is
gained by fine-tuning the Ethernet driver sources for this purpose.

I also tried using 65000-byte UDP packets and was able to transmit at
8500KB/s.

//Jesse Off


> On Jan 15, 2005, at 9:42 PM, Toru Nishimura wrote:
>
>> - sys/arch/mips/alchemy/dev/if_aumac.c
>> These three were mostly written by the same person and show
>> how a NetBSD netif driver should be constructed.  I recommend
>> that you rewrite your epe.c driver.  Since the HW is ARM SoC dependent,
>> you could benefit from "breaking" NetBSD portability as a performance
>> tradeoff.
>
> Indeed.  I made that assumption with the aumac driver.
>
>          -- Jason R. Thorpe <thorpej@shagadelic.org>
>
>