Subject: wm(4) with i82544EI on the AlphaServer ES40...
To: Jason Thorpe <thorpej@shagadelic.org>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 11/29/2004 19:02:57

OK, I've finally got the newer wm(4) card installed in the ES40:

[console]<@> # dmesg | fgrep -e wm1 -e ukphy0
wm1 at pci1 dev 3 function 0: Intel i82544EI 1000BASE-T Ethernet, rev. 2
wm1: interrupting at dec 6600 irq 32
wm1: 64-bit 33MHz PCI bus
wm1: 64 word (6 address bits) MicroWire EEPROM
wm1: Ethernet address 00:0e:0c:5a:ee:9d
ukphy0 at wm1 phy 1: Generic IEEE 802.3u media interface
ukphy0: Marvell 88E1000 Gigabit PHY (OUI 0x000ac2, model 0x0003), rev. 0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

(I still have to add the makphy(4) (Marvell) driver to properly match
the actual PHY on this card, but ukphy(4) seems to work well enough
with this card for now.)


[console]<@> # ifconfig wm1
wm1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
        enabled=0<>
        address: 00:0e:0c:5a:ee:9d
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet 10.10.10.2 netmask 0xff000000 broadcast 10.255.255.255

Here are some ttcp tests:

Note these are with yesterday's -current, and prior to the patches to
if_wm.c that you posted.  I'll add those patches next so I can do
another comparison and see if the degradation I observed yesterday is
repeatable....

[[ the other machine is a very FAST dual-CPU AMD Opteron 1.8GHz FreeBSD
box with an i82546 card hooked back-to-back with this one and otherwise
sitting idle ]]

[console]<@> # ttcp -v -t -s -n 50000 10.10.10.1
ttcp-t: buflen=8192, nbuf=50000, align=16384/0, port=5001  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 409600000 bytes in 16.97 real seconds = 23572.99 KB/sec +++
ttcp-t: 409600000 bytes in 16.97 CPU seconds = 23576.09 KB/cpu sec
ttcp-t: 50000 I/O calls, msec/call = 0.35, calls/sec = 2946.62
ttcp-t: 0.0user 16.9sys 0:16real 100% 0i+0d 0maxrss 0+50001pf 6+187csw
ttcp-t: buffer address 0x20050000
[console]<@> # ttcp -v -r -s                    
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp
ttcp-r: socket
ttcp-r: accept from 10.10.10.1
ttcp-r: 409600000 bytes in 14.34 real seconds = 27900.75 KB/sec +++
ttcp-r: 409600000 bytes in 14.13 CPU seconds = 28302.79 KB/cpu sec
ttcp-r: 50023 I/O calls, msec/call = 0.29, calls/sec = 3489.20
ttcp-r: 0.0user 14.0sys 0:14real 98% 0i+0d 0maxrss 0+1pf 40+156csw
ttcp-r: buffer address 0x20050000


Now to turn on the fabled hardware checksum offloading:

[console]<@> # ifconfig wm1 ip4csum tcp4csum udp4csum

[console]<@> # ttcp -v -t -s -n 50000 10.10.10.1     
ttcp-t: buflen=8192, nbuf=50000, align=16384/0, port=5001  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: connect
ttcp-t: 409600000 bytes in 13.78 real seconds = 29019.69 KB/sec +++
ttcp-t: 409600000 bytes in 9.18 CPU seconds = 43555.15 KB/cpu sec
ttcp-t: 50000 I/O calls, msec/call = 0.28, calls/sec = 3627.46
ttcp-t: 0.0user 9.1sys 0:13real 66% 0i+0d 0maxrss 0+50001pf 7+151csw
ttcp-t: buffer address 0x20050000
[console]<@> # ttcp -v -r -s                         
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp
ttcp-r: socket
ttcp-r: accept from 10.10.10.1
ttcp-r: 409600000 bytes in 11.10 real seconds = 36034.49 KB/sec +++
ttcp-r: 409600000 bytes in 10.85 CPU seconds = 36875.54 KB/cpu sec
ttcp-r: 50063 I/O calls, msec/call = 0.23, calls/sec = 4509.99
ttcp-r: -1.8user 10.9sys 0:11real 97% 0i+0d 0maxrss 0+1pf 98+123csw
ttcp-r: buffer address 0x20050000


Hmmm... much better CPU load on xmit
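
For context, the per-packet work that ip4csum/tcp4csum/udp4csum hand
off to the card is essentially the 16-bit one's-complement sum
described in RFC 1071.  Here is a minimal Python sketch of that sum,
purely as an illustration: the function name is made up for this
example, and the kernel's own routine is optimized native code, not
anything like this.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # add each big-endian 16-bit word
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF       # one's complement of the folded sum


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

Every payload byte has to pass through a loop like this when the
checksum is done in software, which is presumably at least part of the
system time that disappears from the transmit numbers above.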

However, though the receive speed is noticeably faster, it's still
apparently maxed out on CPU despite this being a quad-CPU system (though I
wonder what's up with the "-1.8user" CPU seconds!)

Still nowhere near wire speed though....  (maybe if I bumped the MTU,
but that's not really what I want to test)
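
To put a number on "nowhere near", here is a rough Python sketch that
converts the ttcp figures above into a percentage of the raw
1000baseT line rate.  It assumes ttcp's "KB" is 1024 bytes (which is
what the byte counts and times above work out to) and it ignores
Ethernet, IP and TCP framing overhead, so the real ceiling for payload
is somewhat below 100%.  The helper name is just for this example.

```python
GIGABIT_BPS = 1_000_000_000  # raw 1000baseT line rate in bits/sec


def percent_of_line_rate(kb_per_sec: float) -> float:
    """Convert ttcp KB/sec (1 KB = 1024 bytes) to % of raw gigabit."""
    return kb_per_sec * 1024 * 8 / GIGABIT_BPS * 100


# "real seconds" throughput figures copied from the runs above
results = {
    "xmit, no csum offload": 23572.99,
    "recv, no csum offload": 27900.75,
    "xmit, csum offload": 29019.69,
    "recv, csum offload": 36034.49,
}

if __name__ == "__main__":
    for label, kbs in results.items():
        pct = percent_of_line_rate(kbs)
        print(f"{label:24s} {kbs:9.2f} KB/sec  {pct:5.1f}% of line rate")
```

By that measure even the best run so far, the receive test with
offload enabled, is under a third of the raw line rate.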

Now for some different sized sockbuf and buflen values just for fun....

[console]<@> # ttcp -b 32768 -v -t -s -n 50000 10.10.10.1
ttcp-t: buflen=8192, nbuf=50000, align=16384/0, port=5001, sockbufsize=32768  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 409600000 bytes in 13.67 real seconds = 29254.17 KB/sec +++
ttcp-t: 409600000 bytes in 9.53 CPU seconds = 41957.52 KB/cpu sec
ttcp-t: 50000 I/O calls, msec/call = 0.28, calls/sec = 3656.77
ttcp-t: 0.0user 9.5sys 0:13real 69% 0i+0d 0maxrss 0+50001pf 7+150csw
ttcp-t: buffer address 0x20050000
[console]<@> # ttcp -l 16384 -b 65536 -v -t -s -n 50000 10.10.10.1
ttcp-t: buflen=16384, nbuf=50000, align=16384/0, port=5001, sockbufsize=65536  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 819200000 bytes in 24.55 real seconds = 32583.91 KB/sec +++
ttcp-t: 819200000 bytes in 16.27 CPU seconds = 49159.48 KB/cpu sec
ttcp-t: 50000 I/O calls, msec/call = 0.50, calls/sec = 2036.49
ttcp-t: 0.0user 16.2sys 0:24real 66% 0i+0d 0maxrss 0+100003pf 3+270csw
ttcp-t: buffer address 0x20050000
[console]<@> # ttcp -l 32768 -b 65536 -v -t -s -n 50000 10.10.10.1
ttcp-t: buflen=32768, nbuf=50000, align=16384/0, port=5001, sockbufsize=65536  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 1638400000 bytes in 46.42 real seconds = 34469.37 KB/sec +++
ttcp-t: 1638400000 bytes in 26.19 CPU seconds = 61095.97 KB/cpu sec
ttcp-t: 50000 I/O calls, msec/call = 0.95, calls/sec = 1077.17
ttcp-t: 0.0user 26.1sys 0:46real 56% 0i+0d 0maxrss 0+200008pf 16+513csw
ttcp-t: buffer address 0x20050000
[console]<@> # ttcp -l 65536 -b 65536 -v -t -s -n 50000 10.10.10.1
ttcp-t: buflen=65536, nbuf=50000, align=16384/0, port=5001, sockbufsize=65536  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 3276800000 bytes in 124.62 real seconds = 25677.86 KB/sec +++
ttcp-t: 3276800000 bytes in 115.74 CPU seconds = 27647.88 KB/cpu sec
ttcp-t: 50000 I/O calls, msec/call = 2.55, calls/sec = 401.22
ttcp-t: 0.0user 115.6sys 2:04real 92% 0i+0d 0maxrss 0+400559pf 47274+876csw
ttcp-t: buffer address 0x20050000
[console]<@> # ttcp -l 65536 -b 131072 -v -t -s -n 10000 10.10.10.1
ttcp-t: buflen=65536, nbuf=10000, align=16384/0, port=5001, sockbufsize=131072  tcp  -> 10.10.10.1
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 655360000 bytes in 17.79 real seconds = 35983.45 KB/sec +++
ttcp-t: 655360000 bytes in 10.82 CPU seconds = 59146.13 KB/cpu sec
ttcp-t: 10000 I/O calls, msec/call = 1.82, calls/sec = 562.24
ttcp-t: 0.0user 10.8sys 0:17real 60% 0i+0d 0maxrss 0+80009pf 3+196csw
ttcp-t: buffer address 0x20050000
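
For anyone wanting to repeat the socket buffer variations from their
own code: the "sndbuf" step in the output above is where the requested
send buffer size gets applied to the socket.  A minimal Python sketch
of the same idea follows, assuming the ordinary SO_SNDBUF socket
option; the helper name is made up for this example, and the kernel is
free to round or clamp the requested size.

```python
import socket


def make_sender(sockbuf: int) -> socket.socket:
    """Create a TCP socket and request a send buffer of sockbuf bytes."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the send buffer size; the kernel may adjust the value.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sockbuf)
    return s


s = make_sender(65536)
# Read back what the kernel actually granted, which may differ.
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("requested 65536, kernel reports", granted)
s.close()
```

The buflen value is a separate knob: it is the size of each write
handed to the socket, so total bytes sent is buflen times nbuf, as the
byte counts above show.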


-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>