Subject: Re: mc* performance patch
To: Allen Briggs <briggs@netbsd.org>
From: Tim Kelly <hockey@dialectronics.com>
List: port-macppc
Date: 01/15/2005 20:52:50
Allen,
In looking at the if_mc.c and am79c950.c code, I've noticed something odd.
I had been tinkering with the code in ether_cmp (having bummed it from 12
instructions to 8) and while doing benchmarks I found that ping times on MP
kernels are 30% or so slower than ping times on SP kernels. I haven't seen
anything quantitative with respect to ftp, but it is unmistakable with
ping:

MP kernel:

Net7300# ping 192.168.10.99
PING 192.168.10.99 (192.168.10.99): 56 data bytes
64 bytes from 192.168.10.99: icmp_seq=0 ttl=255 time=1.306 ms
64 bytes from 192.168.10.99: icmp_seq=1 ttl=255 time=1.561 ms
64 bytes from 192.168.10.99: icmp_seq=2 ttl=255 time=1.560 ms
64 bytes from 192.168.10.99: icmp_seq=3 ttl=255 time=1.556 ms
64 bytes from 192.168.10.99: icmp_seq=4 ttl=255 time=1.470 ms
64 bytes from 192.168.10.99: icmp_seq=5 ttl=255 time=1.474 ms
64 bytes from 192.168.10.99: icmp_seq=6 ttl=255 time=1.531 ms
64 bytes from 192.168.10.99: icmp_seq=7 ttl=255 time=1.536 ms
64 bytes from 192.168.10.99: icmp_seq=8 ttl=255 time=1.518 ms
64 bytes from 192.168.10.99: icmp_seq=9 ttl=255 time=1.480 ms
64 bytes from 192.168.10.99: icmp_seq=10 ttl=255 time=1.465 ms
64 bytes from 192.168.10.99: icmp_seq=11 ttl=255 time=1.522 ms
64 bytes from 192.168.10.99: icmp_seq=12 ttl=255 time=1.649 ms

----192.168.10.99 PING Statistics----
13 packets transmitted, 13 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.306/1.510/1.649/0.079 ms


GENERIC SP:

Net7300# ping 192.168.10.99
PING 192.168.10.99 (192.168.10.99): 56 data bytes
64 bytes from 192.168.10.99: icmp_seq=0 ttl=255 time=1.021 ms
64 bytes from 192.168.10.99: icmp_seq=1 ttl=255 time=1.097 ms
64 bytes from 192.168.10.99: icmp_seq=2 ttl=255 time=1.078 ms
64 bytes from 192.168.10.99: icmp_seq=3 ttl=255 time=0.986 ms
64 bytes from 192.168.10.99: icmp_seq=4 ttl=255 time=1.014 ms
64 bytes from 192.168.10.99: icmp_seq=5 ttl=255 time=1.078 ms
64 bytes from 192.168.10.99: icmp_seq=6 ttl=255 time=1.227 ms
64 bytes from 192.168.10.99: icmp_seq=7 ttl=255 time=1.029 ms
64 bytes from 192.168.10.99: icmp_seq=8 ttl=255 time=0.987 ms
64 bytes from 192.168.10.99: icmp_seq=9 ttl=255 time=1.180 ms
64 bytes from 192.168.10.99: icmp_seq=10 ttl=255 time=1.066 ms
64 bytes from 192.168.10.99: icmp_seq=11 ttl=255 time=1.103 ms
64 bytes from 192.168.10.99: icmp_seq=12 ttl=255 time=1.120 ms
64 bytes from 192.168.10.99: icmp_seq=13 ttl=255 time=1.077 ms
64 bytes from 192.168.10.99: icmp_seq=14 ttl=255 time=0.963 ms
64 bytes from 192.168.10.99: icmp_seq=15 ttl=255 time=1.011 ms
^C
----192.168.10.99 PING Statistics----
16 packets transmitted, 16 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.963/1.065/1.227/0.072 ms

This is repeatable with -current as of 12/31/04, and neither kernel above
includes any of my tinkering. The results are about the same with the
tinkering (apparently one lwz instruction is slower than two lhz ones,
although I can't see why). I built both kernels from the same code base,
and to the best of my knowledge I haven't done any MP-specific tinkering.

Any thoughts?

tim