Subject: network backend improvements
To: None <port-xen@NetBSD.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 10/03/2005 00:11:16
Hi,
I've committed changes to the network backend and frontend which reduce the
number of hypercalls and interrupts, and avoid some unneeded copies when
packets are sent or received.
I ran some performance tests on a dual-CPU 350 MHz Pentium II:
cpu0: Intel Pentium II (686-class), 350.80 MHz, id 0x652
cpu0: features 183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 183fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
cpu0: features 183fbff<FXSR>
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 512 KB 32B/line 4-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: 32 page colors

I used ttcp in a routed environment (no bridge; dom0 is used as a router).
The Linux domU is running:
Linux sl4 2.6.11.10-xenU #1 Sun May 22 11:42:16 BST 2005 i686 i686 i386 GNU/Linux
The NetBSD domU is running an up-to-date current kernel.

With ipf enabled (no rules loaded):
                                before changes  after changes
netbsd-domU -> dom0             13130 KB/sec    10972 KB/sec
dom0 -> netbsd-domU             10076 KB/sec    11172 KB/sec
linux-domU -> dom0              12786 KB/sec    12530 KB/sec
dom0 -> linux-domU              11978 KB/sec    13817 KB/sec
netbsd-domU -> linux-domU        9712 KB/sec     9722 KB/sec
linux-domU -> netbsd-domU        7710 KB/sec     8160 KB/sec

With ipf disabled:
                                before changes  after changes
netbsd-domU -> dom0             14357 KB/sec    17745 KB/sec
dom0 -> netbsd-domU             13113 KB/sec    15175 KB/sec
linux-domU -> dom0              16239 KB/sec    22179 KB/sec
dom0 -> linux-domU              18839 KB/sec    20005 KB/sec
netbsd-domU -> linux-domU       11122 KB/sec    12307 KB/sec
linux-domU -> netbsd-domU        7369 KB/sec    12647 KB/sec

The poor results with ipf enabled are because ipf causes the packet
to be copied when it comes from the network backend (the first thing
done in the ipf input path is m_makewritable(), and it looks like
m_makewritable() is less efficient than memcpy()). With ipf disabled, the
performance gain is appreciable.
I also did some minimal tests in a bridged setup, running ttcp between a
domU and an external box (in which case no packet should be copied in
domain0 at all); the system CPU usage in domain0 shown by top is reduced by
about 20%.
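
To put a number on "appreciable", the relative change for each path can be
computed from the "ipf disabled" table. This is only a small illustrative
sketch: the figures are copied from that table, and the helper name
percent_change is made up for this example.

```python
# Throughput in KB/sec as (before, after), copied from the "ipf disabled" table.
IPF_DISABLED = {
    "netbsd-domU -> dom0": (14357, 17745),
    "dom0 -> netbsd-domU": (13113, 15175),
    "linux-domU -> dom0": (16239, 22179),
    "dom0 -> linux-domU": (18839, 20005),
    "netbsd-domU -> linux-domU": (11122, 12307),
    "linux-domU -> netbsd-domU": (7369, 12647),
}


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent (hypothetical helper)."""
    return 100.0 * (after - before) / before


for path, (before, after) in IPF_DISABLED.items():
    # Positive values mean the path got faster after the changes.
    print(f"{path:<28}{percent_change(before, after):+6.1f}%")
```

The same function can be applied to the "ipf enabled" table, where some
paths come out negative.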

Note that to take full advantage of the changes I made in dom0, you have to
build your dom0 kernel with options MCLSHIFT=12 (do a full clean build; this
option isn't defopt'ed). This causes the mbuf cluster storage to be exactly
one page (as opposed to half a page with the default value).
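
The arithmetic behind this can be sketched as follows. It assumes that the
cluster size is 1 << MCLSHIFT bytes, that the page size is 4 KB (i386), and
that the default MCLSHIFT is 11; the default is inferred from the "half a
page" remark rather than stated here. The function name cluster_size is
made up for the example.

```python
PAGE_SIZE = 4096       # assumption: 4 KB pages, as on i386
DEFAULT_MCLSHIFT = 11  # assumption: inferred from "half a page" by default


def cluster_size(mclshift):
    """Size in bytes of one mbuf cluster, assuming it is 1 << MCLSHIFT."""
    return 1 << mclshift


for shift in (DEFAULT_MCLSHIFT, 12):
    size = cluster_size(shift)
    # Fraction of a page occupied by one cluster at this MCLSHIFT.
    print(f"MCLSHIFT={shift}: {size} bytes, {size / PAGE_SIZE:g} page(s)")
```

With MCLSHIFT=12 a cluster and a page have the same size under these
assumptions.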

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--