Subject: hyperthreading performances
To: None <port-i386@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-i386
Date: 07/26/2004 12:29:03
Hi,
I've done a simple performance test of a Pentium 4 HT, to see the benefit of
hyperthreading.
The hardware is an ASUS P4P800-SE with a 2.8GHz Pentium 4 HT:
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Pentium 4 (686-class), 2806.50 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: calibrating local timer
cpu0: apic clock running at 200 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel Pentium 4 (686-class), 2806.37 MHz, id 0xf29
cpu1: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu1: L2 cache 512 KB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
The test consisted of running
/usr/bin/time ./build.sh -m i386 -U -M /dsk/l1/tmp/i386/obj -D /dsk/l1/tmp/i386/dest -R /dsk/l1/tmp/i386/rel -u -j4 release sets
from local disk. For each configuration I ran build.sh 3 times, starting
with clean obj, dest and rel directories.
I tested with NetBSD 2.0 and Linux 2.6.6 (Fedora Core 2), with UP and
MP kernels. The box was otherwise completely idle in both cases.
Note that this is just a data point (as we discussed HT benefits
here earlier); you should probably test with your own applications to
see whether HT is worth it for you or not.
Linux borneo 2.6.6-1.435.2.3 #1 Thu Jul 1 08:25:29 EDT 2004 i686 i686 i386 GNU/Linux
5388.94user 728.86system 6325elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
5389.38user 734.06system 6335elapsed 96%CPU (0avgtext+0avgdata 0maxresident)
5386.28user 732.59system 6328elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
Linux borneo 2.6.6-1.435.2.3smp #1 SMP Thu Jul 1 08:36:21 EDT 2004 i686 i686 i386 GNU/Linux
8920.41user 1636.95system 6586elapsed 160%CPU (0avgtext+0avgdata 0maxresident
8269.87user 1596.85system 6674elapsed 147%CPU (0avgtext+0avgdata 0maxresident
8507.60user 1599.09system 6849elapsed 147%CPU (0avgtext+0avgdata 0maxresident
NetBSD borneo.lip6.fr 2.0_BETA NetBSD 2.0_BETA (GENERIC) #1: Sat Jul 24 13:43:28CEST 2004 bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2-0/src/sys/arch/i386/compile/GENERIC i386
6373.03 real 5770.50 user 1062.94 sys
6378.30 real 5778.98 user 1079.16 sys
6382.06 real 5785.92 user 1072.80 sys
NetBSD borneo.lip6.fr 2.0_BETA NetBSD 2.0_BETA (P4P800) #1: Fri Jul 23 15:27:56CEST 2004 bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2-0/src/sys/arch/i386/compile/P4P800 i386
5836.43 real 9696.62 user 2089.52 sys
5875.52 real 9714.28 user 2120.03 sys
5847.41 real 9693.73 user 2114.53 sys
First we can see that in the UP case, Linux and NetBSD give similar results
(NetBSD is consistently a little behind, but it's probably an unfair
comparison, as build.sh includes building the tools). With MP enabled,
Linux is consistently a bit slower, by about 5.9% in elapsed time.
NetBSD is consistently a bit faster, by about 8.2%.
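
For anyone who wants to check the percentages, here is a small sketch that
recomputes them from the elapsed times listed above. It assumes the figures
are the relative change between the mean of the three UP runs and the mean of
the three MP runs; the function name percent_change is only illustrative.

```python
from statistics import mean

# Elapsed ("real") times in seconds, copied from the runs listed above.
elapsed = {
    "Linux": {"UP": [6325, 6335, 6328], "MP": [6586, 6674, 6849]},
    "NetBSD": {"UP": [6373.03, 6378.30, 6382.06],
               "MP": [5836.43, 5875.52, 5847.41]},
}


def percent_change(up_runs, mp_runs):
    """Relative change of the mean MP elapsed time versus the mean UP one, in %."""
    up, mp = mean(up_runs), mean(mp_runs)
    return (mp - up) / up * 100.0


for system, runs in elapsed.items():
    change = percent_change(runs["UP"], runs["MP"])
    # A positive value means the MP/HT kernel took longer than the UP kernel.
    print(f"{system}: {change:+.1f}% elapsed time with the MP kernel")
```

With the numbers above this prints +5.9% for Linux and -8.2% for NetBSD.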
In both cases, the CPU user and system times grow by a huge amount.
The CPU user time grows by a factor of about 1.6 in both cases, but the system
time grows much more for Linux (2.2) than for NetBSD (1.9). This may be the
reason why NetBSD sees an improvement with HT while Linux sees a slowdown, and
I suspect this is related to fine-grained locking vs biglock.
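
The growth factors can be recomputed the same way. This is only a sketch,
assuming each factor is the mean MP time divided by the mean UP time over the
three runs; the helper name growth is made up for the example.

```python
from statistics import mean

# (user, system) CPU seconds for each run, copied from the output above.
cpu = {
    "Linux": {
        "UP": [(5388.94, 728.86), (5389.38, 734.06), (5386.28, 732.59)],
        "MP": [(8920.41, 1636.95), (8269.87, 1596.85), (8507.60, 1599.09)],
    },
    "NetBSD": {
        "UP": [(5770.50, 1062.94), (5778.98, 1079.16), (5785.92, 1072.80)],
        "MP": [(9696.62, 2089.52), (9714.28, 2120.03), (9693.73, 2114.53)],
    },
}

USER, SYSTEM = 0, 1  # positions inside each (user, system) tuple


def growth(up_runs, mp_runs, index):
    """Ratio of the mean MP time to the mean UP time for one column."""
    up = mean([run[index] for run in up_runs])
    mp = mean([run[index] for run in mp_runs])
    return mp / up


for system, runs in cpu.items():
    user = growth(runs["UP"], runs["MP"], USER)
    sys_ = growth(runs["UP"], runs["MP"], SYSTEM)
    print(f"{system}: user x{user:.2f}, system x{sys_:.2f}")
```

With these means the ratios come out at roughly 1.59 (user) and 2.20 (system)
for Linux, and 1.68 (user) and 1.97 (system) for NetBSD, so the gap in system
time growth between the two is there but the NetBSD figure is closer to 2.0
than to 1.9.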
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--