Subject: hyperthreading performances
To: None <port-i386@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-i386
Date: 07/26/2004 12:29:03
Hi,
I've done a simple performance test of a PIV-HT to see the benefit of
hyperthreading.
The hardware is an ASUS P4P800-SE with a 2.8GHz PIV-HT:
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Pentium 4 (686-class), 2806.50 MHz, id 0xf29
cpu0: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu0: L2 cache 512 KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: calibrating local timer
cpu0: apic clock running at 200 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: Intel Pentium 4 (686-class), 2806.37 MHz, id 0xf29
cpu1: features bfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features bfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features bfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: I-cache 12K uOp cache 8-way, D-cache 8 KB 64B/line 4-way
cpu1: L2 cache 512 KB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries

The test consisted of running
/usr/bin/time ./build.sh -m i386 -U -M /dsk/l1/tmp/i386/obj -D /dsk/l1/tmp/i386/dest -R /dsk/l1/tmp/i386/rel -u -j4 release sets
from a local disk. For each configuration I ran build.sh 3 times, starting
with clean obj, dest and rel directories.
I tested NetBSD 2.0 and Linux 2.6.6 (Fedora Core 2), each with UP and
MP kernels. The box was otherwise idle in all cases.
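The procedure above (three runs per configuration, each starting from empty obj, dest and rel trees) could be scripted roughly as follows. This is a minimal Python sketch, not the harness actually used for these numbers: `timed_clean_runs` is a hypothetical helper, and it collects user/sys time with `getrusage(RUSAGE_CHILDREN)`, in the same spirit as time(1), rather than by parsing time(1) output.

```python
import resource
import shutil
import subprocess
import time
from pathlib import Path


def timed_clean_runs(cmd, clean_dirs, runs=3):
    """Run `cmd` `runs` times, emptying every directory in `clean_dirs`
    before each run. Returns one (real, user, sys) tuple per run, in seconds."""
    results = []
    for _ in range(runs):
        # Start every run from empty output trees (obj, dest and rel above).
        for d in clean_dirs:
            shutil.rmtree(d, ignore_errors=True)
            Path(d).mkdir(parents=True)
        # RUSAGE_CHILDREN accumulates the CPU time of terminated, waited-for
        # children (and their waited-for descendants), so the difference
        # across one run is that run's user and system time.
        before = resource.getrusage(resource.RUSAGE_CHILDREN)
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        real = time.perf_counter() - start
        after = resource.getrusage(resource.RUSAGE_CHILDREN)
        results.append((real,
                        after.ru_utime - before.ru_utime,
                        after.ru_stime - before.ru_stime))
    return results
```

For the build above, `cmd` would be the `./build.sh` argument list shown and `clean_dirs` the three `/dsk/l1/tmp/i386/{obj,dest,rel}` paths.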

Note that this is just one data point (HT benefits were discussed here
earlier); you should test with your own applications to see whether HT
is worth it for you.

Linux borneo 2.6.6-1.435.2.3 #1 Thu Jul 1 08:25:29 EDT 2004 i686 i686 i386 GNU/Linux
5388.94user 728.86system 6325elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
5389.38user 734.06system 6335elapsed 96%CPU (0avgtext+0avgdata 0maxresident)
5386.28user 732.59system 6328elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k

Linux borneo 2.6.6-1.435.2.3smp #1 SMP Thu Jul 1 08:36:21 EDT 2004 i686 i686 i386 GNU/Linux
8920.41user 1636.95system 6586elapsed 160%CPU (0avgtext+0avgdata 0maxresident
8269.87user 1596.85system 6674elapsed 147%CPU (0avgtext+0avgdata 0maxresident
8507.60user 1599.09system 6849elapsed 147%CPU (0avgtext+0avgdata 0maxresident

NetBSD borneo.lip6.fr 2.0_BETA NetBSD 2.0_BETA (GENERIC) #1: Sat Jul 24 13:43:28 CEST 2004  bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2-0/src/sys/arch/i386/compile/GENERIC i386
     6373.03 real      5770.50 user      1062.94 sys
     6378.30 real      5778.98 user      1079.16 sys
     6382.06 real      5785.92 user      1072.80 sys

NetBSD borneo.lip6.fr 2.0_BETA NetBSD 2.0_BETA (P4P800) #1: Fri Jul 23 15:27:56 CEST 2004  bouyer@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2-0/src/sys/arch/i386/compile/P4P800 i386
     5836.43 real      9696.62 user      2089.52 sys
     5875.52 real      9714.28 user      2120.03 sys
     5847.41 real      9693.73 user      2114.53 sys
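The two systems print time(1) results in different layouts: GNU style on Linux, BSD style on NetBSD. A small parser makes it easier to put them side by side. This sketch assumes exactly the line layouts shown above, and `parse_time_line` and `to_seconds` are hypothetical helper names. `to_seconds` also accepts an `[h:]m:ss` elapsed field, the form GNU time's `%E` specifier produces.

```python
import re

# GNU time(1) line as printed on the Linux box, e.g.
# "5388.94user 728.86system 6325elapsed 96%CPU (...)"
GNU_RE = re.compile(r"([\d.]+)user\s+([\d.]+)system\s+([\d:.]+)elapsed")
# BSD time(1) line as printed on NetBSD, e.g.
# "6373.03 real      5770.50 user      1062.94 sys"
BSD_RE = re.compile(r"([\d.]+)\s+real\s+([\d.]+)\s+user\s+([\d.]+)\s+sys")


def to_seconds(field):
    """Convert '6325', '6325.5' or '[h:]m:ss' into seconds."""
    secs = 0.0
    for part in field.split(":"):
        secs = secs * 60 + float(part)
    return secs


def parse_time_line(line):
    """Return (real, user, sys) in seconds, or None if no format matches."""
    m = GNU_RE.search(line)
    if m:
        user, sys_, real = m.groups()
        return to_seconds(real), float(user), float(sys_)
    m = BSD_RE.search(line)
    if m:
        real, user, sys_ = m.groups()
        return float(real), float(user), float(sys_)
    return None
```

Because it uses `search`, the parser also copes with the truncated Linux SMP lines above, which lose their trailing `)k`.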

First, we can see that in the UP case Linux and NetBSD give similar results
(NetBSD is consistently a little behind, but this is probably an unfair
comparison, as build.sh includes building the tools). With MP enabled,
Linux is consistently slower, by about 5.9%, while NetBSD is consistently
faster, by about 8.2%.
In both cases, CPU user and system time grow by a huge amount. User time
grows by a factor of about 1.6 to 1.7 in both cases, but system time grows
more for Linux (about 2.2x) than for NetBSD (just under 2x). This may be
why NetBSD sees an improvement with HT while Linux sees a slowdown, and I
suspect it is related to fine-grained locking (Linux) vs. the big kernel
lock (NetBSD).
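The percentages and growth factors in the previous paragraph come from averaging the three runs of each configuration. A minimal Python sketch of that arithmetic, with the numbers copied from the logs above (`RUNS`, `means` and `compare` are illustrative names, not part of any tool):

```python
from statistics import mean

# (real, user, sys) in seconds for the three runs of each configuration above.
RUNS = {
    ("Linux", "UP"): [(6325.0, 5388.94, 728.86), (6335.0, 5389.38, 734.06),
                      (6328.0, 5386.28, 732.59)],
    ("Linux", "MP"): [(6586.0, 8920.41, 1636.95), (6674.0, 8269.87, 1596.85),
                      (6849.0, 8507.60, 1599.09)],
    ("NetBSD", "UP"): [(6373.03, 5770.50, 1062.94), (6378.30, 5778.98, 1079.16),
                       (6382.06, 5785.92, 1072.80)],
    ("NetBSD", "MP"): [(5836.43, 9696.62, 2089.52), (5875.52, 9714.28, 2120.03),
                       (5847.41, 9693.73, 2114.53)],
}


def means(runs):
    """Average each column (real, user, sys) over the runs."""
    return tuple(mean(col) for col in zip(*runs))


def compare(os_name):
    """Return (% change in elapsed time, user growth, sys growth) going from
    the UP kernel to the MP kernel; a negative % means MP is faster."""
    up_real, up_user, up_sys = means(RUNS[(os_name, "UP")])
    mp_real, mp_user, mp_sys = means(RUNS[(os_name, "MP")])
    return (100 * (mp_real - up_real) / up_real,
            mp_user / up_user,
            mp_sys / up_sys)


for os_name in ("Linux", "NetBSD"):
    pct, user_x, sys_x = compare(os_name)
    print(f"{os_name}: elapsed {pct:+.1f}%, user x{user_x:.2f}, sys x{sys_x:.2f}")
```

This prints `Linux: elapsed +5.9%, user x1.59, sys x2.20` and `NetBSD: elapsed -8.2%, user x1.68, sys x1.97`, i.e. the user-time growth is similar on both systems while the system-time growth is larger on Linux.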

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--