Subject: Re: HyperThread Technology
To: Steven M. Bellovin <smb@research.att.com>
From: MLH <mlh@goathill.org>
List: port-i386
Date: 07/08/2004 14:43:29
> 
> In message <20040708052949.GC22720@kyyhky.embedtronics.fi>, Jukka Marin writes:
> >On Wed, Jul 07, 2004 at 09:16:34PM -0500, MLH wrote:
> >> Just missed it on current-users :
> >> 
> >> http://mail-index.netbsd.org/current-users/2004/06/26/0012.html
> >
> >Did you do a full build with the MP kernel already?  How long did it
> >take?  How long did it take with the GENERIC kernel?
> >
> >Trying to find out how useful HT is in real work.. :)
> >
> It saves me a little -- 10-15%, I think -- on 'make sets'.

In my case I see very little gain, and sometimes a loss. For example,
running the following (cleaning before each run):

./build.sh -u -U -M /opt/obj -R /opt/snapshot tools kernel=ACPI.MP &&  \
./build.sh -u -U -M /opt/obj -R /opt/snapshot tools releasekernel=ACPI.MP
with sources over NFS and obj on a local 10krpm ST3120026AS SATA drive:

NetBSD bam.sfbr.org 2.0_BETA NetBSD 2.0_BETA (ACPI.MP) :
47.508u 20.102s 1:32.84 72.8%   0+0k 448+664io 2434pf+0w

NetBSD bam.sfbr.org 2.0_BETA NetBSD 2.0_BETA (GENERIC.MP) :
45.349u 17.776s 1:35.00 66.4%   0+0k 450+675io 2432pf+0w
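
To put a number on the difference between those two runs, here is a minimal
Python sketch that parses the csh-style time lines above and compares the
wall-clock times. It is only an illustration and was not part of the
measurements; the helper names (elapsed_seconds, parse_time_line) are made up
for this example, and it assumes the "user, system, elapsed, CPU%" field order
shown in the output above.

```python
import re

# csh/tcsh `time` output: user CPU, system CPU, elapsed wall time, CPU%,
# followed by memory, I/O and page-fault fields that are ignored here.
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<cpu>[\d.]+)%"
)


def elapsed_seconds(text):
    """Convert an elapsed field such as '1:32.84' (or 'h:mm:ss') to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line):
    """Return user, sys, elapsed (all in seconds) and CPU% from one time line."""
    m = TIME_RE.match(line.strip())
    if m is None:
        raise ValueError("not a csh-style time line: %r" % line)
    return {
        "user": float(m["user"]),
        "sys": float(m["sys"]),
        "elapsed": elapsed_seconds(m["elapsed"]),
        "cpu_pct": float(m["cpu"]),
    }


# The two lines below are copied verbatim from the runs reported above.
acpi_mp = parse_time_line("47.508u 20.102s 1:32.84 72.8%   0+0k 448+664io 2434pf+0w")
generic_mp = parse_time_line("45.349u 17.776s 1:35.00 66.4%   0+0k 450+675io 2432pf+0w")

saved = generic_mp["elapsed"] - acpi_mp["elapsed"]
pct = 100.0 * saved / generic_mp["elapsed"]
print(f"ACPI.MP finished {saved:.2f}s sooner ({pct:.1f}% of the GENERIC.MP wall time)")
```

That works out to about 2.16 seconds, or roughly 2.3% of the wall time, from a
single run on each kernel, so it could easily be within run-to-run noise.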


Some benchmarks:
------------------------------------
NetBSD bam.sfbr.org 2.0_BETA NetBSD 2.0_BETA (ACPI.MP)

pystones:
This machine benchmarks at 34013.6 pystones/second

flops:
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0109   1282.1781
     2     -1.4166e-13      0.0113    616.9612
     3      4.7184e-14      0.0194    878.1818
     4     -1.2557e-13      0.0137   1093.6184
     5     -1.3800e-13      0.0264   1097.7967
     6      3.2380e-13      0.0255   1136.6024
     7     -8.4583e-11      0.0332    361.6202
     8      3.4867e-13      0.0255   1175.9014

   Iterations      =  512000000
   NullTime (usec) =     0.0004
   MFLOPS(1)       =   683.4207
   MFLOPS(2)       =   664.7419
   MFLOPS(3)       =   944.2470
   MFLOPS(4)       =  1082.0311

linpacks:
Rolled Single Precision Linpack

     norm. resid      resid           machep         x[0]-1        x[n-1]-1
       1.9        4.52336171e-05  1.19209290e-07 -1.31130219e-05 -1.30534172e-05
    times are reported for matrices of order   100
      dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201
       0.00       0.00       0.00    1046748       0.00       0.01
       0.00       0.00       0.00    1049949       0.00       0.01
       0.00       0.00       0.00    1043567       0.00       0.01
       0.00       0.00       0.00    1044996       0.00       0.01
 times for array with leading dimension of 200
       0.00       0.00       0.00    1037261       0.00       0.01
       0.00       0.00       0.00    1037260       0.00       0.01
       0.00       0.00       0.00    1032580       0.00       0.01
       0.00       0.00       0.00    1048186       0.00       0.01
Rolled Single  Precision 1044996 Kflops ; 10 Reps 

linpackd:
Rolled Double Precision Linpack

     norm. resid      resid           machep         x[0]-1        x[n-1]-1
       1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
    times are reported for matrices of order   100
      dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201
       0.00       0.00       0.00     910698       0.00       0.01
       0.00       0.00       0.00     915556       0.00       0.01
       0.00       0.00       0.00     915556       0.00       0.01
       0.00       0.00       0.00     915922       0.00       0.01
 times for array with leading dimension of 200
       0.00       0.00       0.00     897603       0.00       0.01
       0.00       0.00       0.00     856193       0.00       0.01
       0.00       0.00       0.00     842536       0.00       0.01
       0.00       0.00       0.00     843157       0.00       0.01
Rolled Double  Precision 843157 Kflops ; 10 Reps 

dry2: 10000000 runs
Microseconds for one run through Dhrystone:    0.2 
Dhrystones per Second:                      4926108.5 

------------------------------------
NetBSD bam.sfbr.org 2.0_BETA NetBSD 2.0_BETA (GENERIC.MP) :

pystones:
This machine benchmarks at 33974.3 pystones/second

flops:
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0263    531.5738
     2     -1.4166e-13      0.0242    288.6942
     3      4.7184e-14      0.0257    662.3287
     4     -1.2557e-13      0.0258    580.8589
     5     -1.3800e-13      0.0277   1047.8477
     6      3.2380e-13      0.0269   1079.3151
     7     -8.4583e-11      0.0496    242.0390
     8      3.4867e-13      0.0266   1129.9241

   Iterations      =  512000000
   NullTime (usec) =     0.0004
   MFLOPS(1)       =   353.9761
   MFLOPS(2)       =   459.6483
   MFLOPS(3)       =   700.2343
   MFLOPS(4)       =   867.4086

linpacks:
Rolled Single Precision Linpack

     norm. resid      resid           machep         x[0]-1        x[n-1]-1
       1.9        4.52336171e-05  1.19209290e-07 -1.31130219e-05 -1.30534172e-05
    times are reported for matrices of order   100
      dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201
       0.00       0.00       0.00    1049949       0.00       0.01
       0.00       0.00       0.00    1049949       0.00       0.01
       0.00       0.00       0.00    1037261       0.00       0.01
       0.00       0.00       0.00    1058038       0.00       0.01
 times for array with leading dimension of 200
       0.00       0.00       0.00     959033       0.00       0.01
       0.00       0.00       0.00     957694       0.00       0.01
       0.00       0.00       0.00     953703       0.00       0.01
       0.00       0.00       0.00     963202       0.00       0.01
Rolled Single  Precision 963202 Kflops ; 10 Reps 

linpackd:
Rolled Double Precision Linpack

     norm. resid      resid           machep         x[0]-1        x[n-1]-1
       1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
    times are reported for matrices of order   100
      dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201
       0.00       0.00       0.00     922939       0.00       0.01
       0.00       0.00       0.00     927928       0.00       0.01
       0.00       0.00       0.00     927928       0.00       0.01
       0.00       0.00       0.00     925801       0.00       0.01
 times for array with leading dimension of 200
       0.00       0.00       0.00     884880       0.00       0.01
       0.00       0.00       0.00     894097       0.00       0.01
       0.00       0.00       0.00     892935       0.00       0.01
       0.00       0.00       0.00     892122       0.00       0.01
Rolled Double  Precision 892122 Kflops ; 10 Reps 

dry2: 10000000 runs
Microseconds for one run through Dhrystone:    0.2 
Dhrystones per Second:                      4950495.0
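
For a side-by-side view of the two benchmark sets, here is a small Python
sketch that computes the ACPI.MP to GENERIC.MP ratio for each headline figure.
The numbers are copied from the output above; the table layout and the
function name (ratios) are just assumptions made for this illustration. These
are single runs, so small differences should not be read as significant.

```python
# (ACPI.MP, GENERIC.MP) headline figures copied from the benchmark output above.
results = {
    "pystones/s": (34013.6, 33974.3),
    "flops MFLOPS(1)": (683.4207, 353.9761),
    "flops MFLOPS(2)": (664.7419, 459.6483),
    "flops MFLOPS(3)": (944.2470, 700.2343),
    "flops MFLOPS(4)": (1082.0311, 867.4086),
    "linpacks kflops": (1044996.0, 963202.0),
    "linpackd kflops": (843157.0, 892122.0),
    "dhrystones/s": (4926108.5, 4950495.0),
}


def ratios(table):
    """Return ACPI.MP / GENERIC.MP for every benchmark in the table."""
    return {name: acpi / generic for name, (acpi, generic) in table.items()}


# A ratio above 1.0 means the ACPI.MP run scored higher on that benchmark.
for name, ratio in ratios(results).items():
    print(f"{name:<18} ACPI.MP/GENERIC.MP = {ratio:.3f}")
```

A ratio above 1.0 favours ACPI.MP. On these figures the flops results favour
ACPI.MP, double-precision Linpack favours GENERIC.MP, and pystones and
Dhrystone are nearly identical.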