Subject: Re: CVS commit: htdocs/Ports/sgimips
To: None <port-sgimips@netbsd.org>
From: Pavel Cahyna <pcah8322@artax.karlin.mff.cuni.cz>
List: port-sgimips
Date: 10/18/2005 15:26:17
I tested the speed of a kernel with the L2 cache patch, built by Tsutsui-san.
The results show that the patch probably works as expected, thanks!

openssl speed shows some improvement, mostly for md5, sha1 and rmd160 on
small blocks and for the RSA and DSA numbers with short keys:

----------------------------------
# 2.0.2 kernel (3.99.5 is similar)
OpenSSL 0.9.7d 17 Mar 2004
built on: NetBSD 2.0.2
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: gcc version 3.3.3 (NetBSD nb3 20040520)
available timing options: USE_TOD HZ=100 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                 69.70k      143.32k      195.15k      214.54k      221.09k
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4                703.75k     2391.73k     6547.48k    11574.18k    14898.21k
md5                527.37k     1401.56k     3975.47k     7671.08k    10756.88k
hmac(md5)          871.60k     2677.14k     6291.70k     9327.22k    11114.04k
sha1               357.43k     1067.68k     2758.33k     4567.58k     5641.01k
rmd160             326.35k      534.62k     1567.75k     3044.26k     4197.30k
rc4               6434.60k     7242.23k     7477.01k     7537.17k     7551.68k
des cbc           1186.65k     1230.35k     1242.24k     1244.24k     1244.93k
des ede3           436.07k      442.44k      444.20k      444.52k      444.39k
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc           1358.28k     1418.57k     1434.42k     1437.14k     1438.46k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc      2928.43k     3223.89k     3308.59k     3329.24k     3322.73k
cast cbc          1850.39k     1963.51k     1994.23k     2002.64k     2000.04k
aes-128 cbc       1629.00k     1676.46k     1688.07k     1690.13k     1687.87k
aes-192 cbc       1409.06k     1444.35k     1453.58k     1455.16k     1452.97k
aes-256 cbc       1239.85k     1265.87k     1273.16k     1274.87k     1271.86k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0417s   0.0039s     24.0    255.5
rsa 1024 bits   0.1941s   0.0109s      5.2     91.3
rsa 2048 bits   1.1627s   0.0368s      0.9     27.2
rsa 4096 bits   7.9672s   0.1341s      0.1      7.5
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0311s   0.0384s     32.2     26.0
dsa 1024 bits   0.0955s   0.1182s     10.5      8.5
dsa 2048 bits   0.3361s   0.4188s      3.0      2.4
----------------------------------


----------------------------------
# patched kernel:

OpenSSL 0.9.7d 17 Mar 2004
built on: NetBSD 2.0.2
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: gcc version 3.3.3 (NetBSD nb3 20040520)
available timing options: USE_TOD HZ=100 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                 69.15k      143.24k      195.12k      214.57k      221.03k
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4                706.21k     2396.16k     6560.78k    11590.58k    14922.89k
md5                579.50k     1789.38k     4859.34k     8511.63k    10968.92k
hmac(md5)          871.43k     2679.93k     6295.50k     9475.68k    11146.05k
sha1               470.51k     1398.36k     3258.44k     4885.01k     5715.31k
rmd160             421.80k      945.10k     2310.91k     3616.55k     4331.32k
rc4               6437.29k     7235.76k     7477.91k     7539.55k     7548.35k
des cbc           1186.88k     1231.00k     1242.52k     1244.64k     1245.89k
des ede3           436.66k      443.27k      445.06k      445.38k      445.08k
idea cbc             0.00         0.00         0.00         0.00         0.00 
rc2 cbc           1360.00k     1418.41k     1434.55k     1438.57k     1440.47k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc      2929.79k     3227.36k     3309.88k     3331.39k     3327.69k
cast cbc          1853.36k     1965.40k     1996.68k     2004.73k     2005.90k
aes-128 cbc       1625.68k     1672.40k     1684.36k     1687.42k     1687.83k
aes-192 cbc       1408.43k     1443.85k     1452.79k     1455.04k     1453.73k
aes-256 cbc       1241.13k     1268.60k     1275.22k     1277.42k     1277.28k
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0318s   0.0034s     31.5    294.2
rsa 1024 bits   0.1725s   0.0103s      5.8     97.0
rsa 2048 bits   1.1174s   0.0360s      0.9     27.8
rsa 4096 bits   7.8438s   0.1326s      0.1      7.5
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0272s   0.0333s     36.7     30.1
dsa 1024 bits   0.0907s   0.1116s     11.0      9.0
dsa 2048 bits   0.3255s   0.4121s      3.1      2.4
----------------------------------
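
To put numbers on the comparison, the relative change can be computed from a
few rows of the two runs above. This is only a sketch: the rows were picked by
hand, and the helper name percent_change is made up for the illustration.

```python
# (old, new) pairs copied from the two openssl speed runs above.
# The digest/cipher rows are the 16-byte column (1000s of bytes/s);
# the RSA row is sign/s. Percent change is unitless, so they can be mixed.
results = {
    "md5": (527.37, 579.50),
    "sha1": (357.43, 470.51),
    "rmd160": (326.35, 421.80),
    "rc4": (6434.60, 6437.29),
    "aes-128 cbc": (1629.00, 1625.68),
    "rsa 512 sign/s": (24.0, 31.5),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(old, new) for name, (old, new) in results.items()}

for name, pct in changes.items():
    print(f"{name:16s} {pct:+6.1f}%")
```

With these rows, sha1, rmd160 and the 512-bit RSA sign rate come out roughly
30% higher, md5 about 10% higher, and rc4 and aes-128 cbc essentially
unchanged.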


iperf also shows notable improvements:

# 3.99.5 kernel:
------------------------------------------------------------
Client connecting to pc111, TCP port 5001
TCP window size: 17.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  3] local xxx.xxx.xxx.xxx port 65534 connected with xxx.xxx.xxx.xxx port 5001
[  3]  0.0-100.0 sec    320 MBytes  26.9 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte
------------------------------------------------------------
[  4] local xxx.xxx.xxx.xxx port 5001 connected with xxx.xxx.xxx.xxx port 2024
[  4]  0.0-10.0 sec  49.8 MBytes  41.8 Mbits/sec

# patched kernel:
------------------------------------------------------------
Client connecting to pc111, TCP port 5001
TCP window size: 17.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  3] local xxx.xxx.xxx.xxx port 65531 connected with xxx.xxx.xxx.xxx port 5001
[  3]  0.0-100.0 sec    410 MBytes  34.4 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte
------------------------------------------------------------
[  4] local xxx.xxx.xxx.xxx port 5001 connected with xxx.xxx.xxx.xxx port 2022
[  4]  0.0-10.0 sec  60.8 MBytes  51.0 Mbits/sec
[  4] local xxx.xxx.xxx.xxx port 5001 connected with xxx.xxx.xxx.xxx port 2023
[  4]  0.0-10.0 sec  60.8 MBytes  50.9 Mbits/sec
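
As a sanity check on the iperf figures, the reported Mbits/sec can be
recomputed from the MBytes and the interval, and the gain expressed in
percent. The sketch below assumes that iperf counts MBytes in units of 2^20
bytes and Mbits in units of 10^6 bits, which is consistent with the numbers
above; the function names are made up for the illustration.

```python
def mbytes_to_mbits_per_sec(mbytes, seconds):
    """Convert an iperf transfer total to a rate.

    Assumption: 1 MByte = 2**20 bytes, 1 Mbit = 10**6 bits.
    """
    bits = mbytes * 2**20 * 8
    return bits / seconds / 1e6


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Client side (100 s runs): 320 MBytes before, 410 MBytes after.
client_old = mbytes_to_mbits_per_sec(320, 100)
client_new = mbytes_to_mbits_per_sec(410, 100)

# Server side (10 s runs): 49.8 MBytes before, 60.8 MBytes after.
server_old = mbytes_to_mbits_per_sec(49.8, 10)
server_new = mbytes_to_mbits_per_sec(60.8, 10)

print(f"client: {client_old:.1f} -> {client_new:.1f} Mbits/sec "
      f"({percent_change(client_old, client_new):+.0f}%)")
print(f"server: {server_old:.1f} -> {server_new:.1f} Mbits/sec "
      f"({percent_change(server_old, server_new):+.0f}%)")
```

The recomputed rates agree with the reported ones to within the rounding of
the MBytes column, and both directions improve by somewhat more than 20%.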

Here is the diff of the dmesg output, which shows one probably unrelated
improvement: the settings for the SCSI devices are better.

--- netbsd/O2-3.99.5.good	2005-10-18 15:00:53.000000000 +0200
+++ netbsd/O2-3.99.9.cache.good	2005-10-18 15:18:57.000000000 +0200
@@ -3,8 +3,8 @@
 Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.
 
-NetBSD 3.99.5 (GENERIC32_IP3x) #0: Tue May 31 00:58:19 UTC 2005
-	builds@works.netbsd.org:/home/builds/ab/HEAD/sgimips/200505290000Z-obj/home/builds/ab/HEAD/src/sys/arch/sgimips/compile/GENERIC32_IP3x
+NetBSD 3.99.9 (GENERIC32_IP3x) #4: Tue Oct 18 08:16:26 JST 2005
+	tsutsui@mirage:/usr/src/sys/arch/sgimips/compile/GENERIC32_IP3x
 total memory = 256 MB
 (3716 KB reserved for ARCS)
 avail memory = 242 MB
@@ -12,7 +12,7 @@
 cpu0 at mainbus0: MIPS R5000 CPU (0x2321) Rev. 2.1 with built-in FPU Rev. 1.0
 cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
 cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache
-cpu0: 512KB/32B direct-mapped write-through L2 Data cache
+cpu0: 512KB/32B direct-mapped write-through L2 Unified cache
 crime0 at mainbus0 addr 0x14000000: rev 1.1 (CRIME_ID: a1)
 mace0 at mainbus0 addr 0x1f000000
 lpt0 at mace0 offset 0x380000 intr 4 intrmask 0xf0000
@@ -41,22 +41,24 @@
 mace: established interrupt 8 (level 0)
 ahc0: interrupting at crime interrupt 8
 ahc0: Using left over BIOS settings
-ahc0: aic7880: Wide Channel A, SCSI Id=0, 16/253 SCBs
+ahc0: Host Adapter Bios disabled.  Using default SCSI device parameters
+ahc0: aic7880: Wide Channel A, SCSI Id=7, 16/253 SCBs
 scsibus0 at ahc0: 16 targets, 8 luns per target
 ahc1 at pci0 dev 2 function 0: Adaptec aic7880 Ultra SCSI adapter
 mace: established interrupt 9 (level 0)
 ahc1: interrupting at crime interrupt 9
 ahc1: Using left over BIOS settings
-ahc1: aic7880: Wide Channel A, SCSI Id=0, 16/253 SCBs
+ahc1: Host Adapter Bios disabled.  Using default SCSI device parameters
+ahc1: aic7880: Wide Channel A, SCSI Id=7, 16/253 SCBs
 scsibus1 at ahc1: 16 targets, 8 luns per target
 biomask 07 netmask 07 ttymask 07 clockmask 87
 scsibus0: waiting 2 seconds for devices to settle...
 scsibus1: waiting 2 seconds for devices to settle...
 sd0 at scsibus0 target 1 lun 0: <SGI, IBM  DCHS04Y, 3030> disk fixed
 sd0: 4340 MB, 6077 cyl, 9 head, 162 sec, 512 bytes/sect x 8888543 sectors
-sd0: async, 8-bit transfers, tagged queueing
+sd0: sync (100.00ns offset 8), 16-bit (20.000MB/s) transfers, tagged queueing
 cd0 at scsibus0 target 4 lun 0: <TOSHIBA, CD-ROM XM-5401TA, 3605> cdrom removable
-cd0: async, 8-bit transfers
+cd0: sync (236.00ns offset 15), 8-bit (4.237MB/s) transfers
 boot device: sd0
 root on sd0a dumps on sd0b
 root file system type: ffs
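
The transfer rates in the new sd0 and cd0 lines follow from the negotiated
sync period and the bus width: one transfer per period, and width/8 bytes per
transfer. A small sketch of that arithmetic (the function name is made up
here, and MB is taken as 10^6 bytes, which matches the dmesg figures):

```python
def sync_rate_mb_per_s(period_ns, width_bits):
    """Peak synchronous SCSI transfer rate in MB/s (MB = 10**6 bytes)."""
    transfers_per_s = 1e9 / period_ns   # one transfer per sync period
    bytes_per_transfer = width_bits / 8  # 8-bit bus: 1 byte, 16-bit bus: 2 bytes
    return transfers_per_s * bytes_per_transfer / 1e6


sd0_rate = sync_rate_mb_per_s(100.0, 16)  # sd0: 100.00ns period, 16-bit
cd0_rate = sync_rate_mb_per_s(236.0, 8)   # cd0: 236.00ns period, 8-bit

print(f"sd0: {sd0_rate:.3f}MB/s")
print(f"cd0: {cd0_rate:.3f}MB/s")
```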