Subject: improving ssh performance on sun4m systems
To: None <netbsd-users@netbsd.org>
From: Bill Sommerfeld <sommerfeld@netbsd.org>
List: port-sparc
Date: 03/09/2002 17:40:14
Recently there was a post to netbsd-users regarding poor
rsa/ssh/etc. performance on sun4m systems.

sun4m and later systems implement version 8 of the SPARC architecture.
Among other things, v8 adds integer multiply instructions; code built
for v7 has to do its multiplies in software instead.
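
To illustrate why this matters for RSA and DSA, here is a minimal
sketch of the kind of inner loop that bignum arithmetic spends its time
in.  It is not OpenSSL's code; the function name and the choice of
32-bit words are my assumptions for illustration.  Every pass through
the loop does one 32x32 -> 64 bit multiply, which is a single
instruction on v8 and a library call on v7.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only:
 * r[0..n-1] += a[0..n-1] * w, returning the carry out of the top word.
 */
uint32_t
mul_add_words_sketch(uint32_t *r, const uint32_t *a, size_t n, uint32_t w)
{
	uint32_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* One 32x32 -> 64 bit multiply per word; cannot overflow 64 bits. */
		uint64_t t = (uint64_t)a[i] * w + r[i] + carry;

		r[i] = (uint32_t)t;		/* low word stays in place */
		carry = (uint32_t)(t >> 32);	/* high word carries onward */
	}
	return carry;
}
```

A 1024-bit modular exponentiation runs a loop like this a very large
number of times, so the cost of each multiply dominates.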

When I recently upgraded an SS10 to a new version of ssh, I noticed
really bad performance; then I realized that my most recent build had
left out the -mv8 option (the older spelling of -mcpu=v8).  So I built
a v8 version of libcrypto, ran some A/B tests, and saw a 10x
improvement for some RSA operations.

I did this with the following in /etc/mk.conf:

DBG= -O1 -mcpu=v8 -mtune=supersparc

[note: -mcpu=v8 means the resulting code will not run on sparc v7 and
older systems.]
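
Since the slowdown came from a build that silently lost the v8 option,
a compile-time check can be handy.  The sketch below assumes that gcc
predefines __sparc_v8__ when given -mcpu=v8 (or -mcpu=supersparc);
verify that against your compiler before relying on it.  The function
name is made up for illustration.

```c
/*
 * Hypothetical check: returns 1 if this file was compiled with a
 * SPARC v8 target, 0 otherwise.  Assumes gcc's __sparc_v8__ macro.
 */
int
built_for_sparc_v8(void)
{
#if defined(__sparc_v8__)
	return 1;
#else
	return 0;
#endif
}
```

Calling this from a small test program built with the same mk.conf
settings shows whether the flags actually reached the compiler.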

Timings from the v8 build on the SS10:

% dmesg | grep MHz
cpu0 at mainbus0: TMS390Z50 v0 or TMS390Z55 @ 40.300 MHz, on-chip FPU
sbus0 at iommu0: clock = 20 MHz
esp0 at dma0 slot 15 offset 0x800000 level 4: ESP200, 40MHz, SCSI ID 7
% openssl speed dsa
To get the most accurate results, try to run this
program when this computer is idle.
Doing 512 bit sign dsa's for 10s: 84 512 bit DSA signs in 9.78s
Doing 512 bit verify dsa's for 10s: 70 512 bit DSA verify in 9.91s
Doing 1024 bit sign dsa's for 10s: 25 1024 bit DSA signs in 9.88s
Doing 1024 bit verify dsa's for 10s: 20 1024 bit DSA verify in 9.80s
OpenSSL 0.9.6b 9 Jul 2001
built on: Sat Mar  9 14:50:14 EST 2002
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,16,long) blowfish(ptr)
compiler: /d2/tools/bin/sparc--netbsdelf-gcc -O1 -mcpu=v8 -mtune=supersparc   -Werror
                  sign    verify    sign/s verify/s
dsa  512 bits   0.1165s   0.1416s      8.6      7.1
dsa 1024 bits   0.3952s   0.4900s      2.5      2.0

% openssl speed rsa
To get the most accurate results, try to run this
program when this computer is idle.
Doing 512 bit private rsa's for 10s: 83 512 bit private RSA's in 9.75s
Doing 512 bit public rsa's for 10s: 847 512 bit public RSA's in 9.76s
Doing 1024 bit private rsa's for 10s: 14 1024 bit private RSA's in 10.05s
Doing 1024 bit public rsa's for 10s: 244 1024 bit public RSA's in 9.77s
Doing 2048 bit private rsa's for 10s: 3 2048 bit private RSA's in 14.52s
Doing 2048 bit public rsa's for 10s: 67 2048 bit public RSA's in 9.79s
Doing 4096 bit private rsa's for 10s: 1 4096 bit private RSA's in 34.00s
Doing 4096 bit public rsa's for 10s: 19 4096 bit public RSA's in 10.30s
OpenSSL 0.9.6b 9 Jul 2001
built on: Sat Mar  9 14:50:14 EST 2002
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,16,long) blowfish(ptr)
compiler: /d2/tools/bin/sparc--netbsdelf-gcc -O1 -mcpu=v8 -mtune=supersparc   -Werror
                  sign    verify    sign/s verify/s
rsa  512 bits   0.1175s   0.0115s      8.5     86.8
rsa 1024 bits   0.7179s   0.0401s      1.4     25.0
rsa 2048 bits   4.8405s   0.1462s      0.2      6.8
rsa 4096 bits  34.0042s   0.5423s      0.0      1.8

					- Bill