Subject: Re: ssh hangs for a while waiting to connect
To: Aaron J. Grier <agrier@poofygoof.com>
From: Johan A. van Zanten <johan@ewranglers.com>
List: netbsd-users
Date: 03/07/2002 15:18:12
---
In message <20020306174138.U29114@goldberry.poofy.goof.com>

>the problem has shown up in other ports (68k and vax).  based on my
>limited testing, I posit it's an optimization problem.
>
>there's a speed benchmark program (speed.c) in
>src/crypto/dist/openssl/apps if anybody else wants to take a closer
>look, or verify my results.

Very cool!  Thanks for pointing this out.  

I did some quick perf testing using openssl on my stock NetBSD 1.5.2 box
(brahma), and compared it with an Ultra-30 running Solaris 8 (ragno),
where I built everything (openssl, openssh) myself from source.

 host          dsa signs/s          rsa signs/s          des (1000s of bytes/s)
 name        512 bits  1024 bits   512 bits  1024 bits   cbc-1KB    ede3-1KB
 ----------------------------------------------------------------------------
 brahma         8.0       2.3         8.8       1.3      1047.21k    369.32k
 ragno        125.6      38.9       117.5      20.7      3580.38k   1268.39k

 ragno, times faster than brahma:
               15.70     16.91       13.35     15.92        3.42       3.43

A 170 MHz microSPARC-II is a pretty respectable CPU in my mind, even
compared with the 300 MHz UltraSPARC-II in the U30.  Under most
circumstances*, I would expect the 300 MHz US-II to be about 3-4 times
faster than the 170 MHz mS-II for integer performance, and 5-6 times
faster for floating point, when the software has been optimized for both
processors.  In every category above *except* des, the discrepancy is
larger than that (especially in dsa), so I think you're right: it's a
compile-time optimization issue with OpenSSL.
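
As a rough sketch of the arithmetic behind that comparison, the Python
below recomputes the speedups and flags the tests that exceed the expected
integer range.  The figures are copied from the sign/s and 1024-byte
columns of the detailed openssl output further down (for example, 1.3
sign/s for 1024-bit rsa on brahma).  The 4.0 cutoff and the function names
are assumptions made for illustration only.

```python
# Signs per second (dsa, rsa) and 1 KB-block throughput in 1000s of
# bytes per second (des), copied from the detailed "openssl speed" output.
brahma = {"dsa512": 8.0, "dsa1024": 2.3, "rsa512": 8.8, "rsa1024": 1.3,
          "des-cbc": 1047.21, "des-ede3": 369.32}
ragno = {"dsa512": 125.6, "dsa1024": 38.9, "rsa512": 117.5, "rsa1024": 20.7,
         "des-cbc": 3580.38, "des-ede3": 1268.39}

# Assumption: upper end of the 3-4x integer speedup estimated in the text.
EXPECTED_MAX = 4.0


def speedups(slow, fast):
    """Return how many times faster `fast` is than `slow`, per test."""
    return {name: fast[name] / slow[name] for name in slow}


def beyond_expectation(ratios, limit=EXPECTED_MAX):
    """Return the names of the tests whose speedup exceeds `limit`."""
    return sorted(name for name, ratio in ratios.items() if ratio > limit)


ratios = speedups(brahma, ragno)
for name, ratio in ratios.items():
    print(f"{name:9s} {ratio:6.2f}x")
print("larger than expected:", beyond_expectation(ratios))
```

With these inputs only the two des tests stay under the cutoff, which is
the pattern the paragraph above describes.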


 The next step for me is to compile, install and re-test OpenSSL on
NetBSD. Unfortunately, OpenSSL wants Perl 5, which makes this a little more
time consuming, because there's no perl in the default NetBSD 1.5.2
installation.

*The mS-II has 512KB L2 cache, whereas this US-II has 2 MB, so there could
be some very big perf differences in tests that were heavily affected by
the difference in cache size.

 ----------------------------------------------------------------------


 Detailed results:

 (This is a SPARC-5/170)

brahma:~ $ uname -mnrs 
NetBSD brahma 1.5.2 sparc
brahma:~ $ openssl
OpenSSL> speed dsa   
Doing 512 bit sign dsa's for 10s: 80 512 bit DSA signs in 10.50s
Doing 512 bit verify dsa's for 10s: 63 512 bit DSA verify in 9.97s
Doing 1024 bit sign dsa's for 10s: 23 1024 bit DSA signs in 10.70s
Doing 1024 bit verify dsa's for 10s: 19 1024 bit DSA verify in 10.39s
OpenSSL 0.9.5a 1 Apr 2000
built on: Sun Aug 19 19:35:28 CST 2001
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx) 
compiler: cc -O2  -Werror 
                  sign    verify    sign/s verify/s
dsa  512 bits   0.1256s   0.1583s      8.0      6.3
dsa 1024 bits   0.4378s   0.5468s      2.3      1.8

OpenSSL> speed rsa
Doing 512 bit private rsa's for 10s: 88 512 bit private RSA's in 9.98s
Doing 512 bit public rsa's for 10s: 759 512 bit public RSA's in 10.00s
Doing 1024 bit private rsa's for 10s: 14 1024 bit private RSA's in 10.41s
Doing 1024 bit public rsa's for 10s: 216 1024 bit public RSA's in 10.20s
Doing 2048 bit private rsa's for 10s: 2 2048 bit private RSA's in 10.42s
Doing 2048 bit public rsa's for 10s: 59 2048 bit public RSA's in 10.90s
Doing 4096 bit private rsa's for 10s: 1 4096 bit private RSA's in 37.97s
Doing 4096 bit public rsa's for 10s: 16 4096 bit public RSA's in 10.28s
OpenSSL 0.9.5a 1 Apr 2000
built on: Sun Aug 19 19:35:28 CST 2001
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx) 
compiler: cc -O2  -Werror 
                  sign    verify    sign/s verify/s
rsa  512 bits   0.1134s   0.0132s      8.8     75.9
rsa 1024 bits   0.7436s   0.0464s      1.3     21.6
rsa 2048 bits   5.2100s   0.1710s      0.2      5.8
rsa 4096 bits  37.9700s   0.6425s      0.0      1.6


OpenSSL> speed des
Doing des cbc for 3s on 8 size blocks: 363066 des cbc's in 3.10s
Doing des cbc for 3s on 64 size blocks: 48360 des cbc's in 3.10s
Doing des cbc for 3s on 256 size blocks: 12182 des cbc's in 2.98s
Doing des cbc for 3s on 1024 size blocks: 3068 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 384 des cbc's in 3.10s
Doing des ede3 for 3s on 8 size blocks: 134258 des ede3's in 2.99s
Doing des ede3 for 3s on 64 size blocks: 17194 des ede3's in 3.10s
Doing des ede3 for 3s on 256 size blocks: 4327 des ede3's in 3.10s
Doing des ede3 for 3s on 1024 size blocks: 1082 des ede3's in 3.00s
Doing des ede3 for 3s on 8192 size blocks: 136 des ede3's in 3.10s
OpenSSL 0.9.5a 1 Apr 2000
built on: Sun Aug 19 19:35:28 CST 2001
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx) 
compiler: cc -O2  -Werror 
The 'numbers' are in 1000s of bytes per second processed.
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc            964.96k     1028.25k     1046.51k     1047.21k     1045.09k
des ede3           359.22k      365.59k      368.01k      369.32k      370.14k
OpenSSL> 
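
For reference, the "k" figures in the summary are the number of operations
times the block size, divided by the elapsed time, in 1000s of bytes per
second.  A minimal Python sketch, checked here only against the two
1024-byte rows of the brahma run above (the function name is made up for
illustration):

```python
def throughput_k(count, block_bytes, seconds):
    """Return 1000s of bytes processed per second."""
    return count * block_bytes / seconds / 1000.0


# 1024-byte rows from the brahma "speed des" run above.
cbc_1k = throughput_k(3068, 1024, 3.00)
ede3_1k = throughput_k(1082, 1024, 3.00)

print(f"des cbc  1024 bytes: {cbc_1k:.2f}k")
print(f"des ede3 1024 bytes: {ede3_1k:.2f}k")
```

These come out at 1047.21k and 369.32k, matching the 1024-byte column of
the table.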


 ----------------------------------------------------------------------
This is an Ultra-30 (300 MHz):

 $ uname -mnrs 
SunOS ragno.ewranglers.com 5.8 sun4u

ragno:/tew/src/security/openssl-0.9.6b/apps $ ./openssl 
OpenSSL> speed dsa
Doing 512 bit sign dsa's for 10s: 1246 512 bit DSA signs in 9.92s
Doing 512 bit verify dsa's for 10s: 1036 512 bit DSA verify in 9.87s
Doing 1024 bit sign dsa's for 10s: 389 1024 bit DSA signs in 10.01s
Doing 1024 bit verify dsa's for 10s: 319 1024 bit DSA verify in 10.00s
OpenSSL 0.9.6b 9 Jul 2001
built on: Wed Oct 10 00:39:37 EDT 2001
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DULTRASPARC -DMD5_ASM
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0080s   0.0095s    125.6    105.0
dsa 1024 bits   0.0257s   0.0313s     38.9     31.9

OpenSSL> speed rsa
Doing 512 bit private rsa's for 10s: 1156 512 bit private RSA's in 9.84s
Doing 512 bit public rsa's for 10s: 12377 512 bit public RSA's in 9.93s
Doing 1024 bit private rsa's for 10s: 206 1024 bit private RSA's in 9.94s
Doing 1024 bit public rsa's for 10s: 3773 1024 bit public RSA's in 9.89s
Doing 2048 bit private rsa's for 10s: 33 2048 bit private RSA's in 10.28s
Doing 2048 bit public rsa's for 10s: 1068 2048 bit public RSA's in 9.97s
Doing 4096 bit private rsa's for 10s: 5 4096 bit private RSA's in 10.78s
Doing 4096 bit public rsa's for 10s: 290 4096 bit public RSA's in 10.02s
OpenSSL 0.9.6b 9 Jul 2001
built on: Wed Oct 10 00:39:37 EDT 2001
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DULTRASPARC -DMD5_ASM
                  sign    verify    sign/s verify/s
rsa  512 bits   0.0085s   0.0008s    117.5   1246.4
rsa 1024 bits   0.0483s   0.0026s     20.7    381.5
rsa 2048 bits   0.3115s   0.0093s      3.2    107.1
rsa 4096 bits   2.1560s   0.0346s      0.5     28.9

OpenSSL> speed des
Doing des cbc for 3s on 8 size blocks: 1090210 des cbc's in 2.98s
Doing des cbc for 3s on 64 size blocks: 159278 des cbc's in 2.99s
Doing des cbc for 3s on 256 size blocks: 40913 des cbc's in 2.95s
Doing des cbc for 3s on 1024 size blocks: 9895 des cbc's in 2.83s
Doing des cbc for 3s on 8192 size blocks: 1276 des cbc's in 2.94s
Doing des ede3 for 3s on 8 size blocks: 423763 des ede3's in 2.89s
Doing des ede3 for 3s on 64 size blocks: 58237 des ede3's in 2.97s
Doing des ede3 for 3s on 256 size blocks: 14746 des ede3's in 2.95s
Doing des ede3 for 3s on 1024 size blocks: 3716 des ede3's in 3.00s
Doing des ede3 for 3s on 8192 size blocks: 459 des ede3's in 2.93s
OpenSSL 0.9.6b 9 Jul 2001
built on: Wed Oct 10 00:39:37 EDT 2001
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int) blowfish(ptr) 
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DULTRASPARC -DMD5_ASM
The 'numbers' are in 1000s of bytes per second processed.
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc           2926.74k     3409.29k     3550.42k     3580.38k     3555.44k
des ede3          1173.05k     1254.94k     1279.65k     1268.39k     1283.32k



 --johan