Subject: Re: ssh hangs for a while waiting to connect
To: Aaron J. Grier <agrier@poofygoof.com>
From: Johan A. van Zanten <johan@ewranglers.com>
List: netbsd-users
Date: 03/07/2002 15:18:12
---
In message <20020306174138.U29114@goldberry.poofy.goof.com>
>the problem has shown up in other ports (68k and vax). based on my
>limited testing, I posit it's an optimization problem.
>
>there's a speed benchmark program (speed.c) in
>src/crypto/dist/openssl/apps if anybody else wants to take a closer
>look, or verify my results.
Very cool! Thanks for pointing this out.
I did some quick performance testing using openssl on my stock NetBSD
1.5.2 box (brahma), and compared it with an Ultra-30 running Solaris 8
(ragno), where I built everything (openssl, openssh) myself from source.
host             dsa signs/s         rsa signs/s                 des
name       512bits  1024bits   512bits  1024bits   cbc-1KB   ede-1KB
--------------------------------------------------------------------------
brahma         8.0       2.3       8.8       1.3  1047.21k   369.32k
ragno        125.6      38.9     117.5      20.7  3580.38k  1268.39k
Ragno X Faster than Brahma
             15.70     16.91     13.35     15.92      3.42      3.43
A 170 MHz microSPARC-II is a pretty respectable CPU in my mind, even
compared with the 300 MHz UltraSPARC-II in the U30. Under most
circumstances* I would expect the 300 MHz US-II to be about 3-4 times
faster than the 170 MHz mS-II for integer performance, and 5-6 times
faster for floating point, assuming the software has been optimized for
both processors. In every category above *except* des, the discrepancy is
much larger than that (especially in dsa), so I think you're right: it's
a compile-time optimization issue with OpenSSL.
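As a rough sketch of that comparison, the snippet below recomputes the
speedup for each test and checks it against the 3-4x integer estimate.
The rates are copied from the detailed `openssl speed` output further
down in this message (which lists 1.3 sign/s for 1024-bit RSA on
brahma); the names RATES, EXPECTED_HIGH, speedup and report are made up
for this illustration.

```python
# Rates copied from the detailed `openssl speed` output in this message.
# DSA/RSA values are signs per second; DES values are thousands of bytes
# per second for 1024-byte blocks. Each tuple is (brahma, ragno).
RATES = {
    "dsa 512 sign": (8.0, 125.6),
    "dsa 1024 sign": (2.3, 38.9),
    "rsa 512 sign": (8.8, 117.5),
    "rsa 1024 sign": (1.3, 20.7),
    "des cbc 1KB": (1047.21, 3580.38),
    "des ede3 1KB": (369.32, 1268.39),
}

# Upper end of the expected integer-performance advantage (3-4x) of the
# US-II over the mS-II. This is the estimate from the text, not a
# measured value.
EXPECTED_HIGH = 4.0


def speedup(brahma, ragno):
    """Return how many times faster ragno is than brahma."""
    return ragno / brahma


def report():
    """Print one line per test and return the ratios keyed by test name."""
    ratios = {}
    for name, (brahma, ragno) in RATES.items():
        ratio = speedup(brahma, ragno)
        ratios[name] = ratio
        if ratio <= EXPECTED_HIGH:
            verdict = "within the expected range"
        else:
            verdict = "larger than expected"
        print(f"{name:15s} {ratio:6.2f}x  {verdict}")
    return ratios


if __name__ == "__main__":
    report()
```

Only the two des rows come out under 4x; all four signing tests land
well above it.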
The next step for me is to compile, install and re-test OpenSSL on
NetBSD. Unfortunately, OpenSSL wants Perl 5, which makes this a little
more time consuming, because there's no perl in the default NetBSD 1.5.2
installation.
*The mS-II has 512 KB of L2 cache, whereas this US-II has 2 MB, so there
could be some very big performance differences in tests that are heavily
affected by the difference in cache size.
----------------------------------------------------------------------
Detailed results:
(This is a SPARC-5/170)
brahma:~ $ uname -mnrs
NetBSD brahma 1.5.2 sparc
brahma:~ $ openssl
OpenSSL> speed dsa
Doing 512 bit sign dsa's for 10s: 80 512 bit DSA signs in 10.50s
Doing 512 bit verify dsa's for 10s: 63 512 bit DSA verify in 9.97s
Doing 1024 bit sign dsa's for 10s: 23 1024 bit DSA signs in 10.70s
Doing 1024 bit verify dsa's for 10s: 19 1024 bit DSA verify in 10.39s
OpenSSL 0.9.5a 1 Apr 2000
built on: Sun Aug 19 19:35:28 CST 2001
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
compiler: cc -O2 -Werror
                  sign    verify   sign/s verify/s
dsa  512 bits   0.1256s   0.1583s      8.0      6.3
dsa 1024 bits   0.4378s   0.5468s      2.3      1.8
OpenSSL> speed rsa
Doing 512 bit private rsa's for 10s: 88 512 bit private RSA's in 9.98s
Doing 512 bit public rsa's for 10s: 759 512 bit public RSA's in 10.00s
Doing 1024 bit private rsa's for 10s: 14 1024 bit private RSA's in 10.41s
Doing 1024 bit public rsa's for 10s: 216 1024 bit public RSA's in 10.20s
Doing 2048 bit private rsa's for 10s: 2 2048 bit private RSA's in 10.42s
Doing 2048 bit public rsa's for 10s: 59 2048 bit public RSA's in 10.90s
Doing 4096 bit private rsa's for 10s: 1 4096 bit private RSA's in 37.97s
Doing 4096 bit public rsa's for 10s: 16 4096 bit public RSA's in 10.28s
OpenSSL 0.9.5a 1 Apr 2000
built on: Sun Aug 19 19:35:28 CST 2001
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
compiler: cc -O2 -Werror
                  sign    verify   sign/s verify/s
rsa  512 bits   0.1134s   0.0132s      8.8     75.9
rsa 1024 bits   0.7436s   0.0464s      1.3     21.6
rsa 2048 bits   5.2100s   0.1710s      0.2      5.8
rsa 4096 bits  37.9700s   0.6425s      0.0      1.6
OpenSSL> speed des
Doing des cbc for 3s on 8 size blocks: 363066 des cbc's in 3.10s
Doing des cbc for 3s on 64 size blocks: 48360 des cbc's in 3.10s
Doing des cbc for 3s on 256 size blocks: 12182 des cbc's in 2.98s
Doing des cbc for 3s on 1024 size blocks: 3068 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 384 des cbc's in 3.10s
Doing des ede3 for 3s on 8 size blocks: 134258 des ede3's in 2.99s
Doing des ede3 for 3s on 64 size blocks: 17194 des ede3's in 3.10s
Doing des ede3 for 3s on 256 size blocks: 4327 des ede3's in 3.10s
Doing des ede3 for 3s on 1024 size blocks: 1082 des ede3's in 3.00s
Doing des ede3 for 3s on 8192 size blocks: 136 des ede3's in 3.10s
OpenSSL 0.9.5a 1 Apr 2000
built on: Sun Aug 19 19:35:28 CST 2001
options:bn(32,32) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
compiler: cc -O2 -Werror
The 'numbers' are in 1000s of bytes per second processed.
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc           964.96k     1028.25k     1046.51k     1047.21k     1045.09k
des ede3          359.22k      365.59k      368.01k      369.32k      370.14k
OpenSSL>
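For anyone cross-checking the summary columns against the raw counts,
here is a small sketch of how they appear to be derived. The formulas
are inferred from the output above rather than taken from speed.c, and
the function names are invented for this example.

```python
def per_op_seconds(count, elapsed):
    """Seconds per operation: elapsed time divided by operations done."""
    return elapsed / count


def ops_per_second(count, elapsed):
    """Operations per second: operations done divided by elapsed time."""
    return count / elapsed


def kbytes_per_second(count, block_size, elapsed):
    """Throughput in 1000s of bytes per second for fixed-size blocks."""
    return count * block_size / elapsed / 1000.0


if __name__ == "__main__":
    # "88 512 bit private RSA's in 9.98s"
    print(f"{per_op_seconds(88, 9.98):.4f}s")  # 0.1134s
    print(f"{ops_per_second(88, 9.98):.1f}")   # 8.8
    # "3068 des cbc's in 3.00s" on 1024-byte blocks
    print(f"{kbytes_per_second(3068, 1024, 3.00):.2f}k")  # 1047.21k
    # "1082 des ede3's in 3.00s" on 1024-byte blocks
    print(f"{kbytes_per_second(1082, 1024, 3.00):.2f}k")  # 369.32k
```

These reproduce the 512-bit RSA row and the 1024-byte des columns
reported above.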
----------------------------------------------------------------------
This is an Ultra-30 (300 MHz):
$ uname -mnrs
SunOS ragno.ewranglers.com 5.8 sun4u
ragno:/tew/src/security/openssl-0.9.6b/apps $ ./openssl
OpenSSL> speed dsa
Doing 512 bit sign dsa's for 10s: 1246 512 bit DSA signs in 9.92s
Doing 512 bit verify dsa's for 10s: 1036 512 bit DSA verify in 9.87s
Doing 1024 bit sign dsa's for 10s: 389 1024 bit DSA signs in 10.01s
Doing 1024 bit verify dsa's for 10s: 319 1024 bit DSA verify in 10.00s
OpenSSL 0.9.6b 9 Jul 2001
built on: Wed Oct 10 00:39:37 EDT 2001
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int) blowfish(ptr)
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DULTRASPARC -DMD5_ASM
                  sign    verify   sign/s verify/s
dsa  512 bits   0.0080s   0.0095s    125.6    105.0
dsa 1024 bits   0.0257s   0.0313s     38.9     31.9
OpenSSL> speed rsa
Doing 512 bit private rsa's for 10s: 1156 512 bit private RSA's in 9.84s
Doing 512 bit public rsa's for 10s: 12377 512 bit public RSA's in 9.93s
Doing 1024 bit private rsa's for 10s: 206 1024 bit private RSA's in 9.94s
Doing 1024 bit public rsa's for 10s: 3773 1024 bit public RSA's in 9.89s
Doing 2048 bit private rsa's for 10s: 33 2048 bit private RSA's in 10.28s
Doing 2048 bit public rsa's for 10s: 1068 2048 bit public RSA's in 9.97s
Doing 4096 bit private rsa's for 10s: 5 4096 bit private RSA's in 10.78s
Doing 4096 bit public rsa's for 10s: 290 4096 bit public RSA's in 10.02s
OpenSSL 0.9.6b 9 Jul 2001
built on: Wed Oct 10 00:39:37 EDT 2001
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int) blowfish(ptr)
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DULTRASPARC -DMD5_ASM
                  sign    verify   sign/s verify/s
rsa  512 bits   0.0085s   0.0008s    117.5   1246.4
rsa 1024 bits   0.0483s   0.0026s     20.7    381.5
rsa 2048 bits   0.3115s   0.0093s      3.2    107.1
rsa 4096 bits   2.1560s   0.0346s      0.5     28.9
OpenSSL> speed des
Doing des cbc for 3s on 8 size blocks: 1090210 des cbc's in 2.98s
Doing des cbc for 3s on 64 size blocks: 159278 des cbc's in 2.99s
Doing des cbc for 3s on 256 size blocks: 40913 des cbc's in 2.95s
Doing des cbc for 3s on 1024 size blocks: 9895 des cbc's in 2.83s
Doing des cbc for 3s on 8192 size blocks: 1276 des cbc's in 2.94s
Doing des ede3 for 3s on 8 size blocks: 423763 des ede3's in 2.89s
Doing des ede3 for 3s on 64 size blocks: 58237 des ede3's in 2.97s
Doing des ede3 for 3s on 256 size blocks: 14746 des ede3's in 2.95s
Doing des ede3 for 3s on 1024 size blocks: 3716 des ede3's in 3.00s
Doing des ede3 for 3s on 8192 size blocks: 459 des ede3's in 2.93s
OpenSSL 0.9.6b 9 Jul 2001
built on: Wed Oct 10 00:39:37 EDT 2001
options:bn(64,32) md2(int) rc4(ptr,char) des(idx,cisc,16,long) idea(int) blowfish(ptr)
compiler: gcc -fPIC -DTHREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -mcpu=ultrasparc -O3 -fomit-frame-pointer -Wall -DB_ENDIAN -DBN_DIV2W -DULTRASPARC -DMD5_ASM
The 'numbers' are in 1000s of bytes per second processed.
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc          2926.74k     3409.29k     3550.42k     3580.38k     3555.44k
des ede3         1173.05k     1254.94k     1279.65k     1268.39k     1283.32k
--johan