Subject: Re: openssl (or gcc) performance changes?
To: NetBSD current list <current-users@netbsd.org>
From: William Allen Simpson <wsimpson@greendragon.com>
List: current-users
Date: 10/02/2003 12:37:34
William Allen Simpson wrote:
> 
> "Perry E. Metzger" wrote:
> > I would suggest that Bill might want to poke around a bit and figure
> > out the exact source of the slowdown.
> >
> The programs involved were posted to PR 21983, so anybody can test
> and/or profile them.
> 
> I only have 1 -current system, but I have another box with nearly
> identical configuration.  When I have a bit of time, I'll try loading
> an old releng from around the July timeframe on it, and see whether
> anything is obvious.
>
OK, here are some of the obvious things:

 1) GCC 3.3.1 appears to be slightly faster than 2.95.3 in this
    application.

    Running just qsieve on the same size BN is consistently about 
    0.1% faster (about 6 seconds out of 5665, where runs under 2.95.3 
    are always within 1 second of each other).

    Running just qsafe on the exact same BN is consistently about 
    1.9% faster (about 1700 seconds in 92000).

 2) With either 3.3.1 or 2.95.3, 1.6ZC (openssl 0.9.7b) is always 
    much slower than 1.6U (openssl 0.9.6b for crypto/bn_prime.c and 
    0.9.6e for crypto/bn_mont.c, which seem to account for most of 
    the activity) when running qsafe (a primality tester using 
    BN_is_prime).  By "much", I mean 30% to 300%, depending on the 
    size of the prime moduli being tested.

 3) Yet, I have not found serious differences in this code.  There are 
    a lot of tiny changes, but they all appear (to me) to be minor 
    cleanup.  My eyes glazed over.

 4) However, just this morning, I noticed that the load average is not 
    what I'd expected. 

    This run started at "Tue Sep 30 02:53:46"

    date && ps u && uptime
    Thu Oct  2 12:37:50 EDT 2003
    USER     PID %CPU %MEM VSZ  RSS TT STAT STARTED       TIME COMMAND
    current 4848 99.0  1.9 132 1268 p1 RN   Tue02AM 3457:08.20 ./qsafe 64
    ...
    12:37PM  up 5 days,  3:16, 3 users, load averages: 1.07, 1.08, 1.08

    The 3457 minutes (57.62 hours) seems to be fairly close to what I'd 
    expect by the clock (57.7 hours).

    Does this really mean that although the %CPU is at 99, the load 
    average is a minuscule 1%?

    Has something in scheduling changed from 1.6U to 1.6ZC?
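
For anyone who wants to check the arithmetic in items 1 and 4, here is
a minimal Python sketch.  It is not part of the programs posted to
PR 21983; the helper names (cpu_minutes, percent_faster) are made up
for illustration, the year 2003 is assumed for the "date" output, and
both timestamps are assumed to be in the same timezone.

```python
from datetime import datetime


def cpu_minutes(ps_time: str) -> float:
    """Convert a ps TIME field of the form MMMM:SS.ss to minutes."""
    minutes, seconds = ps_time.split(":")
    return int(minutes) + float(seconds) / 60.0


def percent_faster(seconds_saved: float, total_seconds: float) -> float:
    """Express a time saving as a percentage of the total run time."""
    return 100.0 * seconds_saved / total_seconds


# Item 1: figures copied from the text above.
qsieve_gain = percent_faster(6, 5665)
qsafe_gain = percent_faster(1700, 92000)

# Item 4: start time of the run and the "date" output (year assumed).
start = datetime(2003, 9, 30, 2, 53, 46)
now = datetime(2003, 10, 2, 12, 37, 50)

wall_hours = (now - start).total_seconds() / 3600.0
cpu_hours = cpu_minutes("3457:08.20") / 60.0
cpu_share = cpu_hours / wall_hours  # fraction of wall-clock time on CPU

print(f"qsieve gain: {qsieve_gain:.2f}%")
print(f"qsafe gain:  {qsafe_gain:.2f}%")
print(f"wall clock:  {wall_hours:.2f} h")
print(f"CPU time:    {cpu_hours:.2f} h")
print(f"CPU share:   {cpu_share:.1%}")
```

The CPU share comes out very close to the 99% that ps reports, so the
accumulated TIME and the wall clock agree with each other.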

I really could use some ideas!  As you can see, the tests take days, 
so a pointer to a better technique might be helpful!
-- 
William Allen Simpson
    Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32