Subject: Re: improving ssh performance on sun4m systems
To: None <port-sparc@netbsd.org>
From: Thilo Manske <Thilo.Manske@HEH.Uni-Oldenburg.DE>
List: port-sparc
Date: 03/14/2002 11:14:12
On Wed, Mar 13 2002 at 18:25:43 -0800, Aaron J. Grier wrote:
> pkgsrc/devel/cpuflags can help.  on my 110MHz sparc 5 running 1.5.2,
> cpuflags adds '-mcpu=supersparc' which almost doubles my dsa
> performance:
[...]
> so gcc is obviously producing much better code on this machine when
> -mcpu=supersparc is used.  and this is with the older compiler...
BTW: You get slightly better results using -mv8 than -msupersparc on an SS5 (I
think -msupersparc doesn't only use the v8 instructions, it also tunes the code
for superscalar execution on SuperSPARC CPUs).

> gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release)
> 
> it wouldn't surprise me if the newer gcc is even better about optimizing
> and / or tuning.

Well, if you mean 2.95, then I must say that in the untuned case it's often
worse, and I think in your case (µSparc II) it's worse even with tuned code.

I did some tests early this year since I noticed some performance loss after
upgrading my compilers to the new toolchain (that's why I don't have results
for the old compiler in the tuned cases...).  FWIW here are the not very
scientific results using dhrystone as benchmark:

dhrystone                    libc (compiled with standard netbsd flags)

SPARCStation IPC (25MHz MB86901):
compiler flags               version compiler dhrystones/s
2.91     -O2                 12.61   2.92      24950 < default² (before)
2.91     -O2                 12.81   2.95      20080
2.95     -O2                 12.81   2.95      20180 < default (after)
2.95     -O2 -mcypress(*)    12.81   2.95      20260
=> Which makes a performance decrease of about 20% using the default flags :-(

SPARCStation IPX (40MHz MB86903):
2.91     -O2                 12.61   2.92      42350 < default² (before)
2.91     -O2                 12.81   2.95      30070
2.95     -O2                 12.81   2.95      31880 < default (after)
2.95     -O2 -mcypress(*)    12.81   2.95      32000
=> ~-25% in the default case

And now for some Sun4m systems:
SPARCStation Classic [X] (50MHz µSparc):
2.91     -O2                 12.61    2.92        46080 < default² (before)
2.91     -O2                 12.81    2.95        45170
2.95     -O2                 12.81    2.95        51280 < default (after)
2.95     -O2 -mv8            12.81    2.95        58280
2.95     -O2 -msupersparc(*) 12.81    2.95        58140
=> ~+10%

SPARCStation 4/85 (85MHz µSPARC II):
2.91     -O2                 12.61   2.92        95240 < default³ (before)
2.91     -O2                 12.61   2.92        90740 < default² (before)
2.91     -O2                 12.81   2.95        82920
2.95     -O2                 12.81   2.95        87560 < default (after)
2.95     -O2 -mv8            12.81   2.95       113400
2.95     -O2 -msupersparc(*) 12.81   2.95       110000
=> ~-10%

And on a SPARCStation 20/71 (75MHz Supersparc II):
2.91     -O2                 12.61    2.92       112600 < default² (before)
2.91     -O2                 12.81    2.95       119800
2.95     -O2                 12.81    2.95       118700 < default (after)
2.95     -O2 -mv8            12.81    2.95       127700
2.95     -O2 -msupersparc(*) 12.81    2.95       127400
=> ~+5%
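
For anyone who wants to recheck the "=>" lines, here is a minimal Python
sketch (not part of the original measurements) that recomputes the change in
the default case from the numbers above. The names DEFAULT_RESULTS and
percent_change are made up for illustration, and for the SPARCStation 4 it
assumes that "before" means the old-kernel row (³).

```python
# Default-flags dhrystones/s copied from the tables above: (machine, before, after).
# "before" = 2.91-built binary with libc 12.61, "after" = 2.95-built binary
# with libc 12.81.
DEFAULT_RESULTS = [
    ("SPARCStation IPC", 24950, 20180),
    ("SPARCStation IPX", 42350, 31880),
    ("SPARCStation Classic", 46080, 51280),
    ("SPARCStation 4/85", 95240, 87560),  # assumption: old-kernel row as "before"
    ("SPARCStation 20/71", 112600, 118700),
]


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for machine, before, after in DEFAULT_RESULTS:
    print(f"{machine:22s} {percent_change(before, after):+6.1f}%")
```

The printed values should come out close to the rounded figures given after
each table.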

Remarks:
- "before"/"after" is before/after I switched to the new toolchain
- The old dhrystone binary I found on my system was compiled April 17 2000,
  the libc.12.61 May 2000
- tests were repeated 3 times with 1E6 runs, the average was taken and rounded
- all kernels for Sun4m systems were compiled with "-mv8"
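
The averaging step can be sketched like this in Python. This is only an
illustration, not the script actually used: the function name is hypothetical,
and rounding to four significant digits is an assumption based on how the
figures in the tables look.

```python
import math


def average_dhrystones(runs, digits=4):
    """Average several dhrystones/s readings, rounded to `digits` significant digits."""
    mean = sum(runs) / len(runs)
    if mean == 0:
        return 0
    # Position of the leading digit, e.g. 5 for values in the 100000s.
    exponent = math.floor(math.log10(abs(mean)))
    # Size of the last digit that is kept, e.g. 100 for 4 digits of a 6-digit value.
    step = 10 ** (exponent - digits + 1)
    return round(mean / step) * step


# Made-up readings for three runs, NOT measured values:
print(average_dhrystones([113437, 113390, 113380]))
```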

*) this was the suggested optimization of pkg devel/cpuflags for that machine
²) but with new kernel (gcc 2.95 compiled, NetBSD 1.5ZA), results may be
   better with an older kernel of matching date (see SPARCStation 4 results) but
   don't ask me why... (I think the dhrystone benchmark doesn't make many syscalls)
³) with old kernel (gcc 2.92 compiled, NetBSD 1.5W from June)

BTW: On MIPS the situation is similar. I hope GCC3-generated code will be
back to the 2.92 quality (in terms of execution speed).

-- 
This is Thilo's Unix signature! Have fun with it.