Subject: Re: improving ssh performance on sun4m systems
To: None <port-sparc@netbsd.org>
From: Charles Shannon Hendrix <shannon@widomaker.com>
List: port-sparc
Date: 03/16/2002 00:44:09
On Fri, Mar 15, 2002 at 11:38:01PM -0000, eeh@netbsd.org wrote:

> Same load/use penalties, but this is a leaf function so there's no
> save, and a multiply instruction is used instead of a function call.
> Here's the results with "cc -mtune=supersparc -O3":
> 
> main:
>         save    %sp, -104, %sp
>         sethi   %hi(a), %o1
>         sethi   %hi(b), %o2
>         ld      [%o1+%lo(a)], %o0
>         ld      [%o2+%lo(b)], %o1
>         call    .umul, 0
>          nop
>         ret
>         restore %g0, %o0, %o0

I was afraid of this after noting a few interesting test results.

-mv8 enables the instruction set, -mtune=supersparc enables the
instruction ordering (anything else?), and -msupersparc does both.

I think cpuflags should really issue the latter, because what it
emits now, -mcpu=supersparc, only selects the v8 instruction set.
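
A quick way to check what a given flag really does is to compile a
tiny test to assembly and look at how the multiply comes out.  Something
like the following would do; it is my own guess at a test case (the
names and values are made up), not the source behind the listing quoted
above:

```c
/* Hypothetical test case: two globals and one multiply.
 *
 * Compile to assembly with each flag under discussion, e.g.
 *     cc -O3 -S mul.c
 *     cc -O3 -mtune=supersparc -S mul.c
 *     cc -O3 -mv8 -S mul.c
 * then look in mul.s for either "call .umul" (library routine)
 * or a hardware umul/smul instruction.
 */
unsigned int a = 6;
unsigned int b = 7;

/* Globals rather than constants, so the compiler cannot fold
 * the product at compile time. */
unsigned int product(void)
{
    return a * b;
}
```

Diffing the .s files from two flag settings shows whether a flag changed
the instructions used, the scheduling, or both.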

Does anyone know if the v8 CPUs (like supersparc, turbosparc, and
microsparc) have the scan (ffs) instruction like the sparclite?

> So what have we learned?  Well, we are getting some performance boost
> from using the multiply instruction over calling the library routine.
> But other parts of the code are poorly tuned.  Also, tuning is separate
> from the instructions being used.  So we should be able to improve performance
> without losing v7 compatibility by using "-mtune=supersparc" (or 
> "-mtune=ultrasparc" since it has more functional units and should provide
> more parallelism).

We really need to find some good way of testing the performance of the
various tuning parameters, on different machines, to see which works
out the best overall.

Most benchmarks are a bit too specific for this, so application tests
or generic (if there is such a thing) benchmarks would be better.
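
As a starting point only, here is a rough sketch of the sort of timing
harness I mean.  It is exactly the kind of too-specific micro-benchmark
I just complained about (it only exercises integer multiply), and the
function names and the kernel are invented for illustration, but it
shows the shape: build the same source with each set of flags and
compare the times.

```c
#include <time.h>

/* Hypothetical multiply-heavy kernel: repeated x = x * m + c.
 * The multiplier is a parameter so the compiler is less likely
 * to turn the multiply into shifts and adds.  Unsigned arithmetic
 * wraps, so overflow is well defined. */
unsigned int mul_kernel(unsigned int seed, unsigned int m,
                        unsigned int c, unsigned long iters)
{
    unsigned int x = seed;
    unsigned long i;

    for (i = 0; i < iters; i++)
        x = x * m + c;
    return x;
}

/* Run the kernel and return CPU seconds as measured by clock().
 * The result is stored through *result so the work is not
 * optimized away. */
double time_kernel(unsigned int m, unsigned long iters,
                   unsigned int *result)
{
    clock_t start, end;

    start = clock();
    *result = mul_kernel(1u, m, 1013904223u, iters);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A small main() that reads the multiplier from argv, calls time_kernel()
with a few million iterations, and prints the time and the result is
all that is missing.  The same binary set should be run on each machine
of interest, since clock() resolution and cache behaviour will differ.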

-- 
UNIX/Perl/C/Pizza__________________________________shannon@widomaker.com