Subject: Re: improving ssh performance on sun4m systems
To: None <port-sparc@netbsd.org>
From: Charles Shannon Hendrix <shannon@widomaker.com>
List: port-sparc
Date: 03/15/2002 14:57:03
On Fri, Mar 15, 2002 at 06:00:47PM -0000, eeh@netbsd.org wrote:

> I doubt this has much effect.  Multiply step takes a maximum of 33 cycles.
> Since most of the code should already take this into account, the compiler
> would try to avoid those operations as much as possible.  

It was said on the list that this was due to the mul/div improvements.
But I could also see it being caused by instruction ordering and by
changes like add->mov.

> I think that the scheduling is much more likely to have a performance impact
> than changing multiply and/or divide.

It seems to speed things up for me.  Also, there are other instructions
affected besides mul/div.  For example, a lot of add instructions were
replaced by mov in a few programs I rebuilt.  I don't know what other
instructions are affected yet.  A comprehensive list would be nice.
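
One way to start such a list is to compile a tiny file both ways and
diff the assembly.  The sketch below is only an illustration: the file
name and function names are made up, and the expectation in the comments
(library calls such as .mul/.umul, .div, .rem without -mv8, hardware
smul/umul/sdiv/udiv with it) is my understanding of gcc's behaviour and
should be checked against the actual output.

```c
/*
 * muldiv.c (hypothetical file name): small functions for comparing
 * the SPARC code gcc generates with and without -mv8.
 *
 * Assumed usage, with the gcc flags discussed in this thread:
 *   gcc -O2      -S -o muldiv-v7.s muldiv.c
 *   gcc -O2 -mv8 -S -o muldiv-v8.s muldiv.c
 *   diff muldiv-v7.s muldiv-v8.s
 */

/* Integer multiply: expected to become a call to a multiply helper
 * without -mv8, and a single multiply instruction with -mv8. */
int scale(int a, int b)
{
    return a * b;
}

/* Unsigned divide: expected to become a call to a divide helper
 * without -mv8.  The caller must make sure b is not zero. */
unsigned int quotient(unsigned int a, unsigned int b)
{
    return a / b;
}

/* Remainder: expected to become a call to a remainder helper
 * without -mv8.  The caller must make sure b is not zero. */
int remainder_of(int a, int b)
{
    return a % b;
}
```

Any differences in the diff beyond the multiply and divide sites
(reordered instructions, add replaced by mov) would be the scheduling
changes rather than the mul/div ones.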

It would be interesting to run a "libc benchmark" on an unmodified
machine and on mine to see everything that is affected.

As an example of a case where the library fix alone won't help, I
tested heapsort.  All the libraries on my machine are built with -mv8.

Still, a normal optimized compile gives these results:

   Runtime is the average for 1 iteration.
   High MIPS =    71.08
   Low  MIPS =    55.40

Build with -mv8 and you get:

   Runtime is the average for 1 iteration.
   High MIPS =    95.79
   Low  MIPS =    63.18

That is roughly 35% faster on the high figure and 14% on the low one.
I'm not saying this is representative of the speedup you can expect by
building your binaries with -mv8, but quite a few programs were sped up,
even when already linked against -mv8 libraries.
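
For anyone who wants to try a similar comparison, here is a minimal
heapsort timing sketch.  It is not the benchmark that produced the
numbers above; the names heapsort_longs and time_heapsort are made up
for this example, and it reports CPU seconds rather than MIPS.  Note
that in this sketch the sort loop has no real multiplies (2 * root + 1
is normally compiled to a shift and an add), so any difference between
the two builds would have to come from something other than mul/div.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Restore the max-heap property for the subtree rooted at `root`,
 * considering only elements a[0..end] (inclusive). */
static void sift_down(long *a, size_t root, size_t end)
{
    for (;;) {
        size_t child = 2 * root + 1;
        long tmp;

        if (child > end)
            break;                      /* root is a leaf */
        if (child + 1 <= end && a[child] < a[child + 1])
            child++;                    /* pick the larger child */
        if (a[root] >= a[child])
            break;                      /* heap property holds */
        tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;
    }
}

/* Sort a[0..n-1] into ascending order in place. */
void heapsort_longs(long *a, size_t n)
{
    size_t i;

    if (n < 2)
        return;
    /* Build a max-heap, starting from the last parent node. */
    for (i = n / 2; i-- > 0; )
        sift_down(a, i, n - 1);
    /* Move the current maximum to the end and shrink the heap. */
    for (i = n - 1; i > 0; i--) {
        long tmp = a[0];
        a[0] = a[i];
        a[i] = tmp;
        sift_down(a, 0, i - 1);
    }
}

/* Fill n elements with pseudo-random data, sort them, and return the
 * CPU time of the sort in seconds, or -1.0 if allocation fails. */
double time_heapsort(size_t n, unsigned int seed)
{
    long *a = malloc(n * sizeof *a);
    clock_t start, stop;
    size_t i;

    if (a == NULL)
        return -1.0;
    srand(seed);
    for (i = 0; i < n; i++)
        a[i] = rand();
    start = clock();
    heapsort_longs(a, n);
    stop = clock();
    free(a);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Call time_heapsort() from a small main(), build the program once with
normal optimization and once with -mv8 added, and compare the times for
the same element count and seed.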



-- 
UNIX/Perl/C/Pizza__________________________________shannon@widomaker.com