Subject: Re: improving ssh performance on sun4m systems
To: None <port-sparc@netbsd.org>
From: Charles Shannon Hendrix <shannon@widomaker.com>
List: port-sparc
Date: 03/15/2002 14:57:03
On Fri, Mar 15, 2002 at 06:00:47PM -0000, eeh@netbsd.org wrote:
> I doubt this has much effect. Multiply step takes a maximum of 33 cycles.
> Since most of the code should already take this into account, the compiler
> would try to avoid those operations as much as possible.
It was said on the list that the speedup came from the mul/div improvements.
But I could also see it being caused by instruction ordering, and by changes
like add->mov.
> I think that the scheduling is much more likely to have a performance impact
> than changing multiply and/or divide.
Seems to speed things up for me. Also, there are other instructions
affected besides mul/div. For example, a lot of add instructions were
replaced by mov in a few programs I rebuilt. I don't know what other
instructions are affected yet. A comprehensive list would be nice.
It would be interesting to run a "libc benchmark" on an unmodified
machine and mine to see what all is affected.
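One piece of such a benchmark might look like the sketch below. This is only
an illustration, not code from any existing test suite: the names
muldiv_kernel and time_muldiv are made up here, and the multiplier and divisor
are passed as arguments on the assumption that the compiler would otherwise
turn constant multiplies and divides into shifts and adds. Code generated for
V7 calls helper routines such as .mul and .div, while V8 has hardware multiply
and divide instructions, so a loop like this should show the difference if
there is one.

```c
#include <time.h>

/*
 * Hypothetical kernel: for i in [1, n], accumulate the quotient and
 * remainder of (i * m) by d.  d must be nonzero.  Unsigned arithmetic
 * wraps on overflow, so the result is well defined for any n.
 */
unsigned long muldiv_kernel(unsigned long n, unsigned long m, unsigned long d)
{
    unsigned long acc = 0;
    unsigned long i;

    for (i = 1; i <= n; i++) {
        unsigned long p = i * m;   /* integer multiply */
        acc += p / d;              /* integer divide */
        acc += p % d;              /* integer remainder */
    }
    return acc;
}

/*
 * Run the kernel once and return the CPU time used, in seconds.
 * The checksum is stored through `result` so the work cannot be
 * discarded as dead code.
 */
double time_muldiv(unsigned long n, unsigned long m, unsigned long d,
                   unsigned long *result)
{
    clock_t start = clock();

    *result = muldiv_kernel(n, m, d);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building the same file twice, once with the normal optimization flags and once
with -mv8 added, and comparing the times from time_muldiv with a large n would
give a rough number for mul/div alone.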
As an example of a case where the library fix alone won't help, I tested
heapsort. All the libraries on my machine are built with -mv8.
Still, a normal optimized compile gives these results:

    Runtime is the average for 1 iteration.
    High MIPS = 71.08
    Low MIPS = 55.40

Build with -mv8 and you get:

    Runtime is the average for 1 iteration.
    High MIPS = 95.79
    Low MIPS = 63.18

I'm not saying this is representative of the speedup you can expect by
building your binaries with -mv8, but quite a few programs were sped up,
even when already linked against -mv8 libraries.
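
The heapsort program measured above is not shown in this message. As a
stand-in for anyone who wants to try the same comparison, here is a minimal
heapsort sketch. It is not the benchmark that produced the numbers above, and
the names heapsort_longs, sift_down and is_sorted are hypothetical
(heapsort_longs is used to avoid clashing with the heapsort() in BSD libc).

```c
#include <stddef.h>

/*
 * Restore the max-heap property for the subtree rooted at `root`,
 * looking only at elements a[0] .. a[end - 1].
 */
static void sift_down(long *a, size_t root, size_t end)
{
    size_t child;
    long tmp;

    for (;;) {
        child = 2 * root + 1;              /* left child */
        if (child >= end)
            break;                         /* root is a leaf */
        if (child + 1 < end && a[child] < a[child + 1])
            child++;                       /* right child is larger */
        if (a[root] >= a[child])
            break;                         /* heap property holds */
        tmp = a[root];
        a[root] = a[child];
        a[child] = tmp;
        root = child;
    }
}

/* Sort a[0] .. a[n - 1] into ascending order, in place. */
void heapsort_longs(long *a, size_t n)
{
    size_t i;
    long tmp;

    if (n < 2)
        return;

    /* Build a max-heap, starting from the last parent node. */
    for (i = n / 2; i-- > 0; )
        sift_down(a, i, n);

    /* Move the current maximum to the end, then shrink the heap. */
    for (i = n - 1; i > 0; i--) {
        tmp = a[0];
        a[0] = a[i];
        a[i] = tmp;
        sift_down(a, 0, i);
    }
}

/* Return 1 if a[0] .. a[n - 1] is in ascending order, 0 otherwise. */
int is_sorted(const long *a, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}
```

Timing heapsort_longs over a large array, once from a normal optimized build
and once from a build with -mv8, would be one way to check whether a given
machine shows a similar difference.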
--
UNIX/Perl/C/Pizza__________________________________shannon@widomaker.com