Subject: Re: mindless boredom, speed and compiling kernels
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Simon Burge <simonb@telstra.com.au>
List: port-pmax
Date: 05/27/1998 11:09:39
On Mon, 25 May 1998 15:23:32 -0700  Jonathan Stone wrote:

> 
> >  I'd guess it's more likely because NetBSD has all the ABI overhead,
> >which Ultrix probably doesn't have.  That makes the programs bigger and
> >slower.
> 
> Yup. I think we (I?) estimated a 10% slowdown when we decided to go to
> ELF and to use the standard ABIcalls for "statically linked" binaries.
> 
> 
> Mika's numbers and observations from about 3 weeks ago were pretty
> convincing.  That specific problem is, basically, that the ABI-call
> machinery (.cpsave/.cprestore) is handled in the assembler as
> pseudo-ops, and the assembler doesn't seem to do any scheduling at all
> on the loads and stores those pseudo-ops generate. So it emits nulls
> around them, slowing down calls.
> 
> I thought about trying to emit the $t9 manipulations directly from GCC
> as RTL and having GCC schedule them.  But that'd run foul of the
> PIC-must-use-abicall warnings which're built into gas. Getting gas to
> schedule the .cpsave/.cprestore loads better might be best.  Anyone
> care to look at how an IRIX toolchain schedules .cpsave/.cprestore on
> Mika's example?
> 
> (I don't want to start supporting non-abicall libraries; that'd mean a
> brand-new library suffix, gcc/specfile changes, etc. Not nice.  But if
> Simon wanted to try it, that'd quantify the cost pretty precisely.)

I rebuilt the crt0.o, libgcc.a, libc.a and libm.a with -mno-abicalls,
and came up with the following:

	DECstation 5000/260 NetBSD 1.3.1 - NetBSD binary
	    gcc 2.7.2.2+myc1 -O2 -mno-abicalls:
		generated 597014 moves per second
		generated 598802 moves per second
		generated 598802 moves per second

The total test involved approx. 67 million function calls (from
prof(1)) over 90 seconds.  At least it's faster than a native Ultrix
binary, but still a tad slower than an Ultrix binary running under
NetBSD.  For reference, here are the original figures:

	DECstation 5000/260 NetBSD 1.3.1 - NetBSD binary
	    gcc 2.7.2.2+myc1 -O2:
		generated 480769 moves per second
		generated 479616 moves per second
		generated 480769 moves per second

	DECstation 5000/260 NetBSD 1.3.1 - Ultrix binary
	    gcc 2.7.2.2 -O2:
		generated 614088 moves per second
		generated 614576 moves per second
		generated 615244 moves per second

	DECstation 5000/260 Ultrix 4.5 - Ultrix binary
	    gcc 2.7.2.2 -O2:
		generated 587867 moves per second
		generated 589220 moves per second
		generated 587868 moves per second
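
To put the four sets of figures side by side, here is a rough sketch that
averages each set of three runs and expresses it relative to the stock
(abicalls) NetBSD binary.  It is not part of the original measurements: the
dictionary labels and the relative_to() helper are names made up for
illustration, and only the moves-per-second numbers come from the runs above.

```python
from statistics import mean

# Moves-per-second figures copied from the runs above.
# The keys are illustrative labels only, not names used by any tool.
runs = {
    "netbsd_no_abicalls": [597014, 598802, 598802],
    "netbsd_abicalls": [480769, 479616, 480769],
    "ultrix_bin_on_netbsd": [614088, 614576, 615244],
    "ultrix_bin_on_ultrix": [587867, 589220, 587868],
}


def relative_to(baseline, runs):
    """Return each configuration's mean rate divided by the baseline's mean."""
    base = mean(runs[baseline])
    return {name: mean(values) / base for name, values in runs.items()}


# Use the stock abicalls build as the baseline (ratio 1.0).
ratios = relative_to("netbsd_abicalls", runs)

# Print slowest first, with the mean rate and the ratio to the baseline.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} {mean(runs[name]):10.0f} moves/s  {ratio:.3f}x")
```

On these numbers the -mno-abicalls build averages a little under 25% more
moves per second than the abicalls build.  Put the other way round, the
abicalls build is a little under 20% slower, which is noticeably more than
the 10% estimate quoted above, at least for this call-heavy test.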


Simon.