port-arm32: Re: shocking speed performance!

Subject: Re: shocking speed performance!
To: None <kim@pvv.ntnu.no>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm32
Date: 05/20/1999 10:29:20
> > > You may do what I did:
> > > Compile to assembler source code, and then examine it.
> > > If you find stuff like: 
> > > a <- b * c
> > > a <- a + d
> > > 
> > > Then you have bad code.
> > 
> > Yes you have, the compiler would use mla!
> 
> Ah, but does it? It definitely did not use that in the example I supplied
> some months ago.
> 

Can you be more precise?  Some months ago isn't going to help me find a 
message in the archives.

> > > Fast code should look something like:
> > > a <- b * c
> > > e <- Array[x]
> > > a <- a + d
> > 
> > Bollocks.  You should get
> > 
> >   e = Array[x]
> >   a = b * c + d
> > 
> > There's no interleaving in this case.
> 
> That is also a good way of doing it. If you want something more than my
> current sloppy examples, just reread my very thorough post from a few
> months back. It has real assembler code supplied. It was that which made
> me finally buy an Intel PC.

I do still have a posting from you, from back in December.  If I recall, 
it turned out that you were still using the old gcc 2.7 derived compiler 
on netbsd-1.3.x, which had no support for strongarms (hardly surprising, 
since that compiler version came out before the SA110 was released).
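A simple way to settle the mla question for a given compiler is to compile a small test case to assembly and look at the output. The file below is only a sketch, and the function names are made up for illustration. Build it with something like "gcc -O2 -S mla.c" (adding the -mcpu option for your processor, if your compiler version accepts one) and search mla.s for an mla instruction. Whether the multiply and add are fused, and where the load ends up, is decided by the compiler's optimizer and scheduler, not by the order of the statements in the C source.

```c
/* mla.c: hypothetical test case for inspecting generated ARM code. */

/* b * c + d: a candidate for a single MLA (multiply-accumulate)
   instruction when compiling with optimization for ARM. */
int mul_add(int b, int c, int d)
{
    return b * c + d;
}

/* The same computation plus an independent load from an array.
   The compiler's instruction scheduler, not the source order,
   decides where the load is placed relative to the arithmetic. */
int mul_add_load(const int *array, int x, int b, int c, int d, int *e)
{
    *e = array[x];      /* independent load */
    return b * c + d;   /* still a candidate for MLA */
}
```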

> > As for fixing it, I've never seen any patches from you, only complaints.  
> > If you think you're so good, why don't you get down off your high horse and 
> > contribute, then we might all benefit from your obvious wisdom.
> 
> A couple of years ago, I spent three months of evenings working on a very
> fast floating point "library" for ARM32. I had the code, and it was fast.
> The idea was to implement the often used operators like +-/* as
> macros in GCC, and inline functions in Acorn C++. In Acorn C++ I got
> very far, but was stopped by some bug in it, probably too many overloadings
> of standard functions. A friend of mine had the same problem with his
> OpenGL implementation for RiscOS. As for GCC, I never got a working source,
> and the advice I got did not work. I have wasted a lot of time on this.
> 

Fast floating point would be good, but only if it is IEEE compliant.  
Anything else is going to cause problems for a lot of others.  Don't 
forget that understanding of the IEEE data format is embedded deeply 
within many programs and libraries (starting with, but not limited to 
libc, gcc, gas, gdb ...)
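As an illustration of how directly code depends on that format, here is a minimal C sketch (the struct and function names are hypothetical) that pulls the sign, exponent and fraction fields out of an IEEE 754 double. It assumes doubles are stored with the same byte order as 64-bit integers, which holds on most machines but not for the word-swapped doubles of the ARM FPA.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 double-precision value:
   1 sign bit, 11 exponent bits (bias 1023), 52 fraction bits. */
struct ieee_double_fields {
    unsigned sign;
    unsigned exponent;   /* biased exponent, 0..2047 */
    uint64_t fraction;   /* 52-bit fraction, without the implicit leading 1 */
};

/* Assumption: double and uint64_t share the same byte order
   (not true for the old ARM FPA word order). */
struct ieee_double_fields decode_double(double x)
{
    uint64_t bits;
    struct ieee_double_fields f;

    memcpy(&bits, &x, sizeof bits);   /* copy the bit pattern without pointer casts */
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

Code that makes this kind of assumption is exactly what breaks if a "fast" floating point implementation uses a different layout.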

> I could also have contributed good sound systems. After all, it was I 
> who wrote HiFi-Arch, the program that squeezed NICAM-quality sound out of
> the original Archimedes machines.
> 
> I have just burnt too much time on this ARM32 project.

That seems to be the way of open-source type projects.  I hate to think 
how much of my own time (and money) I've spent on projects of this type.

> 
> > As for strongarm tuning.  A lot of tweaks have been put into the upcoming 
> > egcs-1.2 releases, but you have to remember to tell the compiler what type 
> > of cpu you have, or it will generate code with a generic tuning that isn't 
> > too bad for any CPU.
> 
> That sounds very good. However, a lot of things have sounded very good,
> without ever becoming real. I hope you are right this time.

It will only be as real as the code that gets donated by the volunteers; 
very few people are paid to do this.

> 
> Why aren't environment variables or standard configuration files used
> to give gcc/egcs such settings as defaults, like they are on other
> systems?

The official egcs distribution supports specifying the cpu type during 
configure, you then get a compiler that will target that cpu by default.  
But don't forget that any code that you then build with it may not port to 
other machines with older processors.  The default configuration is to go 
for portability.

I've seen installations where the compiler looks at the machine it's 
running on and then generates code for that type of machine by default.  
They suck.  It causes no end of support issues when you ship a binary to a 
customer with a different configuration.