Subject: Re: Preliminary test of i386 kernel compiling with GCC 4.0
To: Simon Burge <simonb@wasabisystems.com>
From: Vincent <10.50@free.fr>
List: tech-kern
Date: 10/08/2004 13:39:44
Hi,

> Interesting that you saw 4.0 being slower than 3.3.3.  Here's some
> numbers for gcc floating-point performance, using the time for glucas (a
> FFT-based prime number testing program) self tests in order from fastest
> to slowest:
> 
> 	icc 8.0:       1511.657u 0.309s 25:26.36
> 	gcc 3.4:       1779.724u 0.239s 29:57.07
> 	gcc 4.0exp:    1845.911u 0.249s 31:03.67
> 	gcc 3.3.3:     2192.964u 0.249s 36:54.81
> 	gcc 3.5exp:    2259.040u 0.239s 38:01.40

If I understand your figures correctly, 4.0 is, *for the time being*, 
less efficient than 3.4.2, despite the new optimizer.
Well, I suppose the new optimizer has little effect on FPU-related 
code, whereas ICC already has a good scheduler for it, and I suspect 
ICC uses both the 387 and the SSE unit in parallel, something I 
haven't dared try with GCC yet. And then ICC, if I remember correctly, 
does multi-file optimization, which GCC does not.
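(For what it's worth, the SSE idea can be tested without waiting for
the compiler to do it on its own. A rough sketch; fpu-bench.c is a
made-up stand-in for whatever FP-heavy source one wants to time:

    # x87 only, the i386 default
    gcc -O2 -mfpmath=387 -o bench-387 fpu-bench.c -lm
    # scalar SSE2 instead of the x87 stack (needs an SSE2-class CPU)
    gcc -O2 -msse2 -mfpmath=sse -o bench-sse fpu-bench.c -lm
    # both units at once; gcc documents this mode as experimental
    gcc -O2 -msse2 -mfpmath=sse,387 -o bench-both fpu-bench.c -lm
    time ./bench-387 ; time ./bench-sse ; time ./bench-both
)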

Also, I suspect bladeenc loses time in its interfaces with the kernel. 
I had a vague suspicion that the time was lost in spurious waits 
during disk accesses, but I can't be sure of that. I see no other 
reason why the same program would become twice as slow.
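One way to check instead of guessing: time(1) separates user from
system time, and ktrace(1) shows where the syscalls go. A sketch
(input.wav is a made-up file name):

    # if the slowdown is in kernel interfaces, system time should
    # grow, not user time
    time ./bladeenc input.wav
    # trace the syscalls, then look for long gaps around read()/write()
    ktrace ./bladeenc input.wav
    kdump -R | less        # -R prints relative timestamps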

That's why I wanted to compile the kernel, which is _IMHO_ the best 
integer-based code available. But the kernel won't run, even when I 
use NOGCCERROR and fall back to gcc 3.4.2 where 4.0.0 fails. As I 
said, it seems to fail in the driver attach code or thereabouts.
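In case somebody wants to reproduce it, here is roughly how I build;
a sketch, with GENERIC standing in for my actual kernel config:

    cd /usr/src/sys/arch/i386/conf
    config GENERIC
    cd ../compile/GENERIC
    make depend
    # NOGCCERROR keeps the new compiler's extra warnings from
    # turning into fatal -Werror failures
    make NOGCCERROR=1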