Subject: Re: gcc4 misoptimization
To: Richard Earnshaw <rearnsha@arm.com>
From: Matthias Drochner <M.Drochner@fz-juelich.de>
List: tech-toolchain
Date: 08/01/2006 19:08:43

rearnsha@arm.com said:
> > Successive additions/subtractions of floats are apparently done
> > with double precision.
> No, they are done with internal precision (which is
> probably extended double precision on the 387 floating point unit).

On NetBSD/i386 (and amd64), the FPU control word is initialized
for "double" rounding, so intermediate results are rounded to
double precision rather than kept in extended precision. So we
don't have these problems as long as the code uses double.
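
A minimal sketch of how one might check this on a given machine. The
function names are hypothetical and not part of any NetBSD interface.
FLT_EVAL_METHOD is standard C99, but it only reports what the compiler
assumes, not what the control word is actually set to, so the second
function probes the behaviour at run time.

```c
#include <float.h>

/* The compiler's declared evaluation method (C99): 0 means operations are
 * evaluated in their own type, 2 means in long double, -1 is indeterminate.
 * This reflects the compiler's assumption, not the live control word. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Hypothetical probe: returns 1 if a double sum visibly keeps bits beyond
 * double precision, 0 otherwise. volatile keeps the operands from being
 * constant-folded at compile time. */
int double_sum_has_excess_precision(void)
{
    volatile double one = 1.0;
    volatile double half_ulp = DBL_EPSILON / 2.0;  /* 2^-53 for IEEE double */

    /* Rounded to double, 1 + 2^-53 is a tie that rounds to even, i.e. 1.0.
     * Held in an extended-precision register, it is distinct from 1.0. */
    return (one + half_ulp) != 1.0;
}
```

With the control word set for double rounding, or with SSE arithmetic,
the probe should return 0. A result of 1 would suggest the sum was
compared while still in an extended-precision register.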

> change the rounding precision in the core before doing the
> operation.  Too tricky for the compiler to handle -- it can't cope with
> changing registers that change the behaviour of other registers in this
> way.

gcc does emit code to change the FPU control word under some
circumstances, in particular when a floating-point value is
converted to an integer by a cast.
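
For illustration, a cast like the one below is the kind of code meant
here. The function name is made up. C requires the conversion to
truncate toward zero, while the x87 store-as-integer instruction obeys
the current rounding mode, so on x87 targets gcc typically saves the
control word, switches to truncation, converts, and restores it
(an fnstcw/fldcw pair in the output of "gcc -S"). The exact code
depends on the gcc version and target flags.

```c
/* Hypothetical example: a float-to-integer conversion by cast.
 * The result is truncated toward zero regardless of the FPU's
 * current rounding mode, which is why the compiler may have to
 * change the control word around the conversion on x87. */
int truncate_to_int(double x)
{
    return (int)x;
}
```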

> Use a different floating point unit (eg the SSE instructions), but
> not all x86 cpus have that.

Yes, that would help. Unfortunately the current way of selecting
an "overlay" libm can't be extended easily for mmx/sse support.

> The safest solution is probably to compile this one file with
> -ffloat-store

I still prefer the "volatile" approach. It produces assembler code
that is as good as it can be. And a -ffloat-store flag would live
in a Makefile, separate from the source file, where it will likely
get messed up sooner or later. (Yes, I know, a regression test
might help.)
If we had a #pragma for it...
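
A minimal sketch of the "volatile" approach, with a hypothetical
function name and not the actual libm code under discussion. Storing
the sum into a volatile float forces it out to memory, which rounds
it to float precision at exactly that point and nowhere else.

```c
/* Hypothetical helper: add two floats and force the result to be
 * rounded to float precision. The volatile store makes the compiler
 * write the sum to memory instead of keeping it in an FPU register
 * with excess precision. */
float add_rounded(float a, float b)
{
    volatile float sum = a + b;
    return sum;
}
```

Unlike -ffloat-store, which applies to every floating-point variable
in the file, this only costs a store and reload where the rounding
matters, and it stays in the source file next to the code that needs it.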

best regards
Matthias