tech-toolchain: Re: gcc4 misoptimization

Subject: Re: gcc4 misoptimization
To: None <M.Drochner@fz-juelich.de>
From: Richard Earnshaw <rearnsha@arm.com>
List: tech-toolchain
Date: 07/28/2006 10:55:31

On Thu, 2006-07-27 at 20:10, Matthias Drochner wrote:
> Hi -
> just found that lrintf() in libm doesn't work well
> if compiled with gcc4 on i386. Successive additions/
> subtractions of floats are done with double precision
> apparently.
No, they are done with internal precision (which is probably extended
double precision on the 387 floating point unit).
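
One way to check which precision the compiler claims to evaluate float
expressions in is the C99 FLT_EVAL_METHOD macro from <float.h>. The
following is only a sketch: float_eval_description() is a hypothetical
helper name, and an older compiler or header may not define the macro at
all.

```c
#include <float.h>

/* Hypothetical helper: describe the evaluation method the compiler
 * advertises through the C99 FLT_EVAL_METHOD macro. */
static const char *float_eval_description(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "each operation is evaluated in the precision of its type";
    case 1:
        return "float and double operations are evaluated as double";
    case 2:
        return "all operations are evaluated as long double";
    default:
        return "indeterminable or implementation-defined";
    }
#else
    /* Assumption: a pre-C99 <float.h> that lacks the macro. */
    return "FLT_EVAL_METHOD is not defined by this <float.h>";
#endif
}
```

When the 387 is used for float arithmetic one would expect the value 2
here, while a target that does float arithmetic in SSE registers would
normally report 0.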

> This shouldn't be done because with single
> precision floats a loss of precision can happen. (Which
> is deliberately used here to accomplish rounding.)
> See the disassembly:
>   36:   83 f8 16                cmp    $0x16,%eax
>   39:   7f 17                   jg     52 <lrintf+0x52>
>   3b:   d9 04 95 00 00 00 00    flds   0x0(,%edx,4)
>   42:   89 4d f8                mov    %ecx,0xfffffff8(%ebp)
>   45:   d9 45 f8                flds   0xfffffff8(%ebp)
>   48:   d8 c1                   fadd   %st(1),%st
>   4a:   de e1                   fsubp  %st,%st(1)
>   4c:   d9 5d f8                fstps  0xfffffff8(%ebp)
> 
> Compiling with -O0 helps, as does the appended patch.
> It does not happen with lrint(double), and it also
> does not happen on alpha.
> 
> Is this a known gcc4 bug? Does anyone have a better idea how
> to work around this?

It's a long-standing problem on the x86.  The *only* ways to force it to
round correctly are to use one of:
1) Store every user FP variable out to memory (gcc has -ffloat-store to
force this, but it can be very expensive when you don't need it)
2) Change the rounding precision in the core before doing the
operation.  Too tricky for the compiler to handle -- it can't cope with
changing registers that change the behaviour of other registers in this
way.
3) Use a different floating point unit (e.g. the SSE instructions), but
not all x86 CPUs have that.
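
As an illustration of option 1 applied to a single variable rather than
the whole file, the sketch below forces the intermediate result out to a
32-bit memory slot with a volatile float. It is not the libm lrintf()
source: round_via_float_store() is a hypothetical name, and the sketch
assumes the default round-to-nearest mode and |x| < 2^22.

```c
/* Hypothetical helper, not the libm code: round x to the nearest
 * integer using the add/subtract-2^23 trick described in the quoted
 * mail.  Assumes round-to-nearest mode and |x| < 2^22. */
static long round_via_float_store(float x)
{
    static const float two23 = 8388608.0f;      /* 2^23 */
    float magic = (x < 0.0f) ? -two23 : two23;  /* same sign as x */
    volatile float t;   /* volatile: every assignment is a real store */

    t = x + magic;      /* fraction bits are rounded away on the store */
    t = t - magic;      /* exact; leaves the rounded integer value */
    return (long)t;
}
```

Because only t is stored and reloaded, the cost is limited to this one
function, whereas -ffloat-store applies to every FP variable in the
file. For option 3, gcc accepts -msse -mfpmath=sse on CPUs that have
SSE, which keeps float arithmetic out of the 387 registers altogether.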

R.