tech-userlevel archive

Re: long double losing mantissa bits

```
> Date: Tue, 19 Jan 2021 22:10:56 +0100
> From: Rhialto <rhialto%falu.nl@localhost>
>
> The loss of precision probably occurs at the line
>
>     // The following big literal is 2 to the 56th power:
>     ufrac = (uint64_t) (frac * 72057594037927936.0);
>
> where frac is in the range [ 0.5 , 1.0 >, so I would expect that
> multiplying with 2**56 is perfectly feasible (just changes the
> exponent). Debugging output shows that differences in the lsbits that
> were detectable before (when printed with %La), were no longer after.

In the following test case:

input: 72057594037927937 -> scanf: 72057594037927937.000000
frac: 0.500000 0x8.00000000000008p-4 sexp: 57
uexp: b9
ufrac: 0080000000000000
56   : 00ffffffffffffff
Unexpected result: 5c80 0000 0000 0000
expected       : 5c80 0000 0000 0001
056200: sign:  0 uexp:  b9 ufrac: 00 0000 0000 0000

Here frac = 0x8.00000000000008p-4 = (1 + 2^56)/2^57, so
frac*72057594037927936 = frac * 2^56 = 0x8.00000000000008p52 =
0x80000000000000.8p0 = 2^55 + 1/2.

This is not an integer, and conversion to uint64_t in C discards the
fractional part (it truncates toward zero), so you get ufrac =
0x80000000000000 = 36028797018963968 as shown in the result.

It looks like you have an off-by-one error in your exponent handling.
If you want to scale the fractional part into an integer, you need to
multiply by 2^57, not by 2^56.
```
