NetBSD-Bugs archive
Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
The following reply was made to PR port-mips/57680; it has been noted by GNATS.
From: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
To: riastradh%NetBSD.org@localhost
Cc: gnats-bugs%netbsd.org@localhost, tsutsui%ceres.dti.ne.jp@localhost
Subject: Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
Date: Wed, 15 Nov 2023 02:02:30 +0900
riastradh@ wrote:
> > I wonder how many nops are actually required for the FP coprocessor
> > (i.e. from/to a different chip) registers..
>
> According to the manual I found at
> <https://cgi.cse.unsw.edu.au/~cs3231/doc/R3000.pdf#page=103>, only one
> delay instruction is needed (bottom of p. 8-8):
>
> The load operation has a delay of one clock, and (like loading to
> an integer register) this is not interlocked. The compiler and/or
> assembler will usually take care of this; but it is invalid for an
> FP load to be immediately followed by an instruction using the
> loaded value.
>
> (I'm not saying this is absolutely true and applicable here -- just
> sharing it as the only documentation I could find that looks like it
> might be applicable.)
I've checked "MIPS RISC Architecture" by Gerry Kane (Japanese edition)
https://www.amazon.co.jp/dp/4320025989
and it says more scheduling is necessary right after LWC1, MTC1, and CTC1,
but I don't understand the details (and they are a bit hard to translate
into English).
> > The following diff against dtoa.c reduces outputs a bit
> > (requires "DBG="-O2 -Wno-error=uninitialized -Wno-error=unused-variable
> > -Wno-error=unused-but-set-variable -Wno-error=maybe-uninitialized
> > -Wno-error=unused-label"):
>
> Just to be clear: Do you mean that (a) the diff to the .s files shows
> the difference between non-working printf (without the nops) and
> working printf (with the nops), and (b) there's nothing else different
> between non-working vs working printf? Or did I misunderstand?
In my previous mail,
- dtoa-small-inline-rfs.s is 'objdump -dz' output from dtoa.pico
  built with two #if 0/#endif pairs and the original <mips/fenv.h>
- dtoa-small-noinline-rfs.s is 'objdump -dz' output from dtoa.pico
  built with two #if 0/#endif pairs and a patched <mips/fenv.h>
  that adds "static __noinline" to __rfs()
I've put the full asm sources (i.e. without the #if 0/#endif pairs), as
produced by objdump -dz (with address numbers manually deleted), here:
https://gist.github.com/tsutsui/85b03f26aa1bfd3fdd884bce8fd8c1e7
- dtoa-inline-rfs.s
  dtoa.pico built from the original libc source, i.e. non-working printf
- dtoa-noinline-rfs.s
  dtoa.pico built with __noinline __rfs in <mips/fenv.h>, working printf
- dtoa-inline-noinline-rfs.diff
  diff between the above two .s files
- fenv.h.diff
  __noinline __rfs() diff used on building dtoa-noinline-rfs.s
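For readers without the gist at hand, the kind of change described by
fenv.h.diff could look roughly like the sketch below. It is only an
illustration, not the actual diff: the name rfs_sketch(), the
SKETCH_NOINLINE macro, the GCC-style attribute spelling, the
"cfc1 %0,$31" asm body, and the non-MIPS fallback are all assumptions.

```c
#include <stdint.h>

/* Hypothetical macro; the real header uses __noinline. */
#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((__noinline__))
#else
#define SKETCH_NOINLINE
#endif

/*
 * Hypothetical stand-in for __rfs() from <mips/fenv.h>.
 * Marking it noinline keeps the cfc1 in its own function instead of
 * letting the compiler merge it into the caller's instruction stream.
 */
static SKETCH_NOINLINE uint32_t
rfs_sketch(void)
{
#if defined(__mips__) && defined(__mips_hard_float)
	uint32_t fcsr;

	/* Read the FP control/status register ($31) with cfc1. */
	__asm__ __volatile__("cfc1 %0,$31" : "=r" (fcsr));
	return fcsr;
#else
	/* Assumed fallback so the sketch also builds on non-MIPS hosts. */
	return 0;
#endif
}
```

The only point of the sketch is the attribute: with it, the compiler
cannot schedule the register read among the caller's FP loads.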
I'd say the answer to both of your questions, (a) and (b), is "yes".
> Wasn't there a difference about inline vs non-inline __rfs, which
> should presumably affect where the cfc1 instruction is?
It looks like the differences in nops after lwc1 are not related to
the cfc1 used in __rfs().
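A small check along the following lines could be used to compare the
working and non-working libc builds. It is a minimal sketch, not the
test from the PR: the values in cases[] and the name count_mismatches()
are hypothetical. Each value is exactly representable in binary, so
"%.1f" has only one correct result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical test values, all exactly representable as doubles. */
static const struct {
	double value;
	const char *expected;
} cases[] = {
	{ 0.0, "0.0" },
	{ 0.5, "0.5" },
	{ 1.0, "1.0" },
	{ 3.5, "3.5" },
	{ 100.5, "100.5" },
	{ -2.5, "-2.5" },
};

/* Returns how many values "%.1f" formats differently from expected. */
static int
count_mismatches(void)
{
	char buf[32];
	int bad = 0;
	size_t i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		snprintf(buf, sizeof(buf), "%.1f", cases[i].value);
		if (strcmp(buf, cases[i].expected) != 0) {
			/* Report the expected string and what libc produced. */
			printf("expected %s, got %s\n",
			    cases[i].expected, buf);
			bad++;
		}
	}
	return bad;
}
```

Calling count_mismatches() from main() and returning its result as the
exit status would give a nonzero status on an affected build.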
---
Izumi Tsutsui