NetBSD-Bugs archive
Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
The following reply was made to PR port-mips/57680; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
Cc: gnats-bugs%netbsd.org@localhost, tsutsui%ceres.dti.ne.jp@localhost
Subject: Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
Date: Tue, 14 Nov 2023 21:48:20 +0000
> Date: Wed, 15 Nov 2023 02:02:30 +0900
> From: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
>
> I've checked "MIPS RISC architecture" by Kane Gerry (Japanese edition)
> https://www.amazon.co.jp/dp/4320025989
> and it says more scheduling is necessary right after LWC1, MTC1, and CTC1,
> but I don't understand details.. (and a bit hard to translate to English)
I found a copy of the English book in the Internet Archive:
https://archive.org/details/mips-risc-architecture-2nd-ed/page/n231/mode/2up
In Table 8-6 `Floating-Point Operation Latencies', for R2010 and R3010
(though not R3000 -- not listed) it says:
    LWC1    2(a)
    ...
    CTC1    2(a)
    CFC1    2
The footnote reads:
    (a) Software /must/ schedule operations to avoid reading the
    floating-point register that is the target of a floating-point
    load or move to floating-point unit instruction less than two
    instructions later, and must schedule a floating-point branch
    instruction two or more instructions after a floating-point
    compare instruction.
I read this to mean that a load into a float register must be
separated by at least one other instruction (like a nop) from any use
of that float register, so the use would come no less than two
instructions after the load.
But this phrasing is not very clear. It _could_ mean that there must
be two instructions separating the load and the use.
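The two readings of the footnote can be sketched with a toy model in C.
This is only an illustration, not real MIPS decoding: the instruction
kinds, the two sample sequences, and the schedule_ok() helper are all
hypothetical names made up here.  Under the first reading a use must be
at least 2 instructions after the load; under the second, at least 3.

```c
/* Toy model of the load-to-use distance rule; every name here is
 * hypothetical and exists only for this sketch. */
enum kind { NOP, FP_LOAD, FP_USE };

struct insn {
    enum kind kind;
    int freg;           /* FP register number; ignored for NOP */
};

/* Models: lwc1 $f0; nop; <use of $f0> */
static const struct insn one_nop[] = {
    { FP_LOAD, 0 }, { NOP, 0 }, { FP_USE, 0 },
};

/* Models: lwc1 $f0; nop; nop; <use of $f0> */
static const struct insn two_nops[] = {
    { FP_LOAD, 0 }, { NOP, 0 }, { NOP, 0 }, { FP_USE, 0 },
};

/* Return 1 if every use of an FP register comes at least min_distance
 * instructions after any load into that register, else 0. */
static int
schedule_ok(const struct insn *code, int n, int min_distance)
{
    for (int i = 0; i < n; i++) {
        if (code[i].kind != FP_USE)
            continue;
        /* Look back over the instructions that are too close. */
        for (int j = i - 1; j >= 0 && i - j < min_distance; j--) {
            if (code[j].kind == FP_LOAD &&
                code[j].freg == code[i].freg)
                return 0;
        }
    }
    return 1;
}
```

In this model schedule_ok(one_nop, 3, 2) accepts the one-nop sequence
(first reading) while schedule_ok(one_nop, 3, 3) rejects it (second
reading); only the two-nop sequence satisfies both.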
> > Wasn't there a difference about inline vs non-inline __rfs, which
> > should presumably affect where the cfc1 instruction is?
>
> It looks the differences of nops after lwc1 are not relevant to
> cfc1 used in __rfs().
It sounds like there are two separate parts to the differences between
generated code in the working and non-working libc:
(a) inline __rfs including cfc1, vs out-of-line call to __rfs, and
(b) nops in lwc1 delay slots.
Both changes are _triggered_ by putting `inline' vs `__noinline' on
the definition in the source code, but I'm talking about the
differences in the generated code, not the differences in the source
code.
If you take the _non-working_ intermediate .s file with inline __rfs
in the source code, and insert nops where the _working_ one has nops
after lwc1, and then assemble and link it all, does that result work?
Something else to try: assert that fegetround() returned FE_TONEAREST.
Nothing is linked against libm in your test cases, so nothing should
be changing the rounding mode, right? So it should always return
FE_TONEAREST. If the assertion fails, that will suggest the machine
state is set up incorrectly or we're misusing cfc1 somehow; if the
assertion passes, perhaps the __rfs/cfc1/fegetround business is a red
herring, and it's actually a problem with some other part of the code
(or with the compiler's code generation).