Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb

To: port-mips-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,tsutsui%ceres.dti.ne.jp@localhost
Subject: Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Date: Tue, 14 Nov 2023 21:50:02 +0000 (UTC)

The following reply was made to PR port-mips/57680; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
Cc: gnats-bugs%netbsd.org@localhost, tsutsui%ceres.dti.ne.jp@localhost
Subject: Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
Date: Tue, 14 Nov 2023 21:48:20 +0000

 > Date: Wed, 15 Nov 2023 02:02:30 +0900
 > From: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
 > 
 > I've checked "MIPS RISC architecture" by Kane Gerry (Japanese edition)
 >  https://www.amazon.co.jp/dp/4320025989
 > and it says more scheduling is necessary right after LWC1, MTC1, and CTC1,
 > but I don't understand details.. (and a bit hard to translate to English)

 I found a copy of the English book in the Internet Archive:

 https://archive.org/details/mips-risc-architecture-2nd-ed/page/n231/mode/2up

 In Table 8-6 `Floating-Point Operation Latencies', for R2010 and R3010
 (though not R3000 -- not listed) it says:

    LWC1 2(a)
    ...
    CTC1 2(a)
    CFC1 2

 The footnote reads:

    (a) Software /must/ schedule operations to avoid reading the
        floating-point register that is the target of a floating-point
        load or move to floating-point unit instruction less than two
        instructions later, and must schedule a floating-point branch
        instruction two or more instructions after a floating-point
        compare instruction.

 I read this to mean that a load into a float register must be
 separated by at least one other instruction (like a nop) from any use
 of that float register, so that the use comes no less than two
 instructions after the load.

 But this phrasing is not very clear.  It _could_ mean that there must
 be two instructions separating the load and the use.

 > > Wasn't there a difference about inline vs non-inline __rfs, which
 > > should presumably affect where the cfc1 instruction is?
 > 
 > It looks the differences of nops after lwc1 are not relevant to
 > cfc1 used in __rfs().

 It sounds like there are two separate parts to the differences between
 generated code in the working and non-working libc:

 (a) inline __rfs including cfc1, vs out-of-line call to __rfs, and
 (b) nops in lwc1 delay slots.

 Both changes are _triggered_ by putting `inline' vs `__noinline' on
 the definition in the source code, but I'm talking about the
 differences in the generated code, not the differences in the source
 code.

 If you take the _non-working_ intermediate .s file with inline __rfs
 in the source code, and insert nops where the _working_ one has nops
 after lwc1, and then assemble and link it all, does that result work?

 Something else to try: assert that fegetround() returned FE_TONEAREST.
 Nothing is linked against libm in your test cases, so nothing should
 be changing the rounding mode, right?  So it should always return
 FE_TONEAREST.  If the assertion fails, that will suggest the machine
 state is set up incorrectly or we're misusing cfc1 somehow; if the
 assertion passes, perhaps the __rfs/cfc1/fegetround business is a red
 herring, and it's actually a problem with some other part of the code
 (or with the compiler's code generation).
