NetBSD-Bugs archive


Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb



The following reply was made to PR port-mips/57680; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
Cc: gnats-bugs%netbsd.org@localhost, tsutsui%ceres.dti.ne.jp@localhost
Subject: Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
Date: Tue, 14 Nov 2023 21:48:20 +0000

 > Date: Wed, 15 Nov 2023 02:02:30 +0900
 > From: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
 > 
 > I've checked "MIPS RISC architecture" by Kane Gerry (Japanese edition)
 >  https://www.amazon.co.jp/dp/4320025989
 > and it says more scheduling is necessary right after LWC1, MTC1, and CTC1,
 > but I don't understand details.. (and a bit hard to translate to English)
 
 I found a copy of the English book in the Internet Archive:
 
 https://archive.org/details/mips-risc-architecture-2nd-ed/page/n231/mode/2up
 
 In Table 8-6 `Floating-Point Operation Latencies', for R2010 and R3010
 (though not R3000 -- not listed) it says:
 
    LWC1 2(a)
    ...
    CTC1 2(a)
    CFC1 2
 
 The footnote reads:
 
    (a) Software /must/ schedule operations to avoid reading the
        floating-point register that is the target of a floating-point
        load or move to floating-point unit instruction less than two
        instructions later, and must schedule a floating-point branch
         instruction two or more instructions after a floating-point
        compare instruction.
 
 I read this to mean that a load into a float register must be
 separated by a single other instruction (like a nop) from any use of
 that float register, which would therefore be no less than two
 instructions later.
 
 But this phrasing is not very clear.  It _could_ mean that there must
 be two instructions separating the load and the use.
 
 > > Wasn't there a difference about inline vs non-inline __rfs, which
 > > should presumably affect where the cfc1 instruction is?
 > 
 > It looks the differences of nops after lwc1 are not relevant to
 > cfc1 used in __rfs().
 
 It sounds like there are two separate parts to the differences between
 generated code in the working and non-working libc:
 
 (a) inline __rfs including cfc1, vs out-of-line call to __rfs, and
 (b) nops in lwc1 delay slots.
 
 Both changes are _triggered_ by putting `inline' vs `__noinline' on
 the definition in the source code, but I'm talking about the
 differences in the generated code, not the differences in the source
 code.
 
 If you take the _non-working_ intermediate .s file with inline __rfs
 in the source code, and insert nops where the _working_ one has nops
 after lwc1, and then assemble and link it all, does that result work?
 
 Something else to try: assert that fegetround() returned FE_TONEAREST.
 Nothing is linked against libm in your test cases, so nothing should
 be changing the rounding mode, right?  So it should always return
 FE_TONEAREST.  If the assertion fails, that will suggest the machine
 state is set up incorrectly or we're misusing cfc1 somehow; if the
 assertion passes, perhaps the __rfs/cfc1/fegetround business is a red
 herring, and it's actually a problem with some other part of the code
 (or with the compiler's code generation).
 

