NetBSD-Bugs archive


Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb



The following reply was made to PR port-mips/57680; it has been noted by GNATS.

From: Izumi Tsutsui <tsutsui%ceres.dti.ne.jp@localhost>
To: riastradh%NetBSD.org@localhost
Cc: gnats-bugs%netbsd.org@localhost, tsutsui%ceres.dti.ne.jp@localhost
Subject: Re: port-mips/57680: printf("%.1f") shows wrong results on R3000mipseb
Date: Wed, 15 Nov 2023 02:02:30 +0900

 riastradh@ wrote:
 
 > > I wonder how many nops are actually required for the FP coprocessor
 > > (i.e. from/to a different chip) registers..
 > 
 > According to the manual I found at
 > <https://cgi.cse.unsw.edu.au/~cs3231/doc/R3000.pdf#page=103>, only one
 > delay instruction is needed (bottom of p. 8-8):
 > 
 >    The load operation has a delay of one clock, and (like loading to
 >    an integer register) this is not interlocked.  The compiler and/or
 >    assembler will usually take care of this; but it is invalid for an
 >    FP load to be immediately followed by an instruction using the
 >    loaded value.
 > 
 > (I'm not saying this is absolutely true and applicable here -- just
 > sharing it as the only documentation I could find that looks like it
 > might be applicable.)
 
 I've checked "MIPS RISC Architecture" by Gerry Kane (Japanese edition)
  https://www.amazon.co.jp/dp/4320025989
 and it says that more scheduling is necessary right after LWC1, MTC1, and
 CTC1, but I don't understand the details (and it's a bit hard to translate
 into English).
 
 > > The following diff against dtoa.c reduces outputs a bit
 > > (requires "DBG="-O2 -Wno-error=uninitialized -Wno-error=unused-variable
 > >  -Wno-error=unused-but-set-variable -Wno-error=maybe-uninitialized
 > >  -Wno-error=unused-label"):
 > 
 > Just to be clear: Do you mean that (a) the diff to the .s files shows
 > the difference between non-working printf (without the nops) and
 > working printf (with the nops), and (b) there's nothing else different
 > between non-working vs working printf?  Or did I misunderstand?
 
 In my previous mail,
 
 - dtoa-small-inline-rfs.s is 'objdump -dz' output from dtoa.pico
   built with two #if 0/#endif pairs and the original <mips/fenv.h>
 
 - dtoa-small-noinline-rfs.s is 'objdump -dz' output from dtoa.pico
   built with two #if 0/#endif pairs and patched <mips/fenv.h>
   that added "static __noinline" to __rfs()
 
 I've put the full disassembly (i.e. built without the #if 0/#endif pairs),
 generated by objdump -dz with the addresses manually deleted, here:
 
  https://gist.github.com/tsutsui/85b03f26aa1bfd3fdd884bce8fd8c1e7
 
 - dtoa-inline-rfs.s 
    dtoa.pico built from the original libc source, i.e. non-working printf
 
 - dtoa-noinline-rfs.s
    dtoa.pico built with __noinline __rfs in <mips/fenv.h>, working printf
 
 - dtoa-inline-noinline-rfs.diff 
    diff between the above two .s files
 
 - fenv.h.diff
    the __noinline __rfs() diff used to build dtoa-noinline-rfs.s
 
 I'd say the answer to both of your questions, (a) and (b), is "yes".
 
 > Wasn't there a difference about inline vs non-inline __rfs, which
 > should presumably affect where the cfc1 instruction is?
 
 It looks like the differences in the nops after lwc1 are not related to
 the cfc1 used in __rfs().
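
 For anyone who wants to check a given libc build quickly, here is a minimal
 sketch of a printf("%.1f") sanity check. It is not the test case from this
 PR; the helper names fmt1f_matches() and count_fmt1f_mismatches() and the
 sample values are hypothetical, chosen only so that none of them sits on a
 rounding tie.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PR): format v with "%.1f" and
 * compare the result with the expected text.  Returns 1 on a match.
 */
static int
fmt1f_matches(double v, const char *expected)
{
	char buf[64];

	assert(expected != NULL);
	snprintf(buf, sizeof(buf), "%.1f", v);
	return strcmp(buf, expected) == 0;
}

/*
 * Count how many sample values are formatted incorrectly.
 * The values are illustrative and avoid exact rounding ties.
 */
static int
count_fmt1f_mismatches(void)
{
	static const struct {
		double v;
		const char *s;
	} cases[] = {
		{ 0.5, "0.5" },
		{ 1.0, "1.0" },
		{ 3.14159, "3.1" },
		{ 100.0, "100.0" },
		{ -2.7, "-2.7" },
	};
	size_t i;
	int bad = 0;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		if (!fmt1f_matches(cases[i].v, cases[i].s))
			bad++;
	}
	return bad;
}
```

 Calling count_fmt1f_mismatches() from main() and printing the result
 should give 0 on a working libc; a non-zero count would only mean that
 "%.1f" output is wrong for these values, not where the problem is.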
 
 ---
 Izumi Tsutsui
 

