'881/'882 instruction usage (Re: Intermediate step for new arch for lc040 compatibility)

To: port-m68k%netbsd.org@localhost
Subject: '881/'882 instruction usage (Re: Intermediate step for new arch for lc040 compatibility)
From: Romain Dolbeau <romain%dolbeau.org@localhost>
Date: Sun, 4 May 2025 14:56:38 +0200

On Sun, 4 May 2025 at 12:10, Romain Dolbeau <romain%dolbeau.org@localhost> wrote:
> I wonder how many of the "removed" instructions actually appear in the
> NetBSD distribution (the full base system, excluding pkgsrc)

Turns out, not many of the weird ones but quite a few "common" ones.
I'm far from sure my methodology is correct, but it probably gives a
good indication.

I just untarred all the sets from NetBSD 10.1/mac68k and then ran:

####
for X in bin/* usr/bin/* sbin/* usr/sbin/* usr/X11R7/bin/* \
    `find lib usr/lib usr/X11R7/lib -name '*.so.*'`; do
  if file $X | grep "ELF.*m68k" > /dev/null; then
    /usr/m68k-linux-gnu/bin/objdump -d $X | tr '\t' ' ' | grep "%fp[0-9]" |
      sed -e 's/^.*:................ *//' | awk '{print $1}'
  fi
done | sort | uniq -c | tee all.txt
####

to get the mnemonics of instructions using at least one FP register. I
then redid that just for libm, as I expected it to use some of the
removed-in-the-'040 instructions.

There is at least one "false positive", and there could be "false
negatives" as well. My data reports a single use of ftanb (in xdpyinfo);
that is nonsense: objdump is confused by an indirect branch table and
briefly disassembles at the wrong offset. Presumably this also happens
elsewhere.
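
Since the one known false positive showed up with a count of 1, a cheap
way to pick candidates for manual checking is to list the mnemonics
that occur only a handful of times. This is just a sketch: the function
name and the threshold are made up for illustration, and a low count
does not prove a mis-disassembly, it only says where to look first.

```shell
# rare_mnemonics COUNTS [MAX]
# COUNTS is a file in `uniq -c` format ("  count mnemonic"), such as
# all.txt from the loop above. Prints every mnemonic whose count is at
# most MAX (default 1). Hypothetical helper, not part of the original run.
rare_mnemonics() {
    awk -v max="${2:-1}" '$1 + 0 <= max + 0 { print $2 }' "$1"
}
```

For example, `rare_mnemonics all.txt 3` would list everything seen three
times or fewer across the whole tree.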

In the end, libm seems to be the exclusive user of facosd, fasind,
fatand, fatanhd, fcosd, fcoshd, fetoxd, fetoxm1d, fetoxx, fgetexpx,
fintd, fintx, flog10d, flog2x, flognd, flognp1d, fmodd, fremd,
fscaled, fscalel, fsind, fsinhd, fsqrtd, fsqrts, ftand, ftanhd. The
two fsqrt are supported on the '040. Some instructions are not used at
all, like ftwotox and ftentox.
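
One way to derive such a "libm only" list from the two runs is to
compare the counts: a mnemonic is exclusive to libm when its count over
the whole tree equals its count in libm alone. The sketch below assumes
the libm-only run was saved as libm.txt (a file name I'm making up
here) in the same `uniq -c` format as all.txt, and that neither file is
empty.

```shell
# libm_only ALL LIBM
# ALL and LIBM are files in `uniq -c` format ("  count mnemonic").
# Prints the mnemonics whose count in ALL equals their count in LIBM,
# i.e. every occurrence in the whole tree comes from libm.
# Hypothetical helper; file names are assumptions.
libm_only() {
    awk 'NR == FNR { libm[$2] = $1; next }            # first file: libm counts
         ($2 in libm) && libm[$2] == $1 { print $2 }  # second file: whole tree
        ' "$2" "$1"
}
```

Usage would be `libm_only all.txt libm.txt`; the output keeps the
(sorted) order of all.txt.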

As for instructions present on the '881/'882 but not the '040 that are
used outside libm, they fall into three categories:

(a) fmovecr (move from constant ROM), for 0.0 and 1.0 mostly, with
some usage for larger powers of 10
(b) fintrz, to get the integer part rounded toward zero
(c) fsgldiv and fsglmul

Now those last two in (c) are intriguing. While "Single-Precision
Divide" and "Single-Precision Multiply" may appear straightforward,
they are surprising given that the '040 also has fsdiv/fddiv and
fsmul/fdmul instructions (new with the '040, not present on the
'881/'882), which are fdiv/fmul but rounding to single/double instead
of whatever is currently the FPU mode. Those f[sd]div/f[sd]mul are
supported in hardware on the '040 (same as f[sd]add, f[sd]sub, ...),
while fsgl{div,mul} are not.
fsgl{mul,div} have a weird behavior: they ignore the operand mantissa
bits beyond the first 24 and round the result's mantissa to single
precision, but leave the exponent alone, so the output has a
single-precision mantissa with the full extended exponent range.

I'm not sure why they would be used so much (except that they are
somewhat faster on the '881/'882!) as fdiv/fmul with mode set to SP or
fsdiv/fsmul would do the job just as well, and be supported in HW on
the '040/'060. [looks through gcc's code...] Hmm, it seems gcc has a
"FP:round_mul" tag which is the "sgl" bit, and that is used
preferentially for '881/'882 (but not '040) for a bunch of patterns
such as:

####
(define_insn "div<mode>3_floatsi_68881"
 [(set (match_operand:FP 0 "nonimmediate_operand" "=f")
       (div:FP (match_operand:FP 1 "general_operand" "0")
               (float:FP (match_operand:SI 2 "general_operand" "dmi"))))]
 "TARGET_68881"
{
 return TARGET_68040
        ? "f<FP:round>div%.l %2,%0"
        : "f<FP:round_mul>div%.l %2,%0";
})
####

I wonder whether the small gain of those instructions on the '020/'030
is worth the penalty on the '040...

The very complex instructions are rarely used, and the overhead of a
trap is probably small compared to the execution time of the emulated
instruction itself. But between the fsgl* and the common use of fintrz
& fmovecr, I'd recommend anyone with a full '040 to build their pkgsrc
with the appropriate compiler option to avoid those '881/'882-specific
instructions.
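
To check whether a rebuild actually got rid of them, one can rerun the
counting loop on the rebuilt binaries and total up the three categories
above. The helper below is only a sketch with a made-up name. As for
the option itself, I'm assuming something like gcc's documented
-m68040, which is described as inhibiting the '881/'882 instructions
that have to be emulated on the '040; I have not verified how pkgsrc
passes it through.

```shell
# flagged_count COUNTS
# COUNTS is a file in `uniq -c` format ("  count mnemonic").
# Prints the total number of occurrences of mnemonics starting with
# fsgl, fintrz or fmovecr (categories (a), (b) and (c) above), or 0
# if there are none. Hypothetical helper.
flagged_count() {
    awk '$2 ~ /^(fsgl|fintrz|fmovecr)/ { n += $1 } END { print n + 0 }' "$1"
}
```

Running `flagged_count all.txt` before and after the rebuild gives a
quick before/after number; ideally the second one drops to 0, modulo
the disassembly false positives mentioned earlier.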

Cordially,

-- 
Romain Dolbeau
