Subject: Re: Floating point in the kernel
To: Ignatios Souvatzis <firstname.lastname@example.org>
From: Andreas Gustafsson <email@example.com>
Date: 09/19/1998 01:28:01
Ignatios Souvatzis <firstname.lastname@example.org> said:
> > nowadays it is typical for integer multiplies to be
> > several times slower than floating point ones.
>            32bit integer MUL   FP MUL
> MC68060:           2              3
> PPC603:            3              3
>
> it's not that bad.
I'm not an expert on either of those two architectures, but I just had
a look at the PPC603 manual and found that

 - The only integer multiply to execute in 3 cycles is an immediate
   form; other forms need up to 5 cycles

 - Your figures are latencies, not throughputs.  Since the floating
   point unit is pipelined, it can achieve a peak throughput
   of one multiply per cycle

 - The PPC603 has a fused multiply-add instruction, so you can get an
   addition for free with each floating point multiply

In my Usenet archives, I found the following table of fixed point multiply
latencies and throughputs, posted on comp.arch in 1996 by
John Carr <email@example.com>:

Chip          integer multiply issue/latency
              16x32     32x32     32x32->64
POWER-1       4/4       5/5       6/6
POWER-2       2/2       2/2       3/3
PPC 604       1/3       2/4       4/6*
SuperSPARC    5/5       5/5       6/6
MIPS R3000    10/10     10/10     11/11* (approx)
MIPS R5000    3/4       3/4       5/6*
DEC 21064     20/21     20/21     22/23* (approx)
DEC 21164     10?       10?       10?

* = multiply executes in parallel with other integer instructions.

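To make the issue/latency notation concrete, here is a minimal sketch in C
of how such a pair could be turned into a rough cycle count. The helper
names are hypothetical, and the model is an assumption of mine, not
something taken from any of the manuals: a new multiply can start every
`issue` cycles, each result is ready `latency` cycles after its multiply
starts, and no other stalls occur.

```c
#include <stdio.h>

/* Hypothetical helper: cycles to complete n independent multiplies.
   The last multiply starts at cycle (n - 1) * issue and finishes
   `latency` cycles later.  Assumes no other stalls. */
static unsigned long independent_cycles(unsigned long n,
                                        unsigned issue, unsigned latency)
{
    if (n == 0)
        return 0;
    return (n - 1) * issue + latency;
}

/* Hypothetical helper: cycles for a chain of n multiplies where each
   one needs the previous result, so pipelining cannot help.
   Assumes latency >= issue, which holds for every table entry. */
static unsigned long dependent_cycles(unsigned long n, unsigned latency)
{
    return n * latency;
}
```

Under these assumptions, 100 independent multiplies take 102 cycles at 1/3
(the PPC603 floating point figures discussed above: one issue per cycle,
latency 3), 202 cycles at 2/4 (PPC 604, 32x32) and 2001 cycles at 20/21
(DEC 21064, 32x32). For a dependent chain only the latency matters, so the
same 100 multiplies at latency 3 take 300 cycles.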
In my opinion, this _is_ bad. Particularly the Alpha. They have
improved since then, but then so has floating point performance.
Andreas Gustafsson, firstname.lastname@example.org