Subject: Re: Floating point in the kernel
To: Ignatios Souvatzis <is@jocelyn.rhein.de>
From: Andreas Gustafsson <gson@araneus.fi>
List: tech-kern
Date: 09/19/1998 01:28:01
Ignatios Souvatzis <is@jocelyn.rhein.de> said:
> > nowadays it is typical for integer multiplies to be 
> > several times slower than floating point ones.
> 
> 		32bit integer MUL	FP MUL
> MC68060:	2			3
> PPC603:	3			3
>
> its not that bad.

I'm not an expert on either of those two architectures, but I just had
a look at the PPC603 manual and found that

 - The only integer multiply to execute in 3 cycles is an immediate 
   form; other forms need up to 5 cycles
 - Your figures are latencies, not throughputs.  Since the floating
   point unit is pipelined, it can achieve a peak throughput
   of one multiply per cycle
 - The PPC603 has a fused multiply-add instruction, so you can get an 
   addition for free with each floating point multiply
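
The latency/throughput distinction can be sketched in C.  This is an
illustration only: the function names chain_product and dot4 are
hypothetical, and whether a compiler actually emits a fused multiply-add
for the second loop depends on the compiler and its flags.  In the first
function every multiply needs the previous result, so the cost is roughly
n times the latency.  In the second, four independent accumulators let a
pipelined unit overlap the multiplies, so the cost approaches the
throughput limit.

```c
#include <stddef.h>

/* Dependent chain: each multiply waits for the previous product,
 * so a pipelined FPU cannot overlap them (latency-bound). */
double chain_product(const double *x, size_t n)
{
    double p = 1.0;
    for (size_t i = 0; i < n; i++)
        p *= x[i];
    return p;
}

/* Dot product with four independent accumulators: the multiplies in
 * one iteration do not depend on each other, so a pipelined FPU can
 * start a new one before the previous one finishes (throughput-bound).
 * Each "s += a * b" is a candidate for a fused multiply-add. */
double dot4(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Handle the remaining 0..3 elements. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}
```

Note that splitting the sum across accumulators changes the order of the
floating point additions, so the rounding can differ slightly from a
single-accumulator loop.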

In my Usenet archives, I found the following table of fixed point multiply
latencies and throughputs, posted on comp.arch in 1996 by
John Carr <jfc@mit.edu>:

	Chip		integer multiply issue/latency
			16x32	32x32	32x32->64

	POWER-1		4/4	5/5	6/6
	POWER-2		2/2	2/2	3/3
	PPC 604		1/3	2/4	4/6*
	SuperSPARC	5/5	5/5	6/6
	MIPS R3000	10/10	10/10	11/11* (approx)
	MIPS R5000	3/4	3/4	5/6*
	DEC 21064	20/21	20/21	22/23* (approx)
	DEC 21164	10?	10?	10?

	* = multiply executes in parallel with other integer instructions.
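
To see what an issue/latency pair means for a run of multiplies, here is
a rough C sketch.  It assumes that "issue" is the number of cycles between
the starts of successive multiplies and that nothing else stalls the
pipeline; both are simplifying assumptions, and the function names are
hypothetical.

```c
/* Estimated cycles for n independent multiplies: the first result
 * appears after `latency` cycles, and each further multiply can start
 * `issue` cycles after the one before it. */
unsigned long independent_cycles(unsigned issue, unsigned latency,
                                 unsigned long n)
{
    if (n == 0)
        return 0;
    return latency + (n - 1) * (unsigned long)issue;
}

/* Estimated cycles for n multiplies where each one needs the previous
 * result: nothing overlaps, so every multiply pays the full latency. */
unsigned long dependent_cycles(unsigned latency, unsigned long n)
{
    return (unsigned long)latency * n;
}
```

Under this model, 100 independent 32x32 multiplies take about
4 + 99*2 = 202 cycles with the PPC 604 figures (2/4), against about
21 + 99*20 = 2001 cycles with the DEC 21064 figures (20/21).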

In my opinion, this _is_ bad, particularly on the Alpha.  These figures
have improved since then, but so has floating point performance.
-- 
Andreas Gustafsson, gson@araneus.fi