port-pc532: Re: Changes for GCC's ns32k.md file

Subject: Re: Changes for GCC's ns32k.md file
To: David Seifert <seifert@sequent.com>
From: Jon Buller <jonb@metronet.com>
List: port-pc532
Date: 03/11/1996 21:02:49

Well, I've hit another snag in getting quads into ns32k.md...

Adding the definition:

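;; minus is not commutative, so operand 1 is tied to the output
;; with "0" rather than "%0".
(define_insn "subdi3"
  [(set (match_operand:DI 0 "general_operand" "=r")
	(minus:DI (match_operand:DI 1 "general_operand" "0")
		  (match_operand:DI 2 "general_operand" "r")))]
  ""
  "*
  {
	/* Low words first; subd leaves the borrow in the carry flag.  */
	output_asm_insn (\"subd %2,%0\", operands);
	/* Step both operands to the high-word register of each pair.  */
	operands[0] = gen_rtx (REG, SImode, REGNO (operands[0]) + 1);
	operands[2] = gen_rtx (REG, SImode, REGNO (operands[2]) + 1);
	return \"subcd %2,%0\";
}")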

will cause this code:

long long sub (long long a, long long b) { return (a - b); }

to generate this (as expected):

	enter [r3],0
	movd 8(fp),r0
	movd 12(fp),r1
	movd 16(fp),r2
	movd 20(fp),r3
	subd r2,r0
	subcd r3,r1
	exit [r3]
	ret 0
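
For anyone not fluent in ns32k assembly, the subd/subcd pair is an
ordinary two-word subtract with borrow.  A rough C model, assuming
the low word lives in the lower-numbered register (which is what
the REGNO + 1 step in the insn implies).  The struct and function
names are made up for illustration, and it uses modern <stdint.h>
types:

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value as two 32-bit words (hypothetical type):
   lo is r0/r2 in the listing above, hi is r1/r3. */
struct di { uint32_t lo, hi; };

static struct di u64_to_di(uint64_t v)
{
    struct di r;
    r.lo = (uint32_t)v;
    r.hi = (uint32_t)(v >> 32);
    return r;
}

static uint64_t di_to_u64(struct di v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* Model of "subd r2,r0" followed by "subcd r3,r1". */
static struct di sub_di(struct di a, struct di b)
{
    struct di r;
    uint32_t borrow = a.lo < b.lo;  /* carry flag left by subd */
    r.lo = a.lo - b.lo;             /* subd  r2,r0 */
    r.hi = a.hi - b.hi - borrow;    /* subcd r3,r1 */
    return r;
}

/* Compare the model against native 64-bit subtraction. */
static void check_sub_di(uint64_t a, uint64_t b)
{
    assert(di_to_u64(sub_di(u64_to_di(a), u64_to_di(b))) == a - b);
}
```

Calling check_sub_di (0x100000000ULL, 1) exercises the case where
the low words borrow from the high words.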

As I said previously, I would like to get rid of the r2,r3 usage,
adding the memory straight into the result registers, and adding
memory to memory using no registers if possible, but I am not sure
how to do that yet.  The suggestion was made to contact RMS, which
I plan to do, but I don't want to waste his time with stupid
questions, so I will poke around a bit more to see if I can get
any farther (which I doubt).

The problem with the above definition though is that it is not used
as often as would be desired, since:

long long neg (long long x) { return (-x); } generates:

	enter [],0
	movd 8(fp),r0
	movd 12(fp),r1
	movd r1,tos
	movd r0,tos
	bsr ___negdi2
	cmpd tos,tos
	exit []
	ret 0

long long oneminusx (long long x) { return (1 - x); } generates:

	enter [r3,r4],0
	movd 8(fp),r2
	movd 12(fp),r3
	movqd 1,r0
	subd r2,r0
	cmpqd 1,r0
	slod r4
	negd r3,r1
	subd r4,r1
	exit [r4,r3]
	ret 0

and long long xminus1 (long long x) { return (x - 1); } generates:

	enter [r3,r4],0
	movd 8(fp),r2
	movd 12(fp),r3
	addr -1(r2),r0
	cmpd r0,r2
	slod r4
	addr -1(r3),r1
	addd r4,r1
	exit [r4,r3]
	ret 0

Both of these cases could be sped up dramatically (I think) if they
used the subdi3 insn instead of the open-coded sequences shown.
I guess that gcc is estimating the wrong cost for those sequences vs
subdi3 when one of the arguments is a constant, but I don't know
where that information lives in the compiler sources.
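
To make the comparison concrete, here is a rough C model of the two
constant-operand listings.  It assumes slod yields 1 exactly when
the first cmp operand is below the second as an unsigned value (my
reading of the listings, not checked against the NS32000 manual).
The type and helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit words, low word first (hypothetical type). */
struct di { uint32_t lo, hi; };

static struct di u64_to_di(uint64_t v)
{
    struct di r;
    r.lo = (uint32_t)v;
    r.hi = (uint32_t)(v >> 32);
    return r;
}

static uint64_t di_to_u64(struct di v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* 1 - x, following the oneminusx listing. */
static struct di one_minus_x(struct di x)
{
    struct di r;
    uint32_t borrow;
    r.lo = 1u - x.lo;      /* movqd 1,r0 ; subd r2,r0 */
    borrow = 1u < r.lo;    /* cmpqd 1,r0 ; slod r4 (assumed meaning) */
    r.hi = 0u - x.hi;      /* negd r3,r1 */
    r.hi -= borrow;        /* subd r4,r1 */
    return r;
}

/* x - 1, done as x + (-1) with the carry recovered by a compare. */
static struct di x_minus_1(struct di x)
{
    struct di r;
    uint32_t carry;
    r.lo = x.lo - 1u;      /* addr -1(r2),r0 */
    carry = r.lo < x.lo;   /* cmpd r0,r2 ; slod r4 (assumed meaning) */
    r.hi = x.hi - 1u;      /* addr -1(r3),r1 */
    r.hi += carry;         /* addd r4,r1 */
    return r;
}

/* Check both models against native 64-bit arithmetic. */
static void check_const_cases(uint64_t x)
{
    assert(di_to_u64(one_minus_x(u64_to_di(x))) == 1 - x);
    assert(di_to_u64(x_minus_1(u64_to_di(x))) == x - 1);
}
```

Each function spends a compare and a set-on-condition just to
recover the borrow that subd already leaves in the carry flag, which
is what subcd would consume directly.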

I'm just a lowly telecom programmer, who is going to grad school
at night, and has never played with the innards of GCC.  And I
just finished the only compiler class offered.  It wasn't much help
for this either: 14 1/2 weeks on parsing, 1 hour on code generation,
1/2 hour on optimization.  I would have preferred a bit more balance...
At least they used the Dragon book, instead of some automata theory
book that another student told me they used to use.

As always, any help, hints, or suggestions would be most appreciated.

Jon Buller

P.S.  Anyone have RMS's preferred email address handy?  I think I'll
be needing it soon, if some new idea doesn't strike me in the face
first.