Subject: Re: expr.c: large structure problems
To: Herman ten Brugge <Haj.Ten.Brugge@net.HCC.nl>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: tech-toolchain
Date: 10/27/1998 14:30:00
Hello,

>I debugged the code and found the problem in the mips.h file. If I compile
>the program with '-mdebuga bug.c' it works fine. It seems the macro
>GO_IF_LEGITIMATE_ADDRESS is not correct. 

It looks to me like it actually does accept the addresses it's supposed to;
the trouble is that the optimization Castor found is smashing the RTL into
something for which the backend can't find an output template.

the fragment which tickles the bug gets the address of a struct and
wants to index into the second field of the struct.
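
As a rough illustration only (bug.c itself is not reproduced in this
message, so the struct, field and function names below are made up, and
the 32800-byte size is chosen just to match the offset in the assembler
output further down), a fragment of that shape might look like:

```c
#include <stddef.h>

/* Hypothetical layout: the first field alone is larger than 32767 bytes,
   so the second field starts beyond the reach of a 16-bit signed offset. */
struct big {
    char first[32800];
    char second[16];
};

/* Store into the second field at a run-time index: the address is
   (p + i) plus the constant offset of 'second'. */
void store_second(struct big *p, int i)
{
    p->second[i] = 13;
}

/* The constant that has to be added to the struct's address. */
size_t second_offset(void)
{
    return offsetof(struct big, second);
}
```

Here second_offset() is the constant that ends up as the displacement of
the store, and it does not fit in a signed 16-bit field.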


The first field is > 32767 bytes long.  The MIPS architecture only
supports 16-bit immediate offsets.  The comment from
GO_IF_LEGITIMATE_ADDRESS explains what's going on:


	  /* For some code sequences, you actually get better code by	\
	     pretending that the MIPS supports an address mode of a	\
	     constant address + a register, even though the real	\
	     machine doesn't support it.  This is because the		\
	     assembler can use $r1 to load just the high 16 bits, add	\
	     in the register, and fold the low 16 bits into the memory	\
	     reference, whereas the compiler generates a 4 instruction	\
	     sequence.  On the other hand, CSE is not as effective.	\
	     It would be a win to generate the lui directly, but the	\
	     MIPS assembler does not have syntax to generate the	\
	     appropriate relocation.  */				\

which is correct: GO_IF_LEGITIMATE_ADDRESS is accepting an address
(which is (reg) + (offset > 16 bits)), on the assumption that the
assembler will handle it.  Then the CSE `exposing' breaks up the RTL
into something where the constant is no longer in a memory-ref.  So
the assembler magic for loads and stores can no longer fix it up, the
backend has no patterns that match the 32-bit constant, and thus the
compiler coredumps.

After applying Castor's patch I get the assembler output:

	addu	$4,$4,$5		# 12 addsi3_internal
	li	$2,13			# 0xd  # 14 movqi_internal2/2
	sb	$2,32800($4)		# 16 movqi_internal2/6
	j	$31			# 26 return


The CPU cannot do the `sb $2, 32800($4)' in one machine instruction:
the offset 32800 is outside the range -32768..32767 of a 16-bit
immediate operand (offset or otherwise).

But the assembler will synthesize it with two extra instructions: it uses
lui ('load upper immediate') to load a 16-bit immediate constant
into the high-order 16 bits of a register, adds that to the desired
base register (into a new temporary), and then makes the offset actually
used in the memory-reference instruction negative, so that when added to
(constant shifted up 16 bits) it ends up at the right address:

the assembly generated (my comments) is

   0:   00852021        addu    $a0,$a0,$a1	# add first two args
   4:   2402000d        li      $v0,13		# load the constant 13 (byte to store)
   8:   3c010001        lui     $at,0x1		# load $at with (1 << 16)
   c:   00240821        addu    $at,$at,$a0	# add the sum of the first two args
  10:   03e00008        jr      $ra		# delayed-branch return
  14:   a0228020        sb      $v0,-32736($at) # store (in the delay slot) to
						# (a0 + a1) + (1 << 16) - 32736
						# = (a0 + a1) + 32800

The assembler does this automagically, behind the compiler's back, for
loads or stores with an immediate offset bigger than 16 bits (and
also for loads or stores of 32-bit constants).  The register "$at" is
reserved for assembler-generated temporaries like this.
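
The arithmetic behind that split can be sketched in C.  This is only my
model of what the assembler computes, not code taken from gas or gcc,
and the function names are invented for the illustration.  The high half
is rounded up by 0x8000 because the hardware sign-extends the low
16 bits when it forms the address:

```c
#include <stdint.h>

/* Does the value fit the signed 16-bit immediate field (-32768..32767)? */
int fits_simm16(int32_t v)
{
    return v >= -32768 && v <= 32767;
}

/* High half, as loaded by lui.  Adding 0x8000 first compensates for the
   sign extension of the low half. */
uint32_t split_hi(uint32_t offset)
{
    return (offset + 0x8000u) >> 16;
}

/* Low half, as the signed 16-bit displacement of the load or store. */
int32_t split_lo(uint32_t offset)
{
    int32_t low = (int32_t)(offset & 0xffffu);
    return low >= 0x8000 ? low - 0x10000 : low;
}

/* What the lui/addu/sb sequence addresses: base + (hi << 16) + lo. */
uint32_t effective_address(uint32_t base, uint32_t offset)
{
    return base + (split_hi(offset) << 16) + (uint32_t)split_lo(offset);
}
```

For the offset 32800 this gives a high half of 1 and a low half of
-32736, the same values as in the lui and sb instructions of the
listing.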

> ...
>dont think I don't know how to fix this
>because I don't know how the mips works. 

I've tried to explain how the mips works here.  The mips.md templates
for the mips have constraint alternatives which know about this case:
moves from 'd' to 'R', which cost 1, versus 'd' to 'm', which cost 2
(e.g., alternatives 4 and 6 in movqi_internal2).

I think the problem is that the `unfolding' in expr.c breaks up the
address calculation into something which the backend no longer
recognizes as an address calculation -- _after_
GO_IF_LEGITIMATE_ADDRESS said it was okay, and after the breakup, the
RTL is no longer an address.

Later, the backend cannot find an output template which matches the
assembler magic, since

(insn 14 12 16 (set (reg:SI 81)
        (plus:SI (reg:SI 80)
            (const_int 32800))) -1 (nil)
    (nil))

is no longer in a memory-address context.  

That really is illegal on mips.  It needs 3 insns: a 16-bit load
immediate and a load-upper-immediate to load the 17-bit constant, and an
add.  Plus the byte-store, for a total of 4 machine insns, versus two
the `old' way.  Gross.

In a perfect world, maybe the right answer is to change gcc to emit
the lui magic itself; the comments say that the MIPSco assemblers don't
have syntax to let user code emit the insns with the appropriate
relocation. ``Blast''.  (The assembler does not have any way to
mask-to-16-bits-and-negate that works on assembly-time constants, e.g.
label expressions; but if the "constants" here are really actual
numeric values, I don't see why we can't just emit the appropriate
values in the compiler.)

But that would still lose if there were fewer than 3 references which
used the constant.  We'd need to add patterns to push the `exposed'
sub-expressions back into an immediate-address pattern.  Yuck.
I would sooner just not expose them in the first place.

There are obviously several ways to fix this: I don't know the goals of
the egcs project well enough to pick the right one. For now I'm using
Castor's patch to back out the relevant change to expr.c.


>Compiling with -O, -O2, -O3 also
>did fix this problem.

Uh, no, not for me.  I was compiling with -O2 already.