Subject: Re: gcc optimizer bug in netbsd-1-6 on alpha (gcc 2.95.3 20010315 (release) (NetBSD nb3))
To: None <tech-toolchain@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-toolchain
Date: 08/16/2003 18:29:15
>> In
>> 	union { int32_t i; struct in_addr a; }
>> there is nothing that guarantees that i and a share any storage at
>> all, as far as I can see.
> The C standard is abstract, but it's not quite that abstract.
> ``Suitably converted'' here means using a type cast.  The definition
> of casting a pointer requires that it be possible to cast back and
> forth between suitably aligned pointers, which means that they must
> have the same value or must have a simple conversion.

Right.  We agree here.

> Note that this is based only on the type of the pointer, not on the
> actual object which the pointer holds the address of.

And here, mostly - the conversion cannot depend on the data stored in
the object, but it can depend on other properties of it (such as, for
example, what kind of memory it's allocated in, on a machine with
different kinds of memory).

> In other words, given
>     union foo { int32_t i; struct in_addr a; } u;
[I added the tag "foo" -dM]
> the standard guarantees that
>     &u.i == (int32_t *) &u.a

That &u.i == (int32_t *)&u, and that &u.a == (struct in_addr *)&u, and
that &u.i == (int32_t *)(union foo *)&u.a, yes, but not, as far as I can
see, that &u.i == (int32_t *)&u.a.
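
A minimal sketch of those three comparisons, assuming a POSIX
<netinet/in.h> for struct in_addr.  The helper name
union_pointer_checks is made up for illustration.  It deliberately
leaves out the direct (int32_t *)&u.a form, since that is the one in
dispute:

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>   /* struct in_addr */

union foo { int32_t i; struct in_addr a; };

/* Hypothetical helper: counts how many of the three comparisons hold. */
int union_pointer_checks(void)
{
    union foo u;
    int held = 0;

    /* A pointer to the union, converted, points to each member. */
    held += (&u.i == (int32_t *)&u);
    held += (&u.a == (struct in_addr *)&u);

    /* Member to member, passing through the union type on the way. */
    held += (&u.i == (int32_t *)(union foo *)&u.a);

    assert(held == 3);
    return held;
}
```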

> and further guarantees that
>     foo (&u.i, &u.a)
> will receive two pointers which compare identically after casting.

Yes, provided there is an intermediate (union foo *) cast.  If you can
find wording that demonstrates this intermediate cast unnecessary, I
would love to hear it; it would significantly simplify my abstract
model of unions.

> Your suggestion that one field could be in a register while the other
> was in memory would violate this requirement.

Not if no address is ever taken of any of the three (the union and its
two fields), as the program cannot then tell the difference.  (Or if
the address of the in-memory field is taken but the compiler can
demonstrate - data flow analysis, perhaps - that that address is never
converted back into a pointer to the type of the purportedly
in-register element, or - if I'm correct above - into a pointer to the
union type.)

> Such an optimization would only be acceptable if there were no way to
> detect it, which basically would require that only one field of the
> union be used.

And something very close to that _is_ required: that only one field of
the union be used at any given time.  Vide infra.

> Well, more precisely, the behaviour of reading from one field of a
> union after writing to another is ``implementation defined,'' [...as
> contrasted with...] ``undefined behaviour,'' [...].

So, the only difference is that an implementation is required to
document - to an unspecified degree of precision - what you get.  I
don't see this as making a significant difference to the utility of a
coding technique in a compiler-independent sense.

A compiler could, for example, document that storing into one member
and reading from a different one returns some value of the type of the
member read, but what that value is may vary arbitrarily from moment to
moment and may or may not bear any relation to the value stored into
the first member.  That would satisfy the documentation requirement and
still permit allocating one member in a register and another in memory.

>> [T]he union is a clumsy but generally workable substitute for the
>> pointer cast...and I don't think it helps the aliasing situation one
>> bit, since struct in_addr cannot alias int32_t (if it could, you
>> could just use the cast).  Thus, the compiler is permitted to assume
>> that storing to the int32_t does not change the struct in_addr.
> I don't think this is true.

Why not?  Do you believe int32_t can alias struct in_addr?  Or do you
believe that unions are exempt from the dictum that a store to an
object may be assumed to not change any other object when the two
objects' types cannot alias one another?  So far I haven't seen any
reason to think either of those is true.

Code that depends on implementation-defined behaviour can, if I
remember my definitions correctly, be conforming but not strictly
conforming; but, again IIRC, the same is true of the pointer-cast
version.

>> Whether today's gcc acts on that assumption I can't say, but there's
>> no guarantee that tomorrow's won't.
> I think that would violate the standard.  It would create a situation
> in which a program would not be able to detect a change which it
> should be able to detect.

Can you elaborate?  How would it violate the standard?  What is the
change you refer to?

> For what it's worth, gcc's documentation clearly states that type
> punning using a union is not affected by aliasing.

Today's documentation.  This means that the union trick will work with
gcc - today's gcc, that is.  That was never in dispute, or at least I
never thought it was.
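
For concreteness, a sketch of the union trick in the form gcc's
-fstrict-aliasing documentation describes as permitted, with the
memory accessed through the union type itself.  The function name
addr_as_int32 is hypothetical, and the code assumes struct in_addr and
int32_t have the same size.  It illustrates gcc's documented
behaviour only, not anything the standard is claimed to promise:

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>   /* struct in_addr */

union foo { int32_t i; struct in_addr a; };

/* Hypothetical helper: store one member, read the other. */
int32_t addr_as_int32(struct in_addr a)
{
    union foo u;

    /* Assumption: both members occupy exactly the same number of bytes. */
    assert(sizeof(struct in_addr) == sizeof(int32_t));

    u.a = a;        /* write through the struct in_addr member */
    return u.i;     /* read back through the int32_t member */
}
```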

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B