Subject: Re: Intermediate void* casts
To: Manuel Bouyer <bouyer@antioche.lip6.fr>
From: Martin Husemann <martin@duskware.de>
List: tech-misc
Date: 08/11/2003 00:12:37
On Sun, Aug 10, 2003 at 08:51:01PM +0200, Manuel Bouyer wrote:
> Pardon my ignorance, but could you explain what is a "strict alias violations
> by type-punned pointer dereferences" ? Is it something new with gcc3 ?

Sure, pardoned ;-)

I had to dig a bit myself to find the details and understand the words used.
I guess the basic problem is clear (and old): pointer aliases get in the way
of optimization and are sometimes hard for the compiler to track. This didn't
matter much with older gcc versions, but the gcc 3.3.1 optimizer is more
aggressive about this at -O2.
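
To make that concrete, here is a small sketch of the kind of assumption a
type-based optimizer is allowed to make. The function name store_both is
made up for illustration and does not come from any real code:

```c
/* Hypothetical example: an int and a float are different types, so the
 * compiler may assume that ip and fp never point at the same object. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* assumed not to modify *ip */
    return *ip;   /* may be compiled as "return 1" without reloading */
}
```

Called with two distinct objects this is perfectly fine. The trouble starts
only when a cast makes both pointers refer to the same memory.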

The gcc info text explains type-punning and the -fstrict-aliasing option
(that is included in -O2 now) as:

`-fstrict-aliasing'
     Allows the compiler to assume the strictest aliasing rules
     applicable to the language being compiled.  For C (and C++), this
     activates optimizations based on the type of expressions.  In
     particular, an object of one type is assumed never to reside at
     the same address as an object of a different type, unless the
     types are almost the same.  For example, an `unsigned int' can
     alias an `int', but not a `void*' or a `double'.  A character type
     may alias any other type.

     Pay special attention to code like this:
          union a_union {
            int i;
            double d;
          };
          
          int f() {
            a_union t;
            t.d = 3.0;
            return t.i;
          }
     The practice of reading from a different union member than the one
     most recently written to (called "type-punning") is common.  Even
     with `-fstrict-aliasing', type-punning is allowed, provided the
     memory is accessed through the union type.  So, the code above
     will work as expected.  However, this code might not:
          int f() {
            a_union t;
            int* ip;
            t.d = 3.0;
            ip = &t.i;
            return *ip;
          }
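
As a possible way around the problem (this is not part of the gcc text
above), the bytes can be copied with memcpy instead of dereferencing a
punned pointer. The sketch below uses a made-up helper, float_bits, and
assumes that float is a 32-bit type:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the object representation of a float as a
 * 32-bit unsigned integer. Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;

    /* memcpy copies bytes, so no object is read through an lvalue of
     * the wrong type. */
    memcpy(&u, &f, sizeof u);
    return u;
}
```

The tempting alternative, *(uint32_t *)&f, is exactly the type-punned
pointer dereference that the compiler warns about.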

The "almost the same type" rule (i.e. the allowed aliases the compiler is
forced to track) is defined in ISO C like this (if I correlated this
correctly):

An object shall have its stored value accessed only by an lvalue expression
that has one of the following types:

 -- a type compatible with the effective type of the object,
 -- a qualified version of a type compatible with the effective type of
    the object,
 -- a type that is the signed or unsigned type corresponding to the
    effective type of the object,
 -- a type that is the signed or unsigned type corresponding to a
    qualified version of the effective type of the object,
 -- an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    subaggregate or contained union), or
 -- a character type.
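
Two of the cases from this list, sketched in code. The function names
byte_sum and as_unsigned are made up for illustration:

```c
#include <stddef.h>

/* Allowed by the "character type" case: any object may be inspected
 * byte by byte through an unsigned char pointer. */
unsigned byte_sum(const void *obj, size_t len)
{
    const unsigned char *p = obj;
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Allowed by the "signed or unsigned type corresponding to the effective
 * type" case: an int object may be read through an unsigned int lvalue. */
unsigned as_unsigned(int *p)
{
    return *(unsigned *)p;
}
```

Reading the same int through a float * or a short * matches none of the
cases above, so the compiler does not have to track that alias.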

Martin