Subject: Re: toolchain/22118: make won't compile with -Wcast-qual -Wstrict-prototypes and more
To: Ben Harris <bjh21@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-toolchain
Date: 07/17/2003 16:11:39
[ On Wednesday, July 16, 2003 at 23:32:07 (+0100), Ben Harris wrote: ]
> Subject: Re: toolchain/22118: make won't compile with -Wcast-qual -Wstrict-prototypes and more
>
> ISO/IEC 9899:1990 (section 6.5.7) says:
> 
> # An array of character type may be initialized by a character string
> # literal, optionally enclosed in braces.  Successive characters of the
> # character string literal (including the terminating null character if
> # there is room or if the array is of unknown size) initialize the elements
> # of the array.
> 
> If this isn't clear enough, example 7 in that section says:
> 
> # The declaration
> #       char s[] = "abc", t[3] = "abc";
> # defines "plain" char array objects s and t whose elements are initialized
> # with character string literals.  This declaration is identical to
> #       char s[] = { 'a', 'b', 'c', '\0' },
> #            t[] = { 'a', 'b', 'c' };
> 
> which pretty clearly agrees with der Mouse.

Ah, well it seems the above does agree in part with GCC.  This means GCC
is indeed correct and compliant and that I was wrong in assuming GCC was
the odd one out in doing this.  It also means that using "char foo[]" is
the correct and hopefully portable way to safely "de-const" a string
literal.  I for one will from now on use this technique to safely
initialize "pointers" that need to be used along code paths that assign
or pass their value to non-const char pointers (even though it will make
my code harder for David Laight to read :-).
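
For the record, here is a minimal sketch of the technique as I understand
it.  The names legacy_len() and deconst_demo() are made up purely for
illustration; the only thing taken from the standard text quoted above is
that the array receives its own copy of the characters, terminator
included:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical legacy interface that takes a plain "char *". */
static size_t legacy_len(char *s)
{
    return strlen(s);
}

/* Modifies a local array copy of the literal and reports the array size. */
static size_t deconst_demo(char *first_out)
{
    char foo[] = "string";      /* private writable copy: 6 chars + '\0' */

    foo[0] = 'S';               /* fine: foo is an ordinary char array */
    *first_out = foo[0];
    (void)legacy_len(foo);      /* decays to "char *", no cast needed */
    return sizeof foo;          /* 7, because the terminator is copied too */
}
```

Since foo is an array and not a pointer, nothing here ever points at the
literal itself, so there is no qualifier to discard.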

Thanks for researching and clarifying the standards P.O.V. on this part
of the puzzle.

However the other related question remains:  What does the C standard
say is the default implied type of a string literal when used as an
initializer for a pointer type (as opposed to the array type above)?

If I understand correctly the only thing you said with respect to this
question was:

>  A string literal forms an array of static storage duration
> with elements of type char or wchar_t.  const isn't mentioned in the section
> on string literals at all.

However this would mean that ISO/IEC 9899:1990 does not agree with GCC
in practice (i.e. by default, when -fwritable-strings is not used)
and it would also mean that these three examples are not identical
w.r.t. the "const" qualifiers and thus could not benefit from the
compile-time checks enabled by "-Wwrite-strings -Wcast-qual":

  {                          {                     char *foo;
    char *foo = "string";      char *foo;          bar(char *tmp)
  }                            foo = "string";     {
                             }                       foo = tmp;
                                                   }
                                                   {
                                                     bar("string");
                                                   }

It would seem to me that GCC is far closer to what should be considered
correct.  Magically changing the way a string literal is interpreted
when it is used in the place of a "char *" initializer as opposed to
when it is used in place of an expression term with the type "char *"
would not be very useful.

My own experience with analyzing runtime bugs that I have found and
fixed in code which mis-used string constants shows that if the compiler
implements read-only and/or shared storage for string constants (as GCC
does by default) then the only way to catch these bugs at compile time
is to use "const char *" as the default implied type for string
constants (except, it seems, when they are used as array initializers).

I.e. in practice portable code must assume that string constants, even
when used as "char *" initializers (but also when used in expressions),
have an implied "const" qualifier (i.e. the type "const char[len]") when
they are stored in read-only storage and/or when they are combined with
other identical instances.
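
To make that concrete, here is a rough sketch of the three forms from
earlier in this message rewritten on the assumption that every string
literal is "const char[len]".  The helpers count_char() and three_forms()
are hypothetical and exist only so the fragment does something
observable:

```c
#include <stddef.h>

static const char *foo;             /* file-scope pointer, now const-qualified */

static void bar(const char *tmp)    /* parameter is const-qualified too */
{
    foo = tmp;
}

/* Counts occurrences of c in s; reads only, so "const char *" suffices. */
static size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

static size_t three_forms(void)
{
    const char *a = "string";       /* literal as a pointer initializer */
    const char *b;

    b = "string";                   /* literal in an assignment */
    bar("string");                  /* literal as a function argument */

    /* a[0] = 'S'; would be rejected by the compiler: a points to const char */
    return count_char(a, 's') + count_char(b, 's') + count_char(foo, 's');
}
```

With the pointers declared this way any attempted write through them is a
compile-time error regardless of where the compiler chose to store the
literal.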

> > Secondly I read nothing in the above which implies in any way that the
> > storage for the string constant used as an array initializer cannot ever
> > be read-only.
> 
> From the point of view of the C program, initializers don't have storage at
> all, since there's no way to take their addresses, and no way to assign to
> them.

Yes, well, OK, but I think you got my point.

(You can implicitly, and in a non-portable way, get the address of a
string literal used as an initializer IFF your compiler uses shared
addresses for identical string literals.  :-)

>  Only the things that they initialise have storage, and those are
> writable if they're not declared "const".

I'm not a compiler guru, but I do know from reading compiler generated
assembly code that an average C compiler implementation will take the
string literal used in an initializer statement and put it in the data
segment and then it will (if the variable is global) store the address
of that string constant in the variable being initialized (i.e. at the
data segment address for that variable).

If the compiler and linker implement a .rodata segment and use it for
string literals then in fact that string literal will be stored in
read-only storage and it won't matter how the pointer variable being
initialized is declared because it will point at read-only storage.

GCC, for example seems to do this.  It does not by default put string
literals in writable storage just because the pointer variables they're
used as initializers for have the type "char *" (i.e. are not declared
"const") -- in fact GCC by default (i.e. unless you use
"-fwritable-strings") stores string literals in read-only storage and also
gives all string constants the implied "const" qualifier, regardless of
whether they are used in initializers for "char *" variables or they are
used in an expression.  As a result GCC will, with "-Wwrite-strings
-Wcast-qual", warn that the implied "const" of the string has been
discarded in an initializer statement for a "char *" variable and if the
code does in fact try to write through that pointer to the string
literal it will fault (i.e. the warning is one that should be taken very
seriously).
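
As a hedged illustration of what heeding that warning might look like
when the code really does need to modify the text: copy the literal into
writable storage first and write only to the copy.  The function
make_title() is hypothetical, and the faulting pattern appears only in a
comment because its run-time behaviour depends on where the
implementation stores literals:

```c
#include <stddef.h>
#include <string.h>

/*
 * The pattern being warned about (not compiled here):
 *
 *     char *p = "string";     warned about under -Wwrite-strings
 *     p[0] = 'S';             may fault if the literal is read-only
 */

/* Copies the literal into caller-supplied storage and edits the copy. */
static void make_title(char *dst, size_t dstlen)
{
    const char *src = "string";     /* the literal itself is never written */

    if (dstlen == 0)
        return;
    strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';         /* strncpy() may not terminate */
    dst[0] = 'S';                   /* safe: dst is the caller's buffer */
}
```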

So, perhaps what you say _should_ be true (i.e. that the "const" status
of the string literal in an initializer should be derived from the type
of the pointer variable it is assigned to), but it's clearly not true
for one of the most widely used C compilers.  It would cause problems
when porting code to GCC unless one turned on "-fwritable-strings".  I'm
sure it would also be somewhat surprising to many programmers, but
perhaps no more so than the transformation of string literals when used
as array initializers.

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>