I very much agree that pointer arithmetic MUST NOT be "undefined", even
if it includes "NULL" and/or "0". The warning that begat this thread is
insane!
Note I say this as someone who is very empathetic to implementers who
might try to make C work in any strange hardware systems where "null"
pointers are not actually all zeros in the hardware. I hope to be one.
At Mon, 24 Feb 2020 14:41:26 +0100, Kamil Rytarowski <n54%gmx.com@localhost> wrote:
Subject: Re: NULL pointer arithmetic issues
>
> Please join the C committee as a voting member or at least submit papers
> with language changes. Complaining here won't change anything.
>
> (Out of people in the discussion, I am involved in wg14 discussions and
> submit papers.)
If you are active on the wg14 committee, perhaps you can be convinced to
argue on "our" behalf? [0.5 :-)]
I wrote the following rant some time ago and posted it somewhere
(probably on G+ because I don't find it now with a quick search).
I'll throw it in here for some more fuel....
NO MORE "undefined behaviour"!!! Pick something sane and stick to it!
The problem with modern "Standard" C is that instead of refining the
definition of the abstract machine to match the most common and/or
logical behaviour of existing implementations, the standards committee
chose to throw the baby out with the bath water and make whole swaths
of conditions into so-called "undefined behaviour" conditions.
An excellent example is the data-flow optimization that is now
commonly abused to elide security/safety-sensitive code:
    int
    foo(struct bar *p)
    {
        char *lp = p->s;

        if (p == NULL || lp == NULL) {
            return -1;
        }
        lp[0] = '\0';

        return 0;
    }
Any programmer worth their salt will assume the compiler can calculate
the offset of 's' at compile time and thus anyone ignorant of C's new
"undefined behaviour" rules will guess that at worst some location on
the stack will be assigned a value pulled from low memory (if that
doesn't cause a SIGSEGV), but more likely the de-reference of 'p'
won't happen right away because we all know that any optimizer worth
its salt SHOULD defer it until the first use of 'lp', perhaps not
even allocating any stack space for 'lp' at all!
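
For comparison, a minimal sketch of the ordering that leaves an optimizer
nothing to elide: test 'p' before anything reads through it. The layout
of 'struct bar' and the name 'foo_checked' are assumptions made up for
illustration; the example above does not show them.

```c
#include <stddef.h>

/* Hypothetical layout; the original example does not define struct bar. */
struct bar {
    char *s;
};

/*
 * Same job as foo() above, but the NULL test on p comes before the
 * first read through p, so no later check can be treated as dead code.
 */
int
foo_checked(struct bar *p)
{
    char *lp;

    if (p == NULL)
        return -1;
    lp = p->s;          /* p is known to be non-NULL here */
    if (lp == NULL)
        return -1;
    lp[0] = '\0';

    return 0;
}
```

This is only the defensive rewrite the current rules push programmers
toward, not a claim about what any particular compiler does with the
original.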
Worse yet this example stems from actual Linux kernel code like this:
    static int
    podhd_try_init(struct usb_interface *interface,
                   struct usb_line6_podhd *podhd)
    {
        struct usb_line6 *line6 = &podhd->line6;

        if ((interface == NULL) || (podhd == NULL))
            return ENODEV;
        ....
    }
Here some language-lawyer-wannabes [[LLWs]] might try in vain to argue over
the interpretation of "dereferencing", yet again any programmer worth
their salt knows that the address of a field in a struct is simply
the sum of the struct's base address and the offset of the field, the
latter of which the compiler obviously knows at compile time, and
adding a value to a NULL pointer should never be considered invalid or
undefined!
[[ You have to start from somewhere, after all.... Why not zero? ]]
(I suspect the LLWs are being misled by the congruence between "a->b"
and "(*a).b".)
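
A small sketch of the "base address plus offset" view, using a real
object so that nothing here depends on the contested NULL case. The
layout of 'struct bar' and the function name are assumptions for
illustration only.

```c
#include <stddef.h>

/* Hypothetical layout, chosen so that 's' is not the first member. */
struct bar {
    int   a;
    char *s;
};

/*
 * Returns nonzero when the address of p->s equals the struct's base
 * address plus the compile-time offset of 's'.  p must point at a
 * real object; no member is read, only its address is formed.
 */
int
field_is_base_plus_offset(const struct bar *p)
{
    return (const char *)&p->s ==
        (const char *)p + offsetof(struct bar, s);
}
```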
Worst of all consider this example:
    void *
    foo(struct bar *p)
    {
        size_t o = offsetof(struct bar, s);

        if (p == NULL)
            return NULL;
        ....
    }
And then consider an extremely common example of "offsetof()" which
might very well appear in a legacy application's own code because it
pre-dated <stddef.h>, though indeed this very definition has been used
in <stddef.h> by several standard compiler implementations, and indeed
it was specifically allowed in general by ISO C90 (and only more
recently denied by C11, sort of):
#define offsetof(type, member) ((size_t)(unsigned long)(&((type *)0)->member))
or possibly (for those who know that pointers are not always "just"
integers):
    #define offsetof(type, member) ((size_t)(unsigned long)((char *)&((type *)0)->member - (char *)0))
Here we have very effectively and entirely hidden the fact that the
'->' operator is used with 's'.
Any sane person with some understanding of programming languages
should agree that it is wrong to assume that calculating the address
of an lvalue "evaluates" that lvalue. In C the '->' and '[]'
operators are arithmetic operators, not (immediately and on their own)
memory access operators.
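
The standard itself already takes this view in one place: C99 6.5.3.2
says that in '&a[i]' neither the '&' nor the '*' implied by '[]' is
evaluated, and the result is as if one had written 'a + i'. A minimal
sketch relying on that rule follows; the function name 'sum_ints' is
hypothetical.

```c
#include <stddef.h>

/*
 * Sums n ints.  The end pointer is formed with &a[n], which is plain
 * address arithmetic (a + n); the element a[n] is never read.
 */
long
sum_ints(const int *a, size_t n)
{
    const int *end = &a[n];     /* one past the last element */
    const int *q;
    long total = 0;

    for (q = a; q != end; q++)
        total += *q;

    return total;
}
```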
Sadly C's new undefined behaviour rules as interpreted by some
compiler maintainers now allow the compiler to STUPIDLY assume that
since the programmer has knowingly put a supposed de-reference of a
pointer on the first line of the function, then any comparisons of
that pointer with NULL further on are OBVIOUSLY never ever going to be
true and so it can SILENTLY wipe out the whole damn security check.
I guess I'm saying that modern compiler maintainers are not sane, and
at least some members of the more recent C Standards Committees are
definitely NOT sane and/or friendly and considerate.
C's primitive nature encourages the programmer to think in terms of
what the target machine is going to do, and as such it is extremely
sad and disheartening that the standards committee chose to endanger
users in so many ways.
[[ in modern "Standard C" ]]
It's not that evaluating something like (1<<32) might have an
unpredictable result, but rather that the entire execution of any
program that evaluates such an expression is ENTIRELY meaningless!
Indeed according to "Standard C" the execution is not even meaningful
up to the point where undefined behaviour is encountered. Undefined
behaviour trumps ALL other behaviours of the C abstract machine.
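
A sketch of how one might sidestep the shift case today, assuming a
32-bit unsigned value and assuming that yielding 0 for an oversized
count is acceptable (that result is a choice made by this sketch, not
something the standard specifies). The name 'shl32' is hypothetical.

```c
#include <stdint.h>

/*
 * Left shift with a range check on the count.  Counts of 32 or more
 * never reach the << operator, so the undefined case cannot arise.
 */
uint32_t
shl32(uint32_t v, unsigned int n)
{
    if (n >= 32)
        return 0;               /* assumed policy for oversized counts */

    return (uint32_t)(v << n);
}
```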
And it is all in pursuit of the maximum possible optimization of all
code at any expense, INCLUDING correct operation of the program.
Not all so-called "undefined behaviours" are quite this bad, yet, but
in general we would be infinitely better off with a more completely
defined abstract machine that might force some target architectures to
jump through hoops instead of forcing EVERY programmer to ALWAYS be
more careful than EVERY conceivable optimizer.
As Phil Pennock said:
If I program in C, I need to defend against the compiler maintainers.
[[ and future standards committee members!!! ]]
If I program in Go, the language maintainers defend me from my mistakes.
And I say:
Modern "Standard C" is actually "Useless C" and "Unusable C"
Indeed I now say if "Standard C" follows C++ then it will be safe to say
that a good optimizing compiler will soon be able to turn all C programs
into "abort()" calls.
--
Greg A. Woods <gwoods%acm.org@localhost>
Kelowna, BC +1 250 762-7675 RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost> Avoncote Farms <woods%avoncote.ca@localhost>