tech-kern archive


Re: NULL pointer arithmetic issues



I very much agree that pointer arithmetic MUST NOT be "undefined", even
if it includes "NULL" and/or "0".  The warning that begat this thread is
insane!

Note I say this as someone who is very empathetic to implementers who
might try to make C work in any strange hardware systems where "null"
pointers are not actually all zeros in the hardware.  I hope to be one.

At Mon, 24 Feb 2020 14:41:26 +0100, Kamil Rytarowski <n54%gmx.com@localhost> wrote:
Subject: Re: NULL pointer arithmetic issues
> 
> Please join the C committee as a voting member or at least submit papers
> with language changes. Complaining here won't change anything.
> 
> (Out of people in the discussion, I am involved in wg14 discussions and
> submit papers.)

If you are active on the wg14 committee, perhaps you can be convinced to
argue on "our" behalf?   [0.5 :-)]

I wrote the following rant some time ago and posted it somewhere
(probably on G+ because I don't find it now with a quick search).

I'll throw it in here for some more fuel....

  NO MORE "undefined behaviour"!!!  Pick something sane and stick to it!

  The problem with modern "Standard" C is that instead of refining the
  definition of the abstract machine to match the most common and/or
  logical behaviour of existing implementations, the standards committee
  chose to throw the baby out with the bath water and make whole swaths
  of conditions into so-called "undefined behaviour" conditions.

  An excellent example is the data-flow optimizations that are now
  commonly abused to elide security/safety-sensitive code:

	int
	foo(struct bar *p)
	{
		char *lp = p->s;

		if (p == NULL || lp == NULL) {
			return -1;
		}
		lp[0] = '\0';

		return 0;
	}

  Any programmer worth their salt will assume the compiler can calculate
  the offset of 's' at compile time.  Thus anyone ignorant of C's new
  "undefined behaviour" rules will guess that at worst some location on
  the stack will be assigned a value pulled from low memory (if that
  doesn't cause a SIGSEGV).  More likely, though, the de-reference of
  'p' won't happen right away, because we all know that any optimizer
  worth its salt SHOULD defer it until the first use of 'lp', perhaps
  not even allocating any stack space for 'lp' at all!

  Worse yet this example stems from actual Linux kernel code like this:

	static int
	podhd_try_init(struct usb_interface *interface,
	               struct usb_line6_podhd *podhd)
	{
		struct usb_line6 *line6 = &podhd->line6;

		if ((interface == NULL) || (podhd == NULL))
			return -ENODEV;
		....
	}

  Here some language-lawyer-wannabes [[LLWs]] might try in vain to argue over
  the interpretation of "dereferencing", yet again any programmer worth
  their salt knows that the address of a field in a struct is simply
  the sum of the struct's base address and the offset of the field, the
  latter of which the compiler obviously knows at compile time, and
  adding a value to a NULL pointer should never be considered invalid or
  undefined!

[[ You have to start from somewhere, after all....  Why not zero? ]]

  (I suspect the LLWs are being misled by the congruence between "a->b"
  and "(*a).b".)
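
  The "base plus offset" claim is easy to check on a real (non-NULL)
  object, where nobody disputes that the behaviour is defined.  A
  minimal sketch follows; the two-member 'struct bar' and the helper
  names bar_s_addr() and bar_s_addr_matches() are invented here for
  illustration and are not from any real code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the 'struct bar' used in the examples. */
struct bar {
	int   n;
	char *s;
};

/*
 * Compute the address of member 's' by hand: the struct's base address
 * plus the compile-time offset.  No member of *p is read or written.
 */
static char **
bar_s_addr(struct bar *p)
{
	return (char **)((char *)p + offsetof(struct bar, s));
}

/* Return 1 if the hand-computed address equals what '&p->s' yields. */
static int
bar_s_addr_matches(struct bar *p)
{
	assert(p != NULL);	/* only checked here for a real object */
	return bar_s_addr(p) == &p->s;
}
```

  For any 'struct bar b', bar_s_addr_matches(&b) should return 1.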

  Worst of all consider this example:

	void *
	foo(struct bar *p)
	{
		size_t o = offsetof(struct bar, s);

		if (p == NULL)
			return NULL;
		....
	}

  And then consider an extremely common example of "offsetof()" which
  might very well appear in a legacy application's own code because it
  pre-dated <stddef.h>, though indeed this very definition has been used
  in <stddef.h> by several standard compiler implementations, and indeed
  it was specifically allowed in general by ISO C90 (and only more
  recently denied by C11, sort of):

	#define	offsetof(type, member)	((size_t)(unsigned long)(&((type *)0)->member))

  or possibly (for those who know that pointers are not always "just"
  integers):

	#define	offsetof(type, member)	((size_t)((char *)&((type *)0)->member - (char *)0))

  Here we have very effectively and entirely hidden the fact that the
  '->' operator is used with 's'.

  Any sane person with some understanding of programming languages
  should agree that it is wrong to assume that calculating the address
  of an lvalue "evaluates" that lvalue.  In C the '->' and '[]'
  operators are arithmetic operators, not (immediately and on their own)
  memory access operators.
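
  For '[]' at least, the standard's own text seems to agree: C99
  6.5.3.2 says that in '&a[i]' neither the '&' nor the '*' implied by
  '[]' is evaluated, and the result is as if 'a + i' had been written.
  As far as I can tell there is no matching sentence for '->'.  A small
  sketch of the '[]' case, where NELEM, end_of() and count_between()
  are names made up for this example:

```c
#include <assert.h>
#include <stddef.h>

#define NELEM 4

/*
 * Return a pointer one past the last element of a NELEM-element array
 * using '[]'.  Because '&a[NELEM]' is treated as 'a + NELEM', the
 * non-existent element a[NELEM] is never read.
 */
static int *
end_of(int *a)
{
	return &a[NELEM];
}

/* Count the elements in the half-open range [first, last). */
static size_t
count_between(int *first, int *last)
{
	assert(first <= last);
	return (size_t)(last - first);
}
```

  With 'int a[NELEM]', count_between(a, end_of(a)) gives NELEM without
  ever touching memory beyond the array.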

  Sadly C's new undefined behaviour rules as interpreted by some
  compiler maintainers now allow the compiler to STUPIDLY assume that
  since the programmer has knowingly put a supposed de-reference of a
  pointer on the first line of the function, then any comparisons of
  that pointer with NULL further on are OBVIOUSLY never ever going to be
  true and so it can SILENTLY wipe out the whole damn security check.
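
  Under the rules as those compilers read them today, the only form of
  the first example that is safe from elision tests 'p' before anything
  that looks like a de-reference.  A sketch, assuming a one-member
  'struct bar' and using the invented names foo_checked() and
  foo_checked_selftest():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the 'struct bar' in the first example. */
struct bar {
	char *s;
};

/*
 * Same job as foo() in the first example, but 'p' is tested before
 * 'p->s' is ever formed, so the compiler has no earlier "de-reference"
 * from which to conclude that 'p' cannot be NULL.
 */
static int
foo_checked(struct bar *p)
{
	char *lp;

	if (p == NULL)
		return -1;
	lp = p->s;
	if (lp == NULL)
		return -1;
	lp[0] = '\0';

	return 0;
}

/* Usage example: exercises all three paths. */
static void
foo_checked_selftest(void)
{
	char buf[4] = "abc";
	struct bar good = { buf };
	struct bar empty = { NULL };

	assert(foo_checked(NULL) == -1);
	assert(foo_checked(&empty) == -1);
	assert(foo_checked(&good) == 0 && buf[0] == '\0');
}
```

  That this reordering is needed at all is, of course, the complaint.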

  I guess I'm saying that modern compiler maintainers are not sane, and
  at least some members of the more recent C Standards Committee are
  definitely NOT sane and/or friendly and considerate.

  C's primitive nature encourages the programmer to think in terms of
  what the target machine is going to do, and as such it is extremely
  sad and disheartening that the standards committee chose to endanger
  users in so many ways.

[[ in modern "Standard C" ]]
  It's not that evaluating something like (1<<32) might have an
  unpredictable result, but rather that the entire execution of any
  program that evaluates such an expression is ENTIRELY meaningless!
  Indeed according to "Standard C" the execution is not even meaningful
  up to the point where undefined behaviour is encountered.  Undefined
  behaviour trumps ALL other behaviours of the C abstract machine.
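
  With a 32-bit 'int', (1<<32) is undefined because the shift count is
  not less than the width of the type.  The defensive idiom is to test
  the count before shifting.  A minimal sketch, with the helper name
  shl32_checked() invented for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shift 'v' left by 'n' bits only when 'n' is in range for a 32-bit
 * value.  Returns 1 and stores the result in *out on success; returns
 * 0 and leaves *out untouched when 'n' is 32 or more.
 *
 * The shift is done in 'unsigned long', which is at least 32 bits wide
 * and is never promoted to a signed type, so it cannot overflow into
 * undefined behaviour; the cast back to uint32_t discards any high bits.
 */
static int
shl32_checked(uint32_t v, unsigned int n, uint32_t *out)
{
	assert(out != NULL);

	if (n >= 32)
		return 0;
	*out = (uint32_t)((unsigned long)v << n);

	return 1;
}
```

  A caller then gets an ordinary error return for an out-of-range
  count instead of a program whose whole execution is meaningless.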

  And it is all in pursuit of the maximum possible optimization of all
  code at any expense, INCLUDING correct operation of the program.

  Not all so-called "undefined behaviours" are quite this bad, yet, but
  in general we would be infinitely better off with a more completely
  defined abstract machine that might force some target architectures to
  jump through hoops instead of forcing EVERY programmer to ALWAYS be
  more careful than EVERY conceivable optimizer.

  As Phil Pennock said:

    If I program in C, I need to defend against the compiler maintainers.
        [[ and future standards committee members!!! ]]
    If I program in Go, the language maintainers defend me from my mistakes.

  And I say:

	Modern "Standard C" is actually "Useless C" and "Unusable C"


Indeed I now say if "Standard C" follows C++ then it will be safe to say
that a good optimizing compiler will soon be able to turn all C programs
into "abort()" calls.

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



