tech-kern archive


Re: NULL pointer arithmetic issues



At Mon, 24 Feb 2020 22:15:22 -0500 (EST), Mouse <mouse%Rodents-Montreal.ORG@localhost> wrote:
Subject: Re: NULL pointer arithmetic issues
>
> > Greg A. Woods wrote:
> >
> >   NO MORE "undefined behaviour"!!!  Pick something sane and stick to it!
> >
> >   The problem with modern "Standard" C is that instead of refining
> >   the definition of the abstract machine to match the most common
> >   and/or logical behaviour of existing implementations, the standards
> >   committee chose to throw the baby out with the bath water and make
> >   whole swaths of conditions into so-called "undefined behaviour"
> >   conditions.
>
> Unfortunately for your argument, they did this because there are
> "existing implementations" that disagree severely over the points in
> question.

I don't believe that's quite right.

True "Undefined Behaviour" is not usually the explanation for
differences between implementations.  That's normally what the Language
Lawyers call "Implementation Defined" behaviour.
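
To make the distinction concrete, here are my own made-up examples,
nothing specific to this thread:

	#include <limits.h>

	void
	example(void)
	{
		/*
		 * Implementation-defined:  every implementation must pick
		 * a result for right-shifting a negative value and must
		 * document its choice, but implementations may differ.
		 */
		int a = -1 >> 1;

		/*
		 * Undefined:  the Standard places no requirements at all
		 * on the program once signed arithmetic overflows, and
		 * the optimizer is allowed to assume it never happens.
		 */
		int x = INT_MAX;
		int b = x + 1;

		(void)a;
		(void)b;
	}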

"Undefined behaviour" is used for things like dereferencing a nil
pointer.  There's little disagreement about that being "undefined by
definition" -- even ignoring the Language Lawyers.  We can hopefully
agree upon that even using the original K&R edition's language:

	"C guarantees that no pointer that validly points at data will
	contain zero"

The problem, though, is that C gives you more rope than you might ever
think possible in some situations -- for example, how easily poorly
written code can end up dereferencing a nil pointer.

The worse problem, though, is when compiler writers -- what I'll call
"Optimization Warrior Lawyers" -- start abusing each and every possible
instance of "Undefined Behaviour" to their advantage.

This is worse than ignoring Hoare's advice -- this is the very epitome
of premature optimization -- this is pure evil.

This is breaking otherwise readable and usable code.

I give you again my example:

> >   An excellent example are the data-flow optimizations that are now
> >   commonly abused to elide security/safety-sensitive code:
>
> > 	int
> > 	foo(struct bar *p)
> > 	{
> > 		char *lp = p->s;
> >
> > 		if (p == NULL || lp == NULL) {
> > 			return -1;
> > 		}
>
> This code is, and always has been, broken; it is accessing p->s before
> it knows that p isn't nil.

How do you know for sure?  How does the compiler know?  Serious questions.

What if all calls to foo() are written as such:

	if (p) foo(p);

I agree this might not be "fail-safe" code, or in any other way
advisable, but it was perfectly fine in the world before the UB
Optimization Warriors.  Today's "Standard C", however, gives compilers
license to replace "foo()" with a trap or a call to "abort()", etc.

I.e. it takes a real "C Language Lawyer(tm)" to know whether, past
certain optimization levels, the sequence points still prevent this
from happening.

In the past I could equally assume the optimizer would rewrite the first
bit of foo() as:

	if (! p || ! p->s) return -1;
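
The version that stays well-defined no matter what the optimizer does
is of course the one where the test really does come before the
dereference, i.e. something like:

	int
	foo(struct bar *p)
	{
		char *lp;

		/*
		 * Short-circuit evaluation:  p->s is only read once we
		 * know p is not nil.
		 */
		if (p == NULL || (lp = p->s) == NULL) {
			return -1;
		}
		/* ... use lp ... */
	}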

In 35 years of C programming I've never before had to pay such close
attention to such minute details.  I now need tools to audit old code
for such things, and my experience to date suggests UBSan is not up to
this task -- i.e. its runtime reports are useless (perhaps even with
high-code-coverage unit tests).
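
For example (my own sketch; assume struct bar is { char *s; ... } as
above, and a build with something like "cc -O2 -g -fsanitize=undefined"):
UBSan only reports the nil dereference in the broken foo() if some test
actually reaches it with a nil pointer, so a test suite whose callers
all follow the "if (p) foo(p)" pattern passes cleanly even though the
compiler may already have thrown the NULL check away:

	int
	main(void)
	{
		struct bar b = { .s = "something" };

		(void)foo(&b);		/* valid pointer: UBSan stays silent */
		(void)foo(NULL);	/* only a call like this produces a
					 * runtime report */
		return 0;
	}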

This is the main point of my original rant.  "Undefined Behaviour" as it
has been interpreted by Optimization Warriors has given us an unusable
language.

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



