tech-kern archive


Re: NULL pointer arithmetic issues



On Sun, Mar 8, 2020 at 2:30 PM Taylor R Campbell
<campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>
> > Date: Sun, 8 Mar 2020 20:52:29 +0300
> > From: Roman Lebedev <lebedev.ri%gmail.com@localhost>
> >
> > so we are allowed to lower that in clang front-end as:
> >
> > int
> > statu(char *a)
> > {
> >   __builtin_assume(a != NULL);
> >   a += getuid() - geteuid();
> >   __builtin_assume(a != NULL);
>
> Allowed, yes.
>
> What I'm wondering is whether this is something Clang will actually do
> -- and whether it is for any serious reason other than to capriciously
> screw people who write software for real machines instead of for the
> pure C abstract machine to the letter of the standard (and if so,
> whether -fno-delete-null-pointer-checks is enough to disable it).
>
> Evidently making that assumption is _not_ allowed in C++, so presumably
> it's not important for performance purposes.  It's also not important
> for expressive purposes, because I could just as well have written
> assert(a != NULL).
>
> > > I was told by Roman that it was checked during a C committee meeting and
> > > confirmed to be an intentional UB.
> > Correction: Aaron Ballman asked about this in non-public WG14 reflector
> > mailing list, it wasn't a committee meeting, but the point still stands.
>
> What does `intentional' UB mean, exactly?  What is the intent behind
> having p + 0 for null p be undefined in C, while the C++ committee saw
> fit to define it?

Intentional as in: the question was considered and it was decided to
make that scenario explicitly UB (as opposed to it being UB because
we didn't say anything about it in the standard, or because the
question was never considered in the first place).
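To make that concrete, here is a minimal illustration of my own (the
function f below is hypothetical, not from the earlier messages): in
C, even a zero offset applied to a null pointer is undefined, whereas
C++ defines p + 0 for null p to yield a null pointer.

int
f(char *p)            /* suppose the caller passes p == NULL */
{
  char *q = p + 0;    /* UB in C when p is null; well-defined (q == NULL) in C++ */
  return q == NULL;   /* a C compiler may therefore assume p != NULL and fold this to 0 */
}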

> Is it because there technically once existed C implementations on
> bizarre architectures with wacky arithmetic on pointers like Zeta-C,
> or is it because there are actual useful consequences to inferring
> that p must be nonnull if we evaluate p + 0?

As for intent, I can only speculate (I wasn't there for the original
specification work on this. I might not have even been alive for the
original specification work on this.): performance gains. For
instance, a segmented memory architecture may have multiple value
representations of a null pointer. Even a flat memory architecture can
see potential optimization gains from assuming that arithmetic on a
pointer implies the pointer is valid.
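As a sketch of the kind of inference that buys -- my own example, with
a hypothetical function g, assuming the usual flat-memory model -- once
the compiler sees arithmetic on p, it may treat a later null check as
dead code:

int
g(char *p, long off)
{
  char *q = p + off;  /* UB unless p points into (or one past) an
                         object, so the compiler may infer p != NULL */
  if (p == NULL)      /* ...and is then free to delete this check */
    return -1;
  return q != NULL;
}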

> I ask because in principle a conformant implementation could compile
> the NetBSD kernel into a useless blob that does nothing -- we rely on
> all sorts of behaviour relative to a real physical machine that is not
> defined by the letter of the standard, like inline asm, or converting
> integers from the VM system's virtual address allocator into pointers
> to objects.  But such an implementation would not be useful.

Whether an optimizer elects to use this form of UB to make
optimization decisions is a matter of QoI (quality of implementation).
My personal feeling is that I don't trust this sort of optimization --
it takes code the programmer wrote and makes it behave in a
fundamentally different manner.  I'm in support of UBSan diagnosing
these constructs, because it is UB and an optimizer is definitely
allowed to optimize based on it, but I wouldn't be in support of an
optimizer that aggressively optimizes on this.
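For what it's worth, here is roughly how UBSan reports the construct
with a recent Clang (hypothetical file name; the exact message text
and source location may differ by version):

$ cat null0.c
#include <stddef.h>
int
main(void)
{
  char *p = NULL;
  char *q = p + 0;  /* UB in C */
  return q != NULL;
}
$ clang -fsanitize=undefined null0.c && ./a.out
null0.c:6:13: runtime error: applying zero offset to null pointer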

~~Aaron


