tech-kern archive


Re: NULL pointer arithmetic issues



> I wrote the following rant some time ago and posted it somewhere

> I'll throw it in here for some more fuel....

>   NO MORE "undefined behaviour"!!!  Pick something sane and stick to it!

>   The problem with modern "Standard" C is that instead of refining
>   the definition of the abstract machine to match the most common
>   and/or logical behaviour of existing implementations, the standards
>   committee chose to throw the baby out with the bath water and make
>   whole swaths of conditions into so-called "undefined behaviour"
>   conditions.

Unfortunately for your argument, they did this because there are
"existing implementations" that disagree severely over the points in
question.  A spec that mandated such things as the "pointers are really
just memory addresses" model you sketch below would, at best, simply
get ignored by implementors on machines that don't match it.  Perhaps
that's what you'd want.  Personally, I prefer the actual choice.

>   An excellent example are the data-flow optimizations that are now
>   commonly abused to elide security/safety-sensitive code:

> 	int
> 	foo(struct bar *p)
> 	{
> 		char *lp = p->s;
> 
> 		if (p == NULL || lp == NULL) {
> 			return -1;
> 		}

This code is, and always has been, broken: it reads p->s before it
knows that p isn't nil.  If you're really unlucky you'll be on a
machine where there are device registers at address 0 and that read
will poke a device register.  If you're less unlucky you'll be on
MS-DOS or a PDP-11 or some such and silently and harmlessly get a
meaningless value for lp.  If you're lucky you'll get a segfault or
moral equivalent.  Anyone who thinks this sort of sloppiness is
appropriate in security/safety-sensitive code, please stay far, far away
from anything that might run on my machines.  Yes, an optimizer _might_
defer the fetch of lp, but it also might not, for any of many reasons;
relying on its doing so is extremely brittle and most definitely not
appropriate for anything security/safety-sensitive.
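
For concreteness, here is a minimal sketch of the ordering that does not
depend on what the optimizer happens to do: test p first, and only then
fetch p->s.  The layout of struct bar and the final "return 0" are
assumptions made for illustration; the quoted snippet shows neither.

```c
#include <stddef.h>

/* Hypothetical layout; the quoted code only shows that p has a member s. */
struct bar {
	char *s;
};

int
foo(struct bar *p)
{
	char *lp;

	/* Check p before touching anything it points to. */
	if (p == NULL)
		return -1;

	lp = p->s;	/* safe: p is known not to be nil here */
	if (lp == NULL)
		return -1;

	return 0;	/* stand-in for the rest of the original function */
}
```

Written this way there is no access through p for the compiler to reason
from before the check, so the check cannot be dropped on those grounds.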

That said, I do agree that simply dropping the p==NULL check but
preserving the fetch of lp is, if anything, even more broken; it is
gross abuse of the latitude permitted by the undefined-behaviour rules.
But that is a quality-of-implementation issue.

>   Worse yet this example stems from actual Linux kernel code [...]

Good gods.  I'm gladder than ever I don't run Linux.

>   [...], yet again any programmer worth their salt knows that the
>   address of an field in a struct is simply the sum of the struct's
>   base address and the offset of the field, [...]

That's what a mediocre C programmer thinks.  A good one knows there is
a difference between the abstract machine and the implementation and
realizes that, while that is a common implementation, it is far from
the only possible one, and it is inappropriate to rely on it being an
accurate description (except in code not intended to be portable, like
a kernel's pmap layer).

>   Worst of all consider this example:

> 		size_t o = offsetof(p, s);

>   And then consider an extremely common example of "offsetof()" [...]

Such an implementation of offsetof() is nonportable, exactly because it
assumes things like your sketch based on the "pointers are just memory
addresses" model.  Providing it in application code makes that code
nonportable, just as much as assuming shorts are 18 bits does.
(What, you mean you're not on a 36-bit machine?  What sort of weird
hardware are you using?)

An implementation may provide it, yes, if - IF! - it knows the
associated compiler handles that code such that offsetof() returns the
correct result.  But what would you expect it to do in, say, Zeta-C?
Or do you think Zeta-C should not exist?
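
For application code, the portable route is to take offsetof() from
<stddef.h> and let the implementation worry about how it is computed.
Note that its first argument is a type, not an object or a pointer.
A small sketch, with a hypothetical struct bar and a hypothetical
wrapper function bar_s_offset():

```c
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct bar {
	int n;
	char *s;
};

/*
 * Offset of member s within struct bar, as reported by the
 * implementation's own offsetof().
 */
size_t
bar_s_offset(void)
{
	return offsetof(struct bar, s);
}
```

How the implementation arrives at the value is its own business; the
caller relies only on the documented interface.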

>   or possibly (for those who know that pointers are not always "just"
>   integers):

> 	#define	offsetof(type, member)	((size_t)(unsigned long)((&((type *)0)->member) - (type *)0))

That has never worked in C since, oh, I dunno, V7? and probably never
will; it tries to subtract pointers that point to different types.
What I think of as the usual implementation along those lines would be
something like

((size_t)((char *)&((type *)0)->member - (char *)0))

(note the lack of an intermediate cast to unsigned long; size_t may be
wider than unsigned long, though admittedly it's unlikely offsetof()
will need to return a value greater than the largest unsigned long).
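
As an aside, the same char * subtraction is well defined when it is
applied to an actual object instead of a nil pointer, since both
pointers then point into the same object and have the same type.  The
sketch below uses a hypothetical macro MEMBER_OFFSET_IN_OBJ and a
hypothetical struct bar; it is an illustration, not a replacement for
offsetof(), because it needs an object to work on.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte offset of member within the object obj.
 * Both operands of the subtraction are char * into the same object.
 */
#define MEMBER_OFFSET_IN_OBJ(obj, member) \
	((size_t)((char *)&(obj).member - (char *)&(obj)))

/* Hypothetical struct, used only for illustration. */
struct bar {
	int n;
	char *s;
};

size_t
bar_s_offset_from_object(void)
{
	struct bar b;	/* only addresses are taken; b is never read */

	return MEMBER_OFFSET_IN_OBJ(b, s);
}
```

On any conforming implementation this should agree with
offsetof(struct bar, s) from <stddef.h>.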

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

