tech-misc: Re: Documentation of abs(3), div(3) etc.

Subject: Re: Documentation of abs(3), div(3) etc.
To: None <tech-misc@netbsd.org>
From: Martijn van Buul <pino@dohd.org>
List: tech-misc
Date: 02/10/2007 19:49:59
* Krister Walfridsson:
> It is.  Integer types must not wrap in C
>
>    ISO/IEC 9899:1999 6.5p5:
>    If an exceptional condition occurs during the evaluation of an
>    expression (that is, if the result is not mathematically defined
>    or not in the range of representable values for its type), the
>    behavior is undefined.

True. Which is the main point exactly - at some point, the abs-ed
INT_MIN *will* be stored in a signed int, which simply doesn't fit, for
the simple reason that the return type of abs() is signed. Nothing
we can do about that, except for changing the return type of 
abs(), which is obviously beyond our control.

> For
>
>    signed int a;
>    return (signed int) ( - (unsigned)a );
>
> on the other hand, you do the operation on an unsigned type, which
> is permitted by the standard.
>
>    ISO/IEC 9899:1999 6.2.5p9:
>    [...] A computation involving unsigned operands can never overflow,
>    because a result that cannot be represented by the resulting unsigned
>    integer type is reduced modulo the number that is one greater than the
>    largest value that can be represented by the resulting type.

I'm having a hard time parsing this - to my best knowledge reducing
"modulo the number that is one greater than the largest value that can be
represented by the resulting type" in effect boils down to "overflowing",
but I assume they intend to say that this means 6.5p5 doesn't apply.

> You also do conversions between possibly unrepresentable values.
> This is completely defined in one direction
>
>    ISO/IEC 9899:1999 6.3.1.3p2:
>    Otherwise, if the new type is unsigned, the value is converted by
>    repeatedly adding or subtracting one more than the maximum value
>    that can be represented in the new type until the value is in the
>    range of the new type.

In case of a conversion from an signed int to an unsigned int on a 
2s-compliment machine, this is a very elaborate way to describe just
copying over data and having an underflow. In case of the INT_MIN case, it
means adding UINT_MAX+1 to it, which doesn't do a very big deal, except 
giving a 'legal' meaning to reinterpreting the signed int as an unsigned int.
In case of casting a signed -1, this means ending up with UINT_MAX, which
incidentally is the exact same bitwise representation. The same applies to any
conversion from a signed int to int - it's an elaborate way of doing nothing.

In the case of a 16-bits machine in the days of yore (in attempt to reduce
on digits, as they don't really matter anyway), this means that 
casting INT_MIN (-32678, bitwise representation 0x8000 to an uint will
yield 32768, 0x8000 - in full compliance with the standard, as it involved
adding 65536 (0xffff) to the signed input.

> and it is implementation-defined in the other direction
>
>    ISO/IEC 9899:1999 6.3.1.3p3:
>    Otherwise, the new type is signed and the value cannot be represented
>    in it; either the result is implementation-defined or an implementation-
>    defined signal is raised.

"implementation-defined" is another way of saying the result is compiler-
specific, thus undefined from a standards point of view - with the hint
that it should be deterministically. IOW: A second invication will also
erase your harddisk.

And it is exactly *this* which makes abs(INT_MIN) yield 'unpredictable' or
'unexpected' behaviour, no matter what implementation you come up with.

> i.e. you need to check the documentation for your compiler, but I think
> that all compilers on the market do what you would naively expect.

Analogous: Doing nothing, but alternatively giving a compiler another
excuse to do strange things ? :)

> And this is not abstract language layering pedantry -- it does affect
> real code as compilers get better at doing value range analysis, whole
> program optimizations, etc., and there have recently been looooong
> discussions on the gcc lists where people have been surprised that gcc
> has eliminated their "wrap-around" tests of the type "if (a+100 < a)"
> (which would not have happened if it was done using unsigned arithmetics).
> See e.g. the gcc bug report
>
>    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475

[ if (a + 100 < a) being optimized away ]

This is a different thing; in this case gcc is right because of 6.3.1.3p3;
the compiler is allowed to implicitly cast "a" to a larger signed integer
type, thus being able to optimize it away as a constant 'false' for all
values of a. A different compiler might react differently, and in fact,
as far as the standard is concerned, this code might have different
results on different platforms.

Which brings us to the point at hand, once again on our beloved 16
bits machine[1]:

Our current code reads:

signed int abs()
{
    signed int a = INT_MIN; // (-37268, 0x8000);
    return -a; // undefined, because of 6.5p5, although returning INT_MIN is
               // a reasonable assumption..
}

The suggested improvement yields:

{
    signed int a = INT_MIN; // (-37268, 0x8000);
    unsigned int b = (unsigned) a; // (37268, 0x8000) Valid, due to 6.3.1.3p2
    unsigned int c = -a; // 32768, 0x8000, due to 6.2.5p9
    return (signed int) c; // 6.3.1.3p3 says that the result will be 
			   // implementation-specific, since 32768 will 
			   // not fit in a signed int.
}

So, we're trading "undefined behaviour" with "implementation-defined 
behaviour, possibly undefined behaviour".

Not sure if this is a real improvement, and it *certainly* doesn't
make things work all of a sudden. There is one thing the implementation-
defined behaviour cannot do: Suddenly make -INT_MIN fito a signed int.

In all likelyness, the result will be the same: After

signed int foo = abs(INT_MIN);

foo will probably end up containing an unchanged INT_MIN - which is
what the manpage is hinting at :)

Martijn

[1] Hey, I happen to like PDP/11s
-- 
Martijn van Buul - pino@dohd.org