Subject: Re: GNU config.guess and netbsd{aout,elf,}
To: Jonathan Stone <jonathan@DSG.Stanford.EDU>
From: Todd Whitesel <toddpw@best.com>
List: tech-toolchain
Date: 11/24/1999 02:30:05
> The reason I'm asking is to educate the IRTF about the portability
> issues of code they've been using for, oh, 20 years now.  That code
> has, empirically, been more portable and more reliable than the
> alternatives. Until now.

Ho ho ho. The real reason it's worked for 20 years is that the code is so
ubiquitous that any compiler people who dared to break it were promptly yelled
at by customers, and the decision was reversed. You'll recognize this:

	*((unsigned short *)ptr)++

ANSI says this is illegal: postincrement requires a modifiable lvalue as its
operand (here, the cast expression), but the result of a cast is not an lvalue
by definition -- and nearly everything else about a cast like this one is
implementation defined.

At a previous job, I had to process a bug report against my compiler where
the above code was rejected; someone else had added checks to enforce the
ANSI restrictions. Once we realized it was Berkeley networking code that
produced this case, we knew that we'd lose if we kept the error -- every
customer calling in would insist that it was our bug. So we changed it back.

> Yes, the alternatives are obvious, but the above snippet was
> consciously crafted to avoid knowing about host endian-ness,
> when summing in the last byte of an odd-length Internet packet.

It was also consciously crafted to assume byte-oriented linear-address
machines, which is not an assumption you are allowed if you want to be
portable according to ANSI. While they were being pretty pedantic, they
were within their charter to specify something as rigorous as that.

There are lots of portable ways to accomplish the same result, but they
generally require unions or byte-at-a-time copies, and 20 years ago a
compiler wouldn't have turned either of those into a single move operation.
These days, compilers are much better.

Every time I've tried to implement Internet code, I've given up on the
idea of elegantly portable code. Just trying to deal with strictly aligned
RISC chips and ethernet devices that put a 14-byte ethernet header on a
word boundary (forcing the entire IP header to be non-word-aligned) is
enough to make me wonder how anyone can write elegantly portable Internet
code without quite a few abstraction macros. (And before you ask: no, I
haven't looked at netinet recently; I gave up on this a long time ago.)

> Myself, I think the most robust answer would be to fix egcs's
> implementation-defined behaviour to be compatible with existing art
> but that may require a significant change to egcs maintainers' outlook:-)

While I have tried arguing the point once (with the g++ maintainer; no dice),
I personally agree in principle with what egcs is doing. In fact, I think it
should have been done AGES ago, so that all the code out there that has been
technically broken since the standard appeared would finally get fixed!

For years the C language was not what K&R first edition said; it was whatever
PCC did in order to build the unix sources. I've maintained code that tried
explicitly to emulate PCC where it differed from K&R, and I know of at least
one difference.

Once the Berkeley networking code "escaped", it became the de facto definition
of many standards, not just C. It tainted IP standards as well (or was that
Sun's fault -- remember the all-zeros broadcast address fiasco?).

Sometimes I think that ANSI, ISO, IEEE, and most standards bodies are a waste
of time. We should just insist on a publicly available source code base to
use as a reference. Anyone who interoperates with that code base conforms to
the standard, and anyone who doesn't is nonconformant.

After all, that's how it really works in practice, but we lie to ourselves
that the result of a committee process is somehow closer to the truth than
the reference implementation that everyone is actually using, when one exists.

I applaud the IETF procedure of requiring two independently developed
implementations of each standard, but we must recognize that even this
is not 100% foolproof.

Todd Whitesel
toddpw @ best.com