Subject: Re: Bug in regex library?
To: None <current-users@NetBSD.ORG>
From: Peter Seebach <seebs@solon.com>
List: current-users
Date: 10/29/1996 21:48:30
>Nobody has convinced me that it is a violation of the standard to have
>a type longer than a long, nor even for off_t to be such a type;
>indeed, I don't think I've seen even unsupported claims to that effect.

I'll make a supported one; the integral types are defined to be the
character types, short int, int, long int, and unsigned (short, int,
long int).  The list is exhaustive.  (6.1.2.5.)

I don't have the standard in front of me (!!!) so I can't be sure off_t
is an integral type, but I'm *pretty sure* it is... I know fpos_t isn't
required to be.

>And that's really all it takes to break that code, for regoff_t to be a
>type that doesn't promote to "the same thing" as long int, under the
>default promotion rules.

For instance, if regoff_t were a short, or an int, or a char... or even
(arguably) if it were unsigned long.  (These all *happen* to work on
32-bit NetBSD ports, because everything smaller than int promotes to int,
and int and long are the same size...  This is the dependency old
code used to have, which forced the language to become broken.)
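
The usual defence, sketched below, is to cast to a type you know before
handing the value to a varargs function.  This is only an illustration:
example_off_t and format_offset are made-up names standing in for
regoff_t and whatever code prints it, and the typedef is deliberately
something other than long.

```c
#include <stdio.h>

/* Hypothetical stand-in for regoff_t; the real typedef lives in
 * <regex.h> and its width differs between systems. */
typedef short example_off_t;

/* Formats off into buf as decimal text.  The explicit cast makes the
 * variadic argument a long whatever example_off_t really is, so it
 * always matches the "%ld" conversion.  Returns what snprintf returns. */
int format_offset(char *buf, size_t len, example_off_t off)
{
    return snprintf(buf, len, "%ld", (long)off);
}
```

Without the cast, the argument would arrive as whatever the default
promotions turn example_off_t into, which only matches "%ld" by accident
of the platform.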

>True.  Also, strictly, if you want to store numbers outside the
>0..65535 range (unsigned) or -32767..32767 [$] range (signed), you have
>to use long anyway.  Or else litter your code with conditionals on
>INT_MIN and INT_MAX (for example), which _nobody_ does, at least in my
>experience.  Therefore, long really should be an efficient type.

You don't have to *litter* your code with them; one or two in a header
should be enough.  (Or rather, are probably too much.)
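
Something like the following is what I mean by "one or two in a header".
It is a sketch only, and count32 is a name I made up for the example;
the rest of the code would use it and never look at INT_MAX again.

```c
#include <limits.h>

/* Hypothetical typedef: use int when it can hold at least 32 bits of
 * range, otherwise fall back to long, which is guaranteed to. */
#if INT_MAX >= 2147483647
typedef int count32;
#else
typedef long count32;
#endif
```

A declaration such as "count32 n = 100000L;" is then safe on both 16-bit
and 32-bit int implementations.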

>[$] I think that's the right range; it's certainly within plus-or-minus
>    two of right.

It is right.  (As a curiosity, one version of Microsoft C defined INT_MIN
to be -32767, not (-32767-1), because (-32768) generated a warning.  This
is arguably incorrect, since -32768 was a legit int value, but you can't
prove it in a strictly conforming program...  I believe more recent versions
are correct.)
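
For anyone wondering why the odd-looking spelling exists: C has no
negative integer constants, so -32768 is unary minus applied to 32768,
and with a 16-bit int the constant 32768 does not fit in int and gets
type long.  A sketch of the conventional definition, with made-up macro
names modelled on a 16-bit <limits.h>:

```c
/* Hypothetical names; a real implementation defines INT_MAX and INT_MIN. */
#define EXAMPLE_INT_MAX 32767
/* Every operand here fits in int, so the whole expression has type int. */
#define EXAMPLE_INT_MIN (-EXAMPLE_INT_MAX - 1)

/* Returns 1 when the macro expands to something the size of an int,
 * which holds on any implementation because no operand exceeds 32767. */
int example_min_has_int_size(void)
{
    return sizeof(EXAMPLE_INT_MIN) == sizeof(int);
}
```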

In practice, most of the code which would break if long were 64 bits, and
int and pointers 32, is broken because the programmer was lazy, or because
the programmer didn't realize where the boundaries of language definition
were at the time.  Henry Spencer's "Ten Commandments for C Programmers"
cover this, and they're from a ways back...
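
As one example of the non-lazy version, here is a sketch of narrowing a
long to an int without assuming the two are the same width.  Both
function names are invented for the illustration.

```c
#include <limits.h>

/* Returns 1 if v can be stored in an int without loss, 0 otherwise. */
int long_fits_in_int(long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Hypothetical helper: stores v in *out only when it fits.
 * Returns 1 on success, 0 (leaving *out untouched) on range error. */
int narrow_long(long v, int *out)
{
    if (!long_fits_in_int(v))
        return 0;
    *out = (int)v;
    return 1;
}
```

Where int and long are the same size the check always succeeds and costs
next to nothing; where long is wider, it catches the values that a bare
assignment would silently mangle.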

-s