tech-userlevel archive


Re: using the interfaces in ctype.h



At 10:18 PM 4/20/2008 -0400, der Mouse wrote:
> There are two cases one has to consider for use of isspace() etc.

> 1) if x is an int (wider than a char), and is the result of getc(),
> then x will be in the range [-1, UCHAR_MAX].

No; x will be in the range [0..UCHAR_MAX] or be EOF, which latter
happens to be -1 in our implementation but may be different.  (Unless
you were speaking from a strictly NetBSD perspective, rather than a
correct-use-of-ctype perspective.  Your mention of 1's-complement
machines makes me think not.)

Thank you for pointing that out. I apologize. I had overlooked that EOF is a negative integer, but is not required to be -1.

However, I think that it's still true to say that if x is EOF, isprint(x & UCHAR_MAX) will not (generally) be the same as isprint(x), even though isprint(x & UCHAR_MAX) is always valid. This was my point. (My point was not that (x & UCHAR_MAX) has any particular value.)

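I am (per C99) assuming that UCHAR_MAX is one less than a power of two, so that x & UCHAR_MAX is valid and, for non-negative x, equivalent to (x % (UCHAR_MAX+1)). (For negative x the two differ, since the C99 % operator truncates toward zero and can yield a negative result.)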

So both cases still apply, I think.
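
A minimal sketch of the masking point, assuming only that EOF is negative (as C99 requires) and that UCHAR_MAX is one less than a power of two. The names mask_to_uchar and masking_hides_eof are hypothetical, made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: force any int into the range [0, UCHAR_MAX].
 * Assumes UCHAR_MAX is one less than a power of two. */
int mask_to_uchar(int x)
{
    return x & UCHAR_MAX;
}

/* Returns 1 if masking turned EOF into an ordinary character value,
 * so that the caller can no longer tell EOF from that character. */
int masking_hides_eof(void)
{
    int m = mask_to_uchar(EOF);

    /* The masked value is always a valid argument for the is*() family. */
    assert(m >= 0 && m <= UCHAR_MAX);

    /* EOF is negative and m is not, so the two can never compare equal. */
    return m != EOF;
}
```

So the masked argument is always safe to pass, but the "this was EOF" information is gone by the time the classification function sees it.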

> The phrase
>          .. if (isspace((unsigned char) buf[0])) ...
> won't work if isspace() is in-line and there's not enough casting in
> the macro.

I can't see how it could fail.  Could you give an example?

It will fail by generating the warning, which prevents compilation with -Werror on some machines. See Greg's other messages -- that's what started this discussion. (Apparently some compilers complain about indexing an array with an unsigned char -- probably on machines where char is identical to unsigned char, but I'm guessing.)
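
For reference, the idiom under discussion written out as a small function. This is only a sketch; starts_with_space is a hypothetical name, not anything from the NetBSD tree:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: report whether the first byte of buf is whitespace.
 * The cast to unsigned char keeps the argument within [0, UCHAR_MAX],
 * even where plain char is signed and buf[0] holds a negative value. */
int starts_with_space(const char *buf)
{
    assert(buf != NULL);
    return isspace((unsigned char)buf[0]) != 0;
}
```

Whether this compiles without a warning depends on how the header defines isspace(), which is the question at hand.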

> I'm running 3.1, so I may have the wrong header files; but this would
> imply that (for example) isspace() should change from
>    ((int)((_ctype_ + 1)[(c)] & _S))
> to
>    ((int)((_ctype_ + 1)[(int)(c)] & _S))

I think this would be a very bad idea.  The existing code draws
warnings from some compiler versions about "array subscript has type
char", which let a coder catch such sloppy code; while this doesn't
apply to 3.1's compiler in my experience, doing it for 3.1 leads to the
idea of doing it for later versions, for which it *does* matter.
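
To make the table-lookup shape concrete, here is a stand-alone sketch in the same style as the macro quoted above. It is not the NetBSD header: my_ctype_, MY_S, MY_ISSPACE and my_isspace are hypothetical names, the table covers only the "C" locale whitespace characters, and it assumes EOF is -1 (true on NetBSD, not required by C99):

```c
#include <limits.h>

#define MY_S 0x08   /* hypothetical "is whitespace" flag bit */

/* Slot 0 is reserved for EOF (assumed to be -1 here); slots 1 through
 * UCHAR_MAX + 1 hold the flags for character values 0 through UCHAR_MAX. */
const unsigned char my_ctype_[1 + UCHAR_MAX + 1] = {
    [1 + ' ']  = MY_S,
    [1 + '\t'] = MY_S,
    [1 + '\n'] = MY_S,
    [1 + '\v'] = MY_S,
    [1 + '\f'] = MY_S,
    [1 + '\r'] = MY_S,
};

/* The subscript is the argument itself, with no cast. If c has type
 * char, some gcc versions warn "array subscript has type char";
 * writing [(int)(c)] instead would silence that warning. */
#define MY_ISSPACE(c) ((int)((my_ctype_ + 1)[(c)] & MY_S))

/* Function form: the parameter is already an int, so no warning arises.
 * Valid only for c in [-1, UCHAR_MAX]. */
int my_isspace(int c)
{
    return MY_ISSPACE(c) != 0;
}
```

A plain char holding a negative value would index before slot 0, which is the sloppy use the warning catches.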

It happens for 3.1 and gcc for x86. My point was that I don't have the 4.0 or more recent header files to hand. My point was also that this makes the <ctype.h> is...() macros inconsistent, at least formally, with the C99 definitions (which require int, and for which a (char) argument will silently be widened).

I think we can agree (by looking at C99) that the standard definition of isspace() is 'int isspace(int)'. NetBSD's definition of macros is convenient, but is not mandated by the standard (in fact, the standard does not give special discussion to any of the <ctype.h> functions if implemented as macros).

I can't find a place where C99 requires that any implementation of a function-like macro for a library function be "warning-equivalent" to calling the library function. In other words, C99 does not require that isspace(x) be "warning-equivalent" to (isspace)(x). But I happen to think that it's in the spirit of the specification for isspace(x), even though I agree that doing so may be inconvenient. However, it's more portable, because (isspace)(x) is not likely to give a warning -- and if it does, the warning will be much more like what Coverity might give, e.g. "x is not in {EOF, 0..UCHAR_MAX}", rather than the rather inscrutable gcc message.
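
As a sketch of the isspace(x) versus (isspace)(x) distinction: parenthesizing the name suppresses any function-like macro, so the call goes to the library function. The two wrapper names below are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical wrapper: the parentheses around the name suppress any
 * function-like macro called isspace, so this calls the real library
 * function, declared as 'int isspace(int)'. */
int is_space_via_function(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return (isspace)(c) != 0;
}

/* Hypothetical wrapper: uses whatever <ctype.h> supplies for
 * isspace(c), which may be a macro or the function itself. */
int is_space_via_macro(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return isspace(c) != 0;
}
```

Both must return the same answer for any valid argument; the only difference a conforming implementation can show is in the diagnostics the compiler emits.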

As far as I can tell, ultimately it comes down to an implementation choice, as C99 does not give clear guidance.

--Terry
