tech-userlevel archive


Re: using the interfaces in ctype.h



[Thanks for the heads up on the MX record]

At 02:20 PM 4/20/2008 -0400, Greg A. Woods; Planix, Inc. wrote:
On 18-Apr-08, at 1:19 PM, Terry Moore wrote:

Hmm.  Back in the bad old days before C-89, casts didn't portably
guarantee that the result of casting was any different than what you
started with.  So for example isspace((uchar) x) could be wrong if x
was > 0xFF.

I remember that the recommended practice (when using Whitesmiths C)
was instead:

      isspace(x & 255)

Indeed!  Some implementations were so lame they didn't include the
mask in the implementation of the macro!

Oh oh, oops, the NetBSD implementations don't seem to include the mask
either!  I didn't realize that!  So sad.  (Which may even mean they
violate the standard's implication that they be able to safely accept
the value of EOF.  FreeBSD, OpenBSD, and Darwin all seem to have much
better implementations, though they are all using proper (inline)
functions which makes it easier in some ways to do it right.)

Are you sure that the mask should be included in the macro expansion?

There are two cases one has to consider for use of isspace() etc.

1) If x is an int (wider than a char), and is the result of getc(), then x will be in the range [-1, UCHAR_MAX]. C-99 says that UCHAR_MAX is equal to (2 raised to the power CHAR_BIT) - 1. This means (-1&UCHAR_MAX) is (most likely) UCHAR_MAX -- on a one's complement machine it will be UCHAR_MAX-1.

In any case, masking is wrong, because isspace(-1) is not [generally] equal to isspace(UCHAR_MAX) [or isspace(UCHAR_MAX-1)].
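
A minimal sketch of case 1, for illustration only. The helper names masked() and count_space() are hypothetical and do not come from any header; the first shows what a masking macro would end up looking up, the second passes the result of getc() to isspace() unmasked.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: the value a masking macro would actually classify.
   On a two's complement machine masked(-1) lands on UCHAR_MAX, so EOF
   can no longer be told apart from a real character. */
int masked(int x)
{
    return x & UCHAR_MAX;
}

/* Hypothetical helper: count whitespace characters in a stream.
   The result of getc() is either EOF or a value in [0, UCHAR_MAX],
   which is exactly the domain isspace() accepts, so no mask or cast
   is applied here. */
long count_space(FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char, so EOF stays distinct */

    while ((c = getc(fp)) != EOF)
        if (isspace(c))
            n++;
    return n;
}
```

The loop keeps c as an int for the same reason masking is wrong: once the value is squeezed into the character range, the end-of-file indication is lost.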

2) On the other hand, x may be a char that has been (implicitly or explicitly) widened to int. For example, char buf[256]; ... if (isspace(buf[0])) ...

In this case, masking is required -- the coder knows that buf[] only contains characters (not the union of {EOF, characters}). Therefore the expression should be
        ... if (isspace(buf[0] & UCHAR_MAX))...
or
        ... if (isspace((int)(unsigned char) buf[0])) ...
or similar.
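
A short sketch of case 2, assuming the buffer holds only characters and never EOF. The helper name skip_space() is hypothetical; it uses the second form above, converting each char to unsigned char and then to int before the call.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first non-whitespace
   character in the NUL-terminated string s. */
size_t skip_space(const char *s)
{
    size_t i = 0;

    /* Where plain char is signed, a byte with the high bit set would
       widen to a negative int.  Going through unsigned char first
       keeps the argument in [0, UCHAR_MAX]. */
    while (s[i] != '\0' && isspace((int)(unsigned char)s[i]))
        i++;
    return i;
}
```

For example, skip_space("  \tabc") skips the two spaces and the tab and stops at the 'a'.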

The phrase
        ... if (isspace((unsigned char) buf[0])) ...
won't work if isspace() is in-line and there's not enough casting in the macro.

The phrase
        ... if ((isspace)((unsigned char) buf[0])) ...
will work, because then the compiler's forced to call the function; and this causes an implicit cast to (int) in order to conform to the prototype, before calling isspace(). [If FreeBSD is using inlines, they may be depending on this behavior.]
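
To illustrate the parenthesized form, here is a small sketch with two hypothetical wrappers (the names space_via_function() and space_via_name() are made up for this example). Putting the name in parentheses suppresses any function-like macro expansion, so the first wrapper always goes through the library function and its prototype.

```c
#include <ctype.h>

/* Hypothetical wrapper: (isspace) cannot expand as a function-like
   macro, so this is a call to the real function.  The prototype
   int isspace(int) converts the argument to int before the call. */
int space_via_function(char ch)
{
    return (isspace)((unsigned char)ch) != 0;
}

/* Hypothetical wrapper: uses whatever <ctype.h> provides under the
   plain name (macro, inline, or function), with the conversion to
   int written out explicitly. */
int space_via_name(char ch)
{
    return isspace((int)(unsigned char)ch) != 0;
}
```

Both wrappers normalize the result to 0 or 1, since the is...() interfaces only promise zero or nonzero.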

Since the compiler can't really know whether the usage is case 1 or case 2, I'm not sure whether it's possible to change things to make them much safer. The main thing may be to ensure that for each implementation of is...(x), the expansion should explicitly cast (x) to (int) each time (x) is used.

I'm running 3.1, so I may have the wrong header files; but this would imply that (for example) isspace() should change from

  ((int)((_ctype_ + 1)[(c)] & _S))

to
  ((int)((_ctype_ + 1)[(int)(c)] & _S))

Otherwise
        isspace(c)
will not be identical to
        (isspace)(c)
because of the missing implicit cast.
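
As a rough sketch of the proposed expansion, the following uses a made-up table and flag (my_ctype_, MY_S, MY_ISSPACE, my_ctype_init and my_isspace are all hypothetical names, not the NetBSD ones). It assumes EOF is -1, which is what the "+ 1" offset relies on, and it fills in only the whitespace of the "C" locale.

```c
#include <limits.h>

#define MY_S 0x08               /* hypothetical "space" flag bit */

/* Hypothetical table: slot 0 is reserved for EOF (assumed to be -1),
   slots 1 .. UCHAR_MAX + 1 hold the flags for 0 .. UCHAR_MAX. */
static unsigned char my_ctype_[1 + UCHAR_MAX + 1];

/* Macro form with the explicit (int) cast on the index. */
#define MY_ISSPACE(c) ((int)((my_ctype_ + 1)[(int)(c)] & MY_S))

/* Mark the six "C" locale whitespace characters in the table. */
void my_ctype_init(void)
{
    static const char spaces[] = " \t\n\v\f\r";
    int i;

    for (i = 0; spaces[i] != '\0'; i++)
        (my_ctype_ + 1)[(unsigned char)spaces[i]] |= MY_S;
}

/* Function form: the parameter type performs the conversion to int
   that the macro has to spell out. */
int my_isspace(int c)
{
    return MY_ISSPACE(c);
}
```

With the cast in place, MY_ISSPACE(c) and my_isspace(c) index the table with the same int value, and an argument of -1 reads the reserved slot rather than running off the front of the array.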

Best regards,
--Terry
