Current-Users archive


Re: Casting ctype lookups



On Wed, Nov 14, 2012 at 11:03:25AM -0500, Greg Troxel wrote:
 > > I believe that there's nothing wrong with NetBSD's definitions of
 > > these functions and macros.  If the caller gets a warning about them,
 > > then there's a problem in the caller's code.
 > 
 > That may be true technically, but there's a larger social problem that
 > there's a lot of code out there that seems to work on other systems, and
 > doesn't produce warnings, but does produce warnings on NetBSD, and
 > people generally don't understand all the subtleties.   So I think it
 > would be really helpful if there were a public language-lawyerly defense
 > of why the warning is legitimate (i.e., indicates accurately that the
 > code being warned about is probably wrong), and perhaps ctype(3) is the
 > place.

From ctype(3):

CAVEATS
     The first argument of these functions is of type int, but only a very
     restricted subset of values are actually valid.  The argument must either
     be the value of the macro EOF (which has a negative value), or must be a
     non-negative value within the range representable as unsigned char.
     Passing invalid values leads to undefined behavior.

     Values of type int that were returned by getc(3), fgetc(3), and similar
     functions or macros are already in the correct range, and may be safely
     passed to these ctype functions without any casts.

     Values of type char or signed char must first be cast to unsigned char,
     to ensure that the values are within the correct range.  The result
     should then be cast to int to avoid warnings from some compilers.  Cast-
     ing a negative-valued char or signed char directly to int will produce a
     negative-valued int, which will be outside the range of allowed values
     (unless it happens to be equal to EOF, but even that would not give the
     desired result).
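
As a minimal sketch of what that caveat asks of callers (the helper
names upcase_string and count_spaces are made up for illustration and
are not part of any library), each char is cast to unsigned char and
then to int before it reaches a ctype function:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case a NUL-terminated string in place. */
static void upcase_string(char *s)
{
    for (; *s != '\0'; s++) {
        /*
         * Cast to unsigned char first so the value lands in the range
         * 0..UCHAR_MAX, then to int to keep picky compilers quiet.
         */
        *s = (char)toupper((int)(unsigned char)*s);
    }
}

/* Hypothetical helper: count the whitespace characters in a string. */
static size_t count_spaces(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (isspace((int)(unsigned char)*s))
            n++;
    }
    return n;
}
```

Values that come straight from getc(3) or fgetc(3) are already ints in
the valid range and need no cast at all.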

 > I wonder if there's a way to have a macro definition that checks the
 > type of the argument at compile time, and if it's unsigned char or int,
 > produce the current code, and if it's signed char, add a test for being
 > non-negative with an abort.

This is a start:

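/*
 * Wider than char: index directly.  Unsigned char: always in range.
 * Signed char >= -1: in range (or EOF).  Otherwise: assertion failure.
 */
#define toupper(ch) ((int)                                              \
    (sizeof(typeof(ch)) > 1 ? (_toupper_tab_ + 1)[(ch)] :               \
     ((typeof(ch))-1) > 0 ? (_toupper_tab_ + 1)[(unsigned)(ch)] :       \
     (ch) + 1 >= 0 ? (_toupper_tab_ + 1)[(ch)] :                        \
     (__assert13(__FILE__, __LINE__, "toupper", #ch " >= -1"), '~')) )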

but it only works for compilers that provide typeof, and it may behave
undesirably on platforms where sizeof(int) is 1.
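
To make the branch selection in that macro easier to follow, here is a
rough sketch in standard C that takes the type as an explicit argument
instead of using typeof. The names IS_UNSIGNED_TYPE, CLASSIFY_ARG, and
the enum are invented for this illustration; it only reports which
branch would be taken and does not touch any lookup table.

```c
/*
 * Nonzero if T is an unsigned type: converting -1 to an unsigned type
 * yields that type's largest value, which is greater than zero.
 */
#define IS_UNSIGNED_TYPE(T) (((T)-1) > 0)

/* Hypothetical labels for the four branches of the macro. */
enum ctype_arg_kind {
    ARG_WIDE,       /* sizeof > 1 (int, short, ...): used as-is */
    ARG_UNSIGNED,   /* unsigned char: always a valid index */
    ARG_SIGNED_OK,  /* signed char holding -1 or a non-negative value */
    ARG_SIGNED_BAD  /* signed char below -1: the macro would assert */
};

/* Same tests, in the same order, as the toupper macro sketch. */
#define CLASSIFY_ARG(T, v)                              \
    (sizeof(T) > 1 ? ARG_WIDE :                         \
     IS_UNSIGNED_TYPE(T) ? ARG_UNSIGNED :               \
     (v) + 1 >= 0 ? ARG_SIGNED_OK : ARG_SIGNED_BAD)
```

The (v) + 1 >= 0 test accepts -1 on the assumption that EOF is -1,
which matches the " >= -1" text in the assertion message.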

The macro still won't protect against people who don't know what
they're doing and change toupper(ch) to toupper((int)ch), and I'm not
sure there's any practical way to catch that.
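
For what it's worth, here is a small sketch of why the (int) cast only
hides the problem. The function names are invented for illustration:
widening a negative signed char straight to int keeps it negative,
while going through unsigned char first brings it into the range the
ctype functions accept.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical: what toupper((int)ch) passes for a signed char. */
static int widen_directly(signed char c)
{
    return (int)c;                  /* negative stays negative */
}

/* Hypothetical: what toupper((int)(unsigned char)ch) passes. */
static int widen_via_unsigned(signed char c)
{
    return (int)(unsigned char)c;   /* always 0..UCHAR_MAX */
}

/* Nonzero if v is a value the ctype functions are defined for. */
static int in_ctype_range(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

With these, in_ctype_range(widen_directly(-23)) is false and
in_ctype_range(widen_via_unsigned(-23)) is true, even though both
calls compile without a warning.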

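If anyone wants to try improving on it, here's a test harness. Define
T on the compiler command line (for example -DT='signed char') to
choose the argument type under test; it defaults to char: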

   ------
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <ctype.h>

#undef toupper

#define toupper(ch) ...

#ifndef T
#define T char
#endif

static void test(int ch_in) {
   T ch_test;
   int ch_out;

   printf("trying %d (%c):\n", ch_in, ch_in);

   ch_test = ch_in;
   ch_out = toupper(ch_test);

   printf("got %d (%c).\n", ch_out, ch_out);
}

int main(int argc, char *argv[]) {
   int i;

   for (i=1; i<argc; i++) {
      test(atoi(argv[i]));
   }
   return 0;
}
   ------

-- 
David A. Holland
dholland%netbsd.org@localhost

