Subject: Re: `char -> unsigned char'
To: Todd Vierling <tv@pobox.com>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: current-users
Date: 12/19/1998 13:43:51
> -                       if (*cp == '\\' && isspace(cp[1]))
> +                       if (*cp == '\\' && isspace((unsigned char)cp[1]))

> Why is this?  char/unsigned char stuff was supposed to be taken care of by a
> not very recent change to <ctype.h>.
> If something is going wrong here, the is*() macros need to change, not a
> whole bunch of code.  Think `third party code'....

I just looked at my draft copy of the ANSI C standard (which I think
is up to date on this point).  It says the argument to the ctype
functions must either have the value of the macro EOF or be an integer
"representable as an unsigned char".  Technically, a signed
character with a negative value passed to a ctype function *is*
incorrect, and entitles the implementation to misbehave arbitrarily
badly.  And more to the point, where char is signed, '\377' is -1,
which is generally also the value of EOF, so the ctype functions
cannot tell the two apart.  That is almost certainly not the intended
behavior.
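
To make the '\377'-versus-EOF collision concrete, here is a minimal
sketch.  The helper names (collides_with_eof, as_ctype_arg,
check_eof_collision) are made up for illustration and are not part of
any library, and the collision check assumes EOF is -1, which is the
usual value but not one the standard promises.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any library): returns nonzero if a
 * signed char, once promoted to int, is indistinguishable from EOF.
 */
static int
collides_with_eof(signed char ch)
{
	return (int)ch == EOF;
}

/*
 * Hypothetical helper: the conversion the patch applies before calling
 * a ctype function.  The result is always in 0..UCHAR_MAX.
 */
static int
as_ctype_arg(char ch)
{
	return (unsigned char)ch;
}

/* Self-check; the EOF test assumes EOF == -1, which is not guaranteed. */
static void
check_eof_collision(void)
{
	signed char high = (signed char)-1;	/* '\377' when char is signed */

	if (EOF == -1)
		assert(collides_with_eof(high));
	assert(as_ctype_arg((char)high) >= 0);
	assert(as_ctype_arg((char)high) <= UCHAR_MAX);
	assert(as_ctype_arg((char)high) != EOF);
	assert(isspace(as_ctype_arg(' ')));
}
```

Once the value has gone through (unsigned char), it can no longer be
negative, so it can no longer be mistaken for EOF.  That is all the
cast in the patch does.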

If the ctype macros are crashing with negative signed characters(*),
that probably ought to be fixed, but the change above really is
correct.  Third party code (or NetBSD code ;-) which passes negative
signed characters to the ctype functions _simply is incorrect_, and
there's no way to make such code behave "correctly" (at least if you
believe you want to distinguish '\377' and EOF).



(*) I doubt the code manages to page fault, but isspace('\200') does
test a location well before the start of the _C_ctype_ array.  I'd
suggest that the ctype arrays be stretched by 128 entries and the
_ctype_ pointer be pointed at _C_ctype_[128].  (The 128 new entries,
_C_ctype_[0..127], would all be 0.)  It would, however, also be legal
and correct behavior to have the ctype macros do something like

	int isspace(int c)
	{
		if (c != EOF && (c < 0 || c > 255)) {
			/*
			 * issue a FORMAT UNIT command to all scsi drives
			 * while playing the Blue Danube Waltz on the
			 * speaker.
			 */
			....
			return 69;
		}
		return ((int)((_ctype_ + 1)[(int)(c)] & _S));
	}

(Gotta love that "undefined behavior" clause!)
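
For the table-stretching suggestion in the footnote, a rough sketch of
the layout might look like the following.  Everything here (toy_table,
toy_ctype, TOY_S, TOY_PAD, toy_init, toy_isspace) is a made-up stand-in
for illustration, not the actual NetBSD table or macros, and it assumes
8-bit chars.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_S	0x08	/* hypothetical "is whitespace" flag bit */
#define TOY_PAD	128	/* extra zero entries in front for negative chars */

/* 128 padding entries, 1 slot for EOF, 256 slots for unsigned char values. */
static unsigned char toy_table[TOY_PAD + 1 + 256];

/* Points past the padding, so (toy_ctype + 1)[c] is valid for -128..255. */
static const unsigned char *toy_ctype = toy_table + TOY_PAD;

/* Mark the C-locale whitespace characters; every other entry stays 0. */
static void
toy_init(void)
{
	static const char spaces[] = " \t\n\v\f\r";
	size_t i;

	for (i = 0; spaces[i] != '\0'; i++)
		toy_table[TOY_PAD + 1 + (unsigned char)spaces[i]] = TOY_S;
}

/* Same indexing shape as the macro above, but safe for negative chars. */
static int
toy_isspace(int c)
{
	assert(c >= -TOY_PAD && c <= 255);
	return (toy_ctype + 1)[c] & TOY_S;
}
```

With this layout a negative char reads one of the zero padding entries
instead of whatever sits in front of the table, so nothing is
misclassified by accident.  It still cannot tell '\377' from EOF,
which is why the cast in the caller remains the real fix.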