Subject: Re: lib/10010: toupper mangles non-uppercase characters
To: None <firstname.lastname@example.org>
From: Greg A. Woods <email@example.com>
Date: 04/29/2000 14:38:44
[ On Saturday, April 29, 2000 at 00:27:49 (-0400), Adam R. Prato wrote: ]
> Subject: lib/10010: toupper mangles non-uppercase characters
> toupper() returns changed values for short ints other than uppercase
> characters. For example: x8d -> x0d, x83 -> ^M, xa0-xbf -> various characters
Hmmm.... once upon a time, in BSD, the correct usage was only ever:
If I remember correctly the history of this bug stems from the rather
unspecific description of these "functions" in the original edition of
K&R. There's no mention of them in the V7 manuals. So far as I know
these things were always macros up until about the time AT&T System III was
released (I seem to remember changes in Xenix-III) when they were
renamed with an '_' prefix (_toupper()) and the proper functions,
without a restricted input domain, were introduced. Looking at my
SysVr2 manuals I see that they are quite explicit about returning the
character unchanged if there's no valid conversion, and that the _to*()
macros would not do this.
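To make the restricted input domain concrete, here is a minimal sketch. The names OLD_TOUPPER and guarded_toupper are made up for illustration, ASCII is assumed, and the macro simply mirrors the arithmetic style of the old conversions rather than quoting any particular historical header:

```c
#include <ctype.h>

/*
 * Old-style conversion: pure arithmetic with no range check.
 * Only meaningful when c is already a lower-case letter (ASCII assumed).
 */
#define OLD_TOUPPER(c) ((c) - 'a' + 'A')

/*
 * Guarded use: convert only when the argument is inside the macro's
 * domain, otherwise hand it back unchanged.  The argument must be EOF
 * or a value representable as unsigned char, as for any <ctype.h> call.
 */
static int guarded_toupper(int c)
{
    return islower(c) ? OLD_TOUPPER(c) : c;
}
```

Fed anything outside 'a'..'z' the bare macro just subtracts 0x20, so in ASCII OLD_TOUPPER('A') comes out as '!', whereas guarded_toupper('A') returns 'A'.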
Someone was apparently premature in stating the standards conformance
for toupper(3) [and tolower(3)] as far back as 4.3net2 -- and looking at
4.2's ctype.h the "bug" is obviously there.
However I don't understand why you're seeing problems in NetBSD, 1.4.2
especially. This was all fixed in NetBSD back in 1993 (from ctype.h,v):
date: 1993/08/06 22:05:29; author: jtc; state: Exp; lines: +1 -1
Rename tolower & toupper macros to _tolower and _toupper.
Standard C requires tolower to return a character that is !isupper unchanged
which was not being done with the macro. The function version does the
right thing, so the loss of the macro is no great deal.
I didn't eliminate the macros entirely, since X/Open's XPG3 requires _tolower
and _toupper with the same semantics. But, like isascii/toascii, they are
removed from the namespace if either _ANSI_SOURCE or _POSIX_SOURCE is defined.
A wee bit later the functions were changed back to (faster) macros using
new lookup tables:
date: 1993/08/06 23:19:51; author: jtc; state: Exp; lines: +1 -1
Declare translation tables for toupper and tolower. To be replaced by
pointers to the tables to the current locale.
Reintroduce toupper and tolower macros that use the translation tables.
If you look in /usr/include/ctype.h you should find these definitions:
#define tolower(c) ((int)((_tolower_tab_ + 1)[(int)(c)]))
#define toupper(c) ((int)((_toupper_tab_ + 1)[(int)(c)]))
and these original-style ones:
#define _tolower(c) ((c) - 'A' + 'a')
#define _toupper(c) ((c) - 'a' + 'A')
If your code is really using the first macro (i.e. the one that gets its
return value from the _toupper_tab_) then it should work fine. Check
that your code is using these macros by looking at the output of 'cc -E'
and searching for the place where you call toupper(). Are you doing
anything with a different locale?
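For what it's worth, here is a rough sketch of how a table-driven macro of that shape behaves. The table demo_toupper_tab, the macro DEMO_TOUPPER and the helper demo_init are hypothetical stand-ins, not the real libc symbols, and the sketch assumes ASCII, 8-bit chars and EOF == -1:

```c
#include <stdio.h>   /* EOF */

/*
 * Hypothetical stand-in for a translation table: slot 0 answers for
 * EOF (-1), slots 1..256 answer for character values 0..255.
 */
static short demo_toupper_tab[1 + 256];

/* Fill the table: map 'a'..'z' to 'A'..'Z', everything else to itself. */
static void demo_init(void)
{
    int c;

    demo_toupper_tab[0] = EOF;
    for (c = 0; c < 256; c++)
        demo_toupper_tab[c + 1] =
            (short)((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
}

/* Same shape as the table-based macros: the "+ 1" makes index -1 valid. */
#define DEMO_TOUPPER(c) ((int)((demo_toupper_tab + 1)[(int)(c)]))
```

With a table like this a value such as 0x8d comes back unchanged. The lookup is only defined for EOF and for values in the unsigned char range, though, so a plain char holding 0x8d on a signed-char machine needs a cast to unsigned char first; otherwise it turns into a negative index that falls outside the table.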
In the meantime I'd recommend following Harbison and Steele's advice
for code that has to be portable and always use a wrapper *function*:
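Something along these lines would do. This is only a minimal sketch: the name safe_toupper is made up for illustration and the code is not quoted from Harbison and Steele.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */

/*
 * Portable wrapper: a real function, so no macro can evaluate the
 * argument twice, and it never hands the library a value outside the
 * domain that even the oldest implementations handled.
 */
int safe_toupper(int c)
{
    if (c == EOF)
        return EOF;

    /* Fold negative values from a signed char into 0..UCHAR_MAX. */
    c = (unsigned char)c;

    /* Convert only genuine lower-case letters; pass the rest through. */
    return islower(c) ? toupper(c) : c;
}
```

Callers then use safe_toupper() everywhere instead of toupper(). One caveat: where plain char is signed, a char holding 0xff is indistinguishable from EOF once it has been widened to int, so it is better to cast to unsigned char at the call site as well.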
Greg A. Woods
+1 416 218-0098 VE3TCP <firstname.lastname@example.org> <robohack!woods>
Planix, Inc. <email@example.com>; Secrets of the Weird <firstname.lastname@example.org>