tech-userlevel archive


Re: using the interfaces in ctype.h

> On 21-Apr-08, at 1:57 PM, Joerg Sonnenberger wrote:

 >> On Mon, Apr 21, 2008 at 01:52:41PM -0400, Greg A. Woods; Planix,
 >> Inc. wrote:
 >>> Actually, no, it doesn't, at least on NetBSD.   Try it!  :-)
 >> Sure. See attached code.

> You must have some different release of NetBSD than any I have.  I get
> "0 0" from your program on stock 4.0, a netbsd-4 branch ("4.0_STABLE")
> system, and of course on 1.6.2 too.  Same on Mac OS X 10.5.2 as well.
I didn't follow the entire discussion, but I have one note.
The code below is totally wrong:

  char a = ...


  char a = ...

People who speak Slavic languages such as Russian, Belarusian and
Ukrainian know this VERY WELL. It happens because the two most widely
used charsets (KOI8-R/KOI8-U and CP1251) both assign a letter to the
code 255.
KOI8-R: upper-case HARD_SIGN; CP1251: lower-case YA.

The code above is wrong because, for example,
toupper((char) lower_case_ya_letter) returns lower_case_ya_letter, not
upper_case_ya_letter, and
isalpha((char) lower_case_ya_letter) returns 0 (false), etc.
All this is because, where char is signed, (char) 0xFF is passed to
these functions as the int value -1, and EOF == -1,
tolower(EOF) == toupper(EOF) == EOF, and is*(EOF) == 0.
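
The usual way around this is to convert the char to unsigned char
before it reaches any to* or is* function. A minimal sketch follows;
the helper names ctype_arg, safe_isalpha and safe_toupper are made up
for this illustration and are not part of any library.

```c
#include <ctype.h>

/* Map a plain char to a value the <ctype.h> functions accept:
 * something representable as unsigned char, never a negative int. */
static int ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical wrappers that take a plain char safely. */
static int safe_isalpha(char c)
{
    return isalpha(ctype_arg(c));
}

static char safe_toupper(char c)
{
    return (char)toupper(ctype_arg(c));
}
```

With this, the byte 0xFF is passed as 255 and can no longer be
confused with EOF, whether char is signed or unsigned. What
safe_isalpha() then returns for it still depends on the locale.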

There are LOTS of programs with this type of issue.

Another problem is that this bug is NOT seen on Linux,
because their to* and is* functions
work "correctly" with negative values in the range [-128..-2].
As a result, those who live with iso-8859-* locales
do not see this problem. In these charsets the code 0xFF is either
undefined or assigned to a rarely used character.

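To make developers notice this problem, the to* and is* functions
should work like this:

int toupper (int c)
{
  /* reject anything that is neither EOF nor an unsigned char value */
  assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
  /* normal conversion follows */
}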
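
The same check can be tried today without touching libc, by calling a
wrapper instead of toupper() directly. This is only a sketch: the names
ctype_arg_ok and checked_toupper are hypothetical, and the assert()
disappears when the program is built with NDEBUG.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Nonzero if c is a valid argument for the <ctype.h> functions:
 * either EOF or a value representable as unsigned char. */
static int ctype_arg_ok(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}

/* Hypothetical debugging wrapper: abort on a bad argument instead of
 * silently returning a wrong answer. */
static int checked_toupper(int c)
{
    assert(ctype_arg_ok(c));
    return toupper(c);
}
```

On a system where char is signed, a call such as
checked_toupper((char) 0xFE) passes -2 and trips the assertion. The
value (char) 0xFF becomes -1, which equals EOF on common systems, so
it still gets through this check unnoticed.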

Best regards, Aleksey Cheusov.
