Re: using the interfaces in ctype.h

To: NetBSD Userlevel Technical discussion list <tech-userlevel%NetBSD.org@localhost>
Subject: Re: using the interfaces in ctype.h
From: Aleksey Cheusov <cheusov%tut.by@localhost>
Date: Wed, 23 Apr 2008 00:13:17 +0300

> On 21-Apr-08, at 1:57 PM, Joerg Sonnenberger wrote:

 >> On Mon, Apr 21, 2008 at 01:52:41PM -0400, Greg A. Woods; Planix,
 >> Inc. wrote:
 >>> Actually, no, it doesn't, at least on NetBSD.   Try it!  :-)
 >>
 >> Sure. See attached code.

> You must have some different release of NetBSD than any I have.  I get
> "0 0" from your program on stock 4.0, a netbsd-4 branch ("4.0_STABLE")
> system, and of course on 1.6.2 too.  Same on Mac OS X 10.5.2 as well.
I haven't followed the entire discussion, but I have one note.
Code like the following is simply wrong:

  char a = ...
  ...
  is*(a)

or

  char a = ...
  ...
  to{lower,upper}(a)

People who speak Slavic languages such as Russian, Belarusian and Ukrainian
know this VERY WELL. It happens because the two most widely used charsets
(KOI8-R/KOI8-U and CP-1251) both assign a letter to code 255 (0xFF):
KOI8-R has upper-case HARD SIGN there, CP-1251 has lower-case YA.

The code above is wrong because, where plain char is signed, the byte 0xFF
becomes -1 when it is passed as an int, and -1 is EOF. So, for example, in a
CP-1251 locale toupper((char) lower_case_ya_letter) returns
lower_case_ya_letter, not UPPER_CASE_YA_LETTER, and
isalpha((char) lower_case_ya_letter) returns 0 (false), etc.
All this is because EOF == -1, tolower(EOF) == EOF, toupper(EOF) == EOF,
and is*(EOF) == 0.
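
The collision can be shown in isolation. This is only a minimal sketch,
assuming EOF is -1 as stated above; the helper names are made up for
illustration and do not come from any library:

```c
#include <stdio.h>   /* EOF */

/* Hypothetical helpers, for illustration only. */

/* What is*() and to*() see when handed a plain char: nonzero if the
 * promoted value cannot be told apart from EOF. */
static int looks_like_eof(char c)
{
    return (int)c == EOF;
}

/* The same byte converted to unsigned char first: the result is never
 * negative, so it cannot compare equal to EOF. */
static int looks_like_eof_unsigned(char c)
{
    return (int)(unsigned char)c == EOF;
}
```

Where plain char is signed, looks_like_eof((char) 0xFF) is nonzero, while
looks_like_eof_unsigned((char) 0xFF) is 0 on any platform.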

There are LOTS of programs with this kind of bug.
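
The usual repair in such programs is to convert the char to unsigned char
at the call site, so that the argument is always in the range the
functions accept. A sketch, with hypothetical wrapper names that are not
part of any library:

```c
#include <ctype.h>

/* Hypothetical wrappers, for illustration only. */

/* Upper-case a plain char; the cast keeps the argument in
 * 0..UCHAR_MAX, so byte 0xFF is never mistaken for EOF. */
static int safe_toupper(char c)
{
    return toupper((unsigned char)c);
}

/* Returns 1 if c is alphabetic in the current locale, 0 otherwise. */
static int safe_isalpha(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```

With these, safe_toupper('a') is 'A' in any locale, and a 0xFF byte is
classified according to the current locale instead of being treated as EOF.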

Another problem is that this bug is NOT seen on Linux,
because the to* and is* functions there
work "correctly" with negative values in the range [-128..-2].
As a result, those who live with iso-8859-* locales
do not notice it: in these charsets 0xFF is either undefined or only a
rarely used character (ISO-8859-1, for example, has y-with-diaeresis there).

http://www.opengroup.org/onlinepubs/009695399/functions/tolower.html
http://www.opengroup.org/onlinepubs/009695399/functions/toupper.html
http://www.opengroup.org/onlinepubs/009695399/functions/isalpha.html

In order to alert developers to this problem, the to* and is* functions
should work like this:

int toupper (int c)
{
  /* the only valid arguments are EOF and the values of unsigned char */
  assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
  ...
}
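
The same check can be tried out without touching libc by putting it in a
wrapper. This is only a sketch; checked_toupper is a hypothetical name,
and the real conversion is simply delegated to the library's toupper():

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical wrapper: abort (unless NDEBUG is defined) when the
 * argument is outside the domain of the ctype functions, i.e. it is
 * neither EOF nor representable as unsigned char. */
static int checked_toupper(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return toupper(c);
}
```

A call such as checked_toupper(a) with a negative plain char other than
EOF then fails loudly during testing instead of silently misbehaving.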

-- 
Best regards, Aleksey Cheusov.
