Re: Proposal: _ctype_ table bitwidth change

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: Proposal: _ctype_ table bitwidth change
From: David Huang <khym%azeotrope.org@localhost>
Date: Wed, 23 Mar 2011 12:52:29 -0500

On Mar 23, 2011, at 12:32 PM, der Mouse wrote:

> If you believe that is*() must be passed the encoding, rather than the
> codepoint, then yes, it doesn't make sense to ask what isspace()
> returns for NO-BREAK SPACE in en_US.UTF-8, because you can't pass
> NO-BREAK SPACE to it (and what it returns for 0xa0 doesn't matter,
> because that's not a valid encoding).
> 
> I maintain that is*() must be passed the codepoint - the "character".

FWIW, from my admittedly rudimentary knowledge of i18n issues, is*() should be
passed the character as encoded for the current locale. Obviously, this is
less than ideal for multi-byte encodings such as UTF-8, which is why there's
isw*(). However, it works fine for single-byte encodings, and considering how
things should work in one of those, say ISO-8859-6, makes it clearer that
is*() should be passed the encoding rather than the codepoint: if you've got
a string encoded in 8859-6 and want to walk through it calling isspace(),
you'd just pass each octet to isspace(). You wouldn't first run iconv() or
the like on it to convert each octet to its Unicode codepoint.
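
To make the octet-by-octet walk concrete, here is a minimal sketch. The
function name count_space_octets is made up for illustration, and it assumes
the caller has already selected a single-byte locale with setlocale(); which
octets count as whitespace then depends entirely on that locale's tables:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the octets in s that isspace() classifies as
 * whitespace in the current locale. Each octet is passed as-is, with no
 * conversion to a Unicode codepoint first. */
size_t count_space_octets(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* The cast keeps the argument within the values isspace() accepts
         * (those representable as unsigned char, or EOF), even on
         * platforms where plain char is signed. */
        if (isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```

Note that nothing here knows or cares what the octets mean in Unicode; the
only per-octet work is the cast to unsigned char.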

Also, see the definition of a character at
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_87
which reads: "A sequence of one or more bytes representing a single graphic
symbol or control code. Note: This term corresponds to the ISO C standard term
multi-byte character, where a single-byte character is a special case of a
multi-byte character."
-- 
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: khym%azeotrope.org@localhost |  they raise a paw / the bat, the cat /
FurryMUCK: Dahan         |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 35 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
