Re: Proposal: _ctype_ table bitwidth change

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: Proposal: _ctype_ table bitwidth change
From: der Mouse <mouse%Rodents-Montreal.ORG@localhost>
Date: Wed, 23 Mar 2011 13:32:40 -0400 (EDT)

> NO-BREAK SPACE, which is 0xC2 0xA0 in en_US.UTF-8, obviously falls
> into the some-codepoints-that-can't-fit-in-unsigned-char category.

It's certainly not obvious to me.  0xC2 0xA0 is not the codepoint but
the UTF-8 encoding of the codepoint.  The codepoint is the abstract
integer 160 (0xA0, U+00A0), which _does_ fit into unsigned char.
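
To make the codepoint/encoding split concrete, here is a minimal sketch
of a UTF-8 encoder.  The name utf8_encode and its signature are my own
invention for illustration, not anything in libc; it only shows how the
integer 160 turns into the two octets 0xC2 0xA0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libc function): serialize one codepoint
 * as UTF-8 into out[].  Returns the number of octets written, or 0 if
 * cp is a surrogate or lies beyond U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                    /* one octet: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two octets: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* three octets */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* four octets */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xA0, buf) writes 0xC2 0xA0 and returns 2.  The
argument, 160, is the codepoint; the two octets in buf are one
particular serialization of it.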

This is why I was drawing a distinction between codepoints and
encodings - serializations - of codepoints upthread.  In traditional
8-bit character sets, every codepoint has a one-octet encoding with the
trivial mapping between the codepoint and the value of that octet, so
the distinction is easy to lose track of.  But it's an important
distinction when dealing with charsets like Unicode and encodings which
(like UTF-8 and UTF-7) do not have that trivial a mapping between
codepoints and encodings.

If you believe that is*() must be passed the encoding, rather than the
codepoint, then yes, it doesn't make sense to ask what isspace()
returns for NO-BREAK SPACE in en_US.UTF-8, because you can't pass
NO-BREAK SPACE to it (and what it returns for 0xa0 doesn't matter,
because a lone 0xa0 octet is not a valid UTF-8 encoding of anything).
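
For illustration, a minimal sketch of the reverse direction.  Again,
utf8_decode is a made-up name and signature, not a libc interface.  It
shows both halves of the point: the octet pair 0xC2 0xA0 decodes to the
codepoint 160, while a lone 0xA0 octet is rejected as malformed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libc function): decode one UTF-8 sequence
 * from s[0..n-1] into *cp.  Returns the number of octets consumed, or
 * 0 if the input is empty, truncated, or malformed.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t v, min;

    assert(s != NULL && cp != NULL);

    if (n == 0)
        return 0;

    /* The lead octet determines the sequence length. */
    if (s[0] < 0x80) {
        len = 1; v = s[0];        min = 0;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; v = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; v = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; v = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;   /* lone continuation octet (0x80..0xBF) or 0xF8..0xFF */
    }

    if (n < len)
        return 0;   /* truncated sequence */

    /* Every following octet must look like 10xxxxxx. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;

    *cp = v;
    return len;
}
```

Under the pass-the-codepoint reading, a caller would decode first and
hand the resulting integer to the classification function; under the
pass-the-encoding reading there is no single value to hand over at all.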

I maintain that is*() must be passed the codepoint - the "character".

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
