tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change



At Wed, 23 Mar 2011 13:32:40 -0400 (EDT), der Mouse wrote:
> 
> > NO-BREAK SPACE, which is 0xC2 0xA0 in en_US.UTF-8, obviously falls
> > into the some-codepoints-that-can't-fit-in-unsigned-char category.
> 
> It's certainly not obvious to me.  That is not the codepoint but the
> encoding of the codepoint.  The codepoint is the abstract integer 160,
> which _does_ fit into unsigned char.

Hm, I carelessly quoted "codepoint" from your previous message,
sorry about that.  I should've said
"NO-BREAK SPACE, which is 0xC2 0xA0 in en_US.UTF-8, obviously falls into
the some-characters-that-can't-fit-in-unsigned-char category".
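The codepoint/encoding distinction Mouse draws can be made concrete by
decoding the two UTF-8 bytes by hand.  This is a minimal sketch that only
handles one- and two-byte sequences.  The function name utf8_decode2 is
hypothetical and is not part of any libc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence of at most two bytes
 * (U+0000..U+07FF).  Returns the number of bytes consumed, or 0 if the
 * input is malformed, truncated, or needs more than two bytes. */
static size_t utf8_decode2(const unsigned char *s, size_t n, unsigned long *cp)
{
    assert(s != NULL && cp != NULL);
    if (n >= 1 && s[0] < 0x80) {                 /* 0xxxxxxx: one byte */
        *cp = s[0];
        return 1;
    }
    if (n >= 2 && s[0] >= 0xC2 && s[0] <= 0xDF   /* 110xxxxx lead, no overlongs */
        && (s[1] & 0xC0) == 0x80) {              /* 10xxxxxx continuation */
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    return 0;
}
```

Fed the bytes 0xC2 0xA0, it consumes two bytes and yields 160.  The
codepoint fits in an unsigned char, but the encoded character occupies two
bytes, and handing either byte alone to an is*() function classifies a
fragment rather than the character.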

> I maintain that is*() must be passed the codepoint - the "character".

You may be forgetting that a single character encoding scheme can
encode multiple coded character sets.  It is not always possible to
convert characters into non-overlapping sets of "codepoints".
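EUC-JP is a familiar example of one encoding scheme multiplexing several
coded character sets: G0 (ASCII, in the variant assumed here), G1 (JIS X
0208), G2 (JIS X 0201 katakana, via SS2 0x8E), and G3 (JIS X 0212, via SS3
0x8F).  The sketch below decodes one character into a (set, code) pair.
The enum, struct, and function names are hypothetical, chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the coded character sets EUC-JP carries. */
enum charset { CS_ASCII, CS_JISX0201_KANA, CS_JISX0208, CS_JISX0212 };

struct coded_char {
    enum charset set;   /* which coded character set the code belongs to */
    unsigned int code;  /* code within that set, high bit of each byte stripped */
};

/* True for bytes in the GR range used by multi-byte EUC-JP characters. */
static int is_gr(unsigned char b) { return b >= 0xA1 && b <= 0xFE; }

/* Decode one EUC-JP character.  Returns bytes consumed, 0 on error. */
static size_t eucjp_decode(const unsigned char *s, size_t n,
                           struct coded_char *out)
{
    assert(s != NULL && out != NULL);
    if (n >= 1 && s[0] < 0x80) {                        /* G0: ASCII */
        out->set = CS_ASCII;
        out->code = s[0];
        return 1;
    }
    if (n >= 2 && s[0] == 0x8E && s[1] >= 0xA1 && s[1] <= 0xDF) {
        out->set = CS_JISX0201_KANA;                    /* SS2 -> G2 */
        out->code = s[1] & 0x7F;
        return 2;
    }
    if (n >= 3 && s[0] == 0x8F && is_gr(s[1]) && is_gr(s[2])) {
        out->set = CS_JISX0212;                         /* SS3 -> G3 */
        out->code = ((unsigned)(s[1] & 0x7F) << 8) | (s[2] & 0x7F);
        return 3;
    }
    if (n >= 2 && is_gr(s[0]) && is_gr(s[1])) {         /* G1: JIS X 0208 */
        out->set = CS_JISX0208;
        out->code = ((unsigned)(s[0] & 0x7F) << 8) | (s[1] & 0x7F);
        return 2;
    }
    return 0;
}
```

ASCII '1' (0x31) and half-width katakana A (0x8E 0xB1) both decode to code
0x31 and differ only in their set tag.  Drop the tag and the two
characters collide on the same "codepoint".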

Ken

