tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change

On Tue, Mar 22, 2011 at 03:30:27AM -0400, der Mouse wrote:
> > 0xa0 as Unicode Code Point is not representable as unsigned char with
> > UTF-8 encoding.
> That doesn't even make sense.

Yes, it does.

> UTF-8 takes Unicode codepoints and produces not octets but sequences of
> octets.  The Unicode code point 0xa0 is representable, as a codepoint,
> as unsigned char; in this it is no different from any other integer in
> the range 0..255.  It is representable as an octet sequence via
> encodings such as UTF-8.  These two concepts should not be confused.

No, it isn't. There is no valid UTF-8 encoding of 0xA0 as a single
octet. Period. I haven't claimed anything else.
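
To make the single-octet point concrete, here is a minimal sketch of the
UTF-8 bit layout from RFC 3629. The helper names `utf8_encode` and
`is_continuation` are hypothetical and were chosen for illustration only.
The sketch shows that the one-octet form covers only U+0000..U+007F, that
U+00A0 encodes as the two octets 0xC2 0xA0, and that a lone octet 0xA0 has
the continuation-byte pattern 10xxxxxx, so it cannot start a character.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (sketch following RFC 3629)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:
        # 1 octet, 0xxxxxxx: only U+0000..U+007F fit here
        return bytes([cp])
    if cp < 0x800:
        # 2 octets, 110xxxxx 10xxxxxx: U+00A0 lands in this range
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 octets, 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 octets, 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def is_continuation(octet: int) -> bool:
    """True if the octet matches 10xxxxxx, i.e. it can never begin a sequence."""
    return octet & 0xC0 == 0x80


print(utf8_encode(0xA0).hex())   # → c2a0
print(is_continuation(0xA0))     # → True
```

As a cross-check, Python's own decoder rejects the lone octet:
`bytes([0xA0]).decode("utf-8")` raises `UnicodeDecodeError` ("invalid start
byte"), while `"\u00a0".encode("utf-8")` yields `b'\xc2\xa0'`.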

