tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change



>>>         setlocale(LC_ALL, "en_US.UTF-8");
>>>         printf("isspace:%d\n", isspace((unsigned char)0xA0));
>>>         printf("iswspace:%d\n", iswspace((wchar_t)0xA0));

> Here,
>   - 0xa0 is representable as an unsigned char, and
>   - 0xa0 is not a space character.

Um, in en_US.UTF-8, I believe it is; U+00A0 is NO-BREAK SPACE according
to the Unicode documents I have.  Of course, you could argue that's not
a space character, but I'm having trouble imagining what values of
"space character" that could be true of.

I know the lone octet 0xa0 is not a valid UTF-8 sequence.  But, as I
understand it, isspace() is supposed to be passed a character code, not
an octet which may or may not be part of a character code's encoding.
That most character codes in a UTF-8 locale cannot be passed to
isspace() because they're outside of unsigned char range - at least on
NetBSD - doesn't invalidate this as far as I can see.
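
To make the code-versus-octet distinction concrete, here is a minimal
sketch.  The helpers utf8_encode() and utf8_is_continuation() are
hypothetical names made up for illustration, not part of any libc.  It
shows only that the character code 0xa0 (U+00A0) and the octet 0xa0 are
different things: the former encodes to two octets in UTF-8, and the
latter on its own is merely a continuation byte.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): encode the Unicode code point
 * cp as UTF-8 into out[].  Returns the number of octets written, or 0 if
 * cp is a surrogate or lies above U+10FFFF.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not encodable */
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}

/*
 * Hypothetical helper (illustration only): true if octet has the form
 * 10xxxxxx, i.e. it can only appear as a non-initial octet of a
 * multi-octet UTF-8 sequence.
 */
static int
utf8_is_continuation(unsigned char octet)
{
	return (octet & 0xC0) == 0x80;
}
```

Calling utf8_encode(0xA0, buf) yields the two octets 0xc2 0xa0, while
utf8_is_continuation(0xA0) is true.  Whether isspace(0xa0) itself
returns nonzero in a given locale is up to the implementation; the
sketch does not try to settle that.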

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

