tech-userlevel archive
Re: Proposal: _ctype_ table bitwidth change
>>> setlocale(LC_ALL, "en_US.UTF-8");
>>> printf("isspace:%d\n", isspace((unsigned char)0xA0));
>>> printf("iswspace:%d\n", iswspace((wchar_t)0xA0));
> Here,
> - 0xa0 is representable as an unsigned char, and
> - 0xa0 is not a space character.
Um, in en_US.UTF-8, I believe it is; U+00A0 is NO-BREAK SPACE according
to the Unicode documents I have. Of course, you could argue that's not
a space character, but I'm having trouble imagining what values of
"space character" that could be true of.
I know a lone 0xA0 octet is not a valid UTF-8 sequence. isspace() is supposed to be
passed a character code, not an octet which may or may not be part of a
character code's encoding, as I understand it. That most UTF-8
character codes cannot be passed to isspace() because they're outside
of unsigned char range - at least on NetBSD - doesn't invalidate this
as far as I can see.
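To make the character-code-versus-octet distinction concrete, here is a
minimal sketch of decoding first and classifying second. The helper name
first_char_is_space() is made up for illustration and is not part of any
libc; it relies only on the standard mbrtowc() and iswspace(), and on
whatever LC_CTYPE locale the caller has already set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of any libc): decode the first multibyte
 * character of buf[0..len) under the current LC_CTYPE locale, then
 * classify the resulting character code with iswspace().
 *
 * Returns 1 if it is a space character, 0 if it is not, and -1 if the
 * bytes do not begin with a complete, valid character in this locale.
 */
static int
first_char_is_space(const char *buf, size_t len)
{
	mbstate_t st;
	wchar_t wc;
	size_t n;

	assert(buf != NULL);
	memset(&st, 0, sizeof(st));	/* start in the initial shift state */

	n = mbrtowc(&wc, buf, len, &st);
	if (n == (size_t)-1 || n == (size_t)-2)
		return -1;		/* invalid or truncated sequence */

	return iswspace((wint_t)wc) != 0;
}
```

After setlocale(LC_ALL, "en_US.UTF-8"), I would expect
first_char_is_space("\xA0", 1) to return -1, since that octet alone
encodes nothing, while first_char_is_space("\xC2\xA0", 2) decodes U+00A0
and reports whatever the locale's space class says about it. That answer
may differ between implementations.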
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B