Re: Proposal: _ctype_ table bitwidth change
>> I maintain that is*() must be passed the codepoint - the "character".
> FWIW, from my admittedly rudimentary knowledge of i18n issues, is*()
> should be passed the character encoded appropriately for the current
> locale. [...] and I think if you consider how things should work in
> a single-byte encoding--say iso-8859-6, it'll be clearer that is*()
> should be passed the encoding, rather than the codepoint: If you've
> got a string encoded in 8859-6, and want to walk through it calling
> isspace(), you'd just pass each octet to isspace(). You wouldn't
> first run iconv() or the like on it to convert each octet to its
> Unicode code point.
Why would you convert it to a Unicode anything? You're not using
Unicode.
You'd "convert" it to its codepoint in the character set in use:
8859-6. Like most (all?) uses of 8-bit charsets, that conversion is
the identity mapping, so it's not clear to me that you're _not_ doing
it when you just pass the octets to is*().
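
To make the octet-walking case concrete, here is a minimal sketch of
the loop under discussion. The function name count_space_octets is
made up for illustration, and the sketch assumes the string is in a
single-byte encoding (such as 8859-6) matching the current locale, so
each octet is both the encoded form and the codepoint within that
charset.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the whitespace octets in a NUL-terminated
 * string held in a single-byte encoding that matches the current locale. */
size_t count_space_octets(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Plain char may be signed; is*() is only defined for values
         * representable as unsigned char (or EOF), hence the cast. */
        if (isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```

No conversion step appears anywhere: the octet itself is what gets
passed, which is why the two readings are hard to tell apart for 8-bit
charsets.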
> Also, see the definition of a character at [...] : "A sequence of one
> or more bytes representing a single graphic symbol or control code.
> Note: This term corresponds to the ISO C standard term multi-byte
> character, where a single-byte character is a special case of a
> multi-byte character."
Ah! That is the standards reference I was looking for upthread.
I think it's stupid, but it's far from the first thing POSIX has done I
think is stupid. Just as well I haven't tried to do Unicode-using code
on NetBSD; if it's trying to conform to that particular mistake then it
is not an appropriate platform for such work.
I also maintain that that definition is inconsistent with the
definition of is*(), which take a byte, not "a sequence of one or more
bytes...", but which - at least on NetBSD - is documented as taking a
character.
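
For reference, ISO C defines the is*() functions as taking an int
whose value must be representable as an unsigned char or equal EOF, so
a multi-byte sequence cannot be passed in a single call. A small
sketch of that domain check follows; is_valid_ctype_arg is a
hypothetical name, not an existing libc function.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical check: is c within the domain ISO C allows for is*()?
 * Anything outside this range is undefined behaviour when passed. */
int is_valid_ctype_arg(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```

On a platform with 8-bit bytes that means one octet per call and
nothing larger.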
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B