tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change



>> I maintain that is*() must be passed the codepoint - the "character".
> FWIW, from my admittedly rudimentary knowledge of i18n issues, is*()
> should be passed the character encoded appropriately for the current
> locale.  [...] and I think if you consider how things should work in
> a single-byte encoding--say iso-8859-6, it'll be clearer that is*()
> should be passed the encoding, rather than the codepoint: If you've
> got a string encoded in 8859-6, and want to walk through it calling
> isspace(), you'd just pass each octet to isspace().  You wouldn't
> first run iconv() or the like on it to convert each octet to its
> Unicode code point.

Why would you convert it to a Unicode anything?  You're not using
Unicode!

You'd "convert" it to its codepoint in the character set in use:
8859-6.  Like most (all?) uses of 8-bit charsets, that conversion is
the identity mapping, so it's not clear to me that you're _not_ doing
it when you just pass the octets to is*().
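A minimal sketch of the octet-at-a-time loop the quoted text describes. It assumes the program has already selected an 8859-6 locale with setlocale(LC_CTYPE, ...). Locale names vary by system, and without such a call the "C" locale applies. The helper name count_spaces_8bit is hypothetical. In an 8-bit charset the octet value is the codepoint, so the loop performs no separate conversion step:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count whitespace octets in a string encoded in a
 * single-byte charset such as ISO 8859-6.  Each octet is passed straight
 * to isspace(); for an 8-bit charset the octet value and the codepoint in
 * that charset are the same number (the identity mapping). */
size_t count_spaces_8bit(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast through unsigned char: where plain char is signed, a
         * high-bit octet would otherwise reach is*() as a negative value,
         * which is undefined behaviour unless it equals EOF. */
        if (isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```

How high-bit octets such as 0xA0 are classified depends on the active LC_CTYPE, not on the loop itself.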

> Also, see the definition of a character at [...] : "A sequence of one
> or more bytes representing a single graphic symbol or control code.
> Note: This term corresponds to the ISO C standard term multi-byte
> character, where a single-byte character is a special case of a
> multi-byte character."

Ah!  That is the standards reference I was looking for upthread.

I think it's stupid, but it's far from the first thing POSIX has done that
I think is stupid.  Just as well I haven't tried to write Unicode-using
code on NetBSD; if NetBSD is trying to conform to that particular
mistake, then it is not an appropriate platform for such work.

I also maintain that that definition is inconsistent with the
definition of is*(), which take a byte, not "a sequence of one or more
bytes...", but which - at least on NetBSD - is documented as taking a
character.
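For contrast with the byte-at-a-time is*() interface, here is a minimal sketch of classifying characters that, under the POSIX definition quoted above, may span several bytes. It uses the standard C wide-character route: decode each character with mbrtowc() in the current locale's encoding, then classify it with iswspace(). The helper name count_spaces_mb is hypothetical. It assumes the caller has set LC_CTYPE appropriately and treats an invalid or truncated sequence as an error:

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count whitespace characters in a multibyte string
 * encoded in the current locale's encoding.  Returns (size_t)-1 if the
 * string contains an invalid or incomplete multibyte sequence. */
size_t count_spaces_mb(const char *s)
{
    mbstate_t st;
    size_t n = 0;
    size_t len = strlen(s);

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (len > 0) {
        wchar_t wc;
        /* k = number of bytes making up this one character */
        size_t k = mbrtowc(&wc, s, len, &st);

        if (k == (size_t)-1 || k == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (k == 0)
            break;                      /* decoded a NUL; stop defensively */
        if (iswspace((wint_t)wc))
            n++;
        s += k;                         /* advance past the whole character */
        len -= k;
    }
    return n;
}
```

Here the classification function receives a decoded wide character rather than a single byte, which is the distinction this reply draws between "a byte" and "a sequence of one or more bytes".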

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

