Re: Proposal: _ctype_ table bitwidth change
> > The most important point is that is* functions accept an octet, not a
> > code point.
> They do? Where is this defined?
> Historically, it has been false: is*() has been documented to accept
> "characters", which I can't read as anything but codepoints.
> That some charsets have some codepoints that can't fit in unsigned char
> (at least when, as on NetBSD, unsigned char is just one octet) just
> means that is*() aren't useful for more than just 256 of their possible
> codepoints, not that they somehow get retconned to take just one octet
> of a storage encoding of a codepoint.
> At least, that's how I read it. Is there a spec somewhere which spells
> this out precisely?
As far as I know, no spec states this explicitly.
However, ISO C doesn't define a concept like "codepoint" in the first place.
It defines only two representations: "(single-byte/multibyte) character" and
"wide character", so I wonder how the is* functions could be affected by a
concept that is not defined.
In addition, ISO C contains a passage implying that the is* functions accept
single-byte characters (subclause "Wide character classification functions"):
Each of the following functions (note: isw* functions) returns true
for each wide character that corresponds (as if by a call to the wctob
function) to a single-byte character for which the corresponding
character classification function (note: is* functions) from 7.4.1
returns true, except that the iswgraph and iswpunct functions may
differ with respect to wide characters other than L' ' that are both
printing and white-space wide characters.
(The 'note' annotations are mine.)
Note that this passage was added in the 1995 revision (C95).
ISO C seems to contain some ambiguity about "character,"
especially in the parts that have existed since 1989 (C89).