tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change




> On Tue, Mar 22, 2011 at 08:41:04AM +0900, T.SHIOZAKI wrote:
> > Here,
> >   - 0xa0 is representable as an unsigned char, and
> >   - 0xa0 is not a space character.
> 
> 0xa0 as Unicode Code Point is not representable as unsigned char with
> UTF-8 encoding. This is different from a wchar_t, which is an internal
> encoding (in this case, most likely using UCS-2 or UCS-4).
> 
> > Thus, to conform to the standard, the behavior of isspace(0xa0) should
> > be defined and it should return 0, even if 0xa0 is not a valid character.
> 
> It is not "even if". You are reversing cause and effect. 0xa0 is not a
> valid space character, since it is not a valid character by itself.
> That's why all functions but isascii should fail.

Agreed.
The most important point is that the is* functions accept an octet
(a value representable as unsigned char, or EOF), not a code point.
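
To illustrate the octet/code point distinction, here is a minimal sketch,
assuming the default "C" locale at program startup.  The helper names
utf8_encode_small() and count_space_octets() are hypothetical, made up for
this example; only isspace() comes from ctype.h.  U+00A0 becomes two octets
in UTF-8, and isspace() sees each octet on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: encode a code point below U+0800 as UTF-8.
 * Returns the number of octets written to out (1 or 2). */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    assert(cp < 0x800);              /* larger values need 3 or 4 octets */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;  /* ASCII maps to a single octet */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* lead octet */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* continuation octet */
    return 2;
}

/* Hypothetical helper: count the octets that isspace() accepts.
 * Each argument is an unsigned char value, as ctype.h requires. */
size_t count_space_octets(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (isspace(buf[i]))
            n++;
    }
    return n;
}
```

Encoding 0xA0 this way yields the octets 0xC2 0xA0, and in the "C" locale
neither of them is classified as a space.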


> My argument is that we still need and want a separate table, but it can
> and should have the same format as the full rune table. E.g. effectively
> variant 1.

It is a little difficult to decide whether the full rune table should be
exposed directly in ctype.h.

NetBSD is used not only for full desktop environments but also for
embedded environments.  In the latter, a full locale implementation
may just get in the way.  To implement the is* functions defined in
ctype.h, a 32-bit table is more than necessary and a 16-bit table is
adequate.  Nozaki-san probably decided on that basis.
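
For a rough sense of the sizes involved, here is a sketch of a 16-bit
classification table.  Everything in it (the x_ names, the single X_SPACE
bit, the layout with one leading slot for EOF, EOF being -1) is an
assumption for illustration, not the actual NetBSD _ctype_ layout.

```c
#include <stddef.h>
#include <stdint.h>

/* One slot for EOF (assumed to be -1) plus one per octet value. */
#define X_NENTRIES (1 + 256)

/* Hypothetical class bit; a real table defines several more. */
#define X_SPACE 0x0001u

/* 16-bit table; only the whitespace of the "C" locale is filled in. */
const uint16_t x_ctype16[X_NENTRIES] = {
    [1 + ' ']  = X_SPACE, [1 + '\t'] = X_SPACE, [1 + '\n'] = X_SPACE,
    [1 + '\v'] = X_SPACE, [1 + '\f'] = X_SPACE, [1 + '\r'] = X_SPACE,
};

/* Lookup; c must be -1 or a value representable as unsigned char. */
int x_isspace16(int c)
{
    return (x_ctype16[c + 1] & X_SPACE) != 0;
}

/* Total table size in bytes for a given entry width. */
size_t x_table_bytes(size_t entry_size)
{
    return X_NENTRIES * entry_size;
}
```

With 8-bit bytes, x_table_bytes(sizeof(uint16_t)) is 514 and
x_table_bytes(sizeof(uint32_t)) is 1028, so widening the entries costs
about half a kilobyte per table.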

Of course, exposing the full rune table does not make it impossible
to disable the full locale support, and nowadays we need not worry much
about the difference in table size.


---
Takuya SHIOZAKI

