tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change



hi,

> Changing the table to 16 bits wide breaks binary compatibility.

i don't break binary compatibility; why do you think so?
i keep the old 8-bit _ctype_ table for libc12.
(it may be removed at the libc13 major bump, using the __LIBC12_SOURCE__ macro)


> To get extra bits you really need to add a second 257 byte table.
> This might mean replicating some bits in both tables and/or looking
> in both tables.

your idea is:

    extern unsigned char *_ctype_;
    extern unsigned char *_ctype_extra_bit_;

    #define isalpha(c) (_ctype_[c] & _ALPHA)
    ...
    #define isblank(c) (_ctype_extra_bit_[c] & _BLANK)

is that right?
but the current _ctype_ bitmask pattern (include/sys/ctype_bits.h) is
quite strange.
on this point joerg and i share the same opinion (i think): we would like to
replace it with a saner bitmask pattern (such as _RUNETYPE_*).

so with your approach we would need one more 8-bit table:

    #ifdef __LIBC12_SOURCE__
    extern unsigned char *_ctype_; /* backward compatibility */
    #endif

    extern unsigned char *_ctype_new_abi1_;
    extern unsigned char *_ctype_new_abi2_;

    #define isalpha(c) (_ctype_new_abi1_[c] & _ALPHA)
    ...
    #define isblank(c) (_ctype_new_abi2_[c] & _BLANK)

how confusing.

my idea is simple:

    #ifdef __LIBC12_SOURCE__
    extern unsigned char *_ctype_; /* backward compatibility */
    #endif

    extern unsigned short *_ctype_new_abi_;

    #define isalpha(c) (_ctype_new_abi_[c] & _ALPHA)
    ...
    #define isblank(c) (_ctype_new_abi_[c] & _BLANK)
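
to make the single-table idea concrete, here is a minimal standalone sketch. it is not the actual libc patch: the x_* names and the bit values are hypothetical, and the table is filled at run time only to keep the example short. it shows one 16-bit entry per character, with one extra slot for EOF, carrying a ninth bit that an 8-bit entry cannot hold.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* hypothetical bit names and values, for illustration only */
#define X_ALPHA 0x0001u
#define X_SPACE 0x0002u
#define X_BLANK 0x0100u  /* a ninth bit: needs more than 8 bits per entry */

/* slot 0 is for EOF, the remaining slots are for unsigned char values */
static unsigned short x_table[1 + UCHAR_MAX + 1];

/* set the given bits for every character in a string */
static void x_set(const char *chars, unsigned short bits)
{
    for (; *chars != '\0'; chars++)
        x_table[1 + (unsigned char)*chars] |= bits;
}

/* fill a tiny "C"-locale-like table; a real libc would ship static data */
static void x_init(void)
{
    x_set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", X_ALPHA);
    x_set(" \t\n\v\f\r", X_SPACE);
    x_set(" \t", X_BLANK);
}

/* c must be EOF or representable as unsigned char, as for the real is*() */
static unsigned x_lookup(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return c == EOF ? x_table[0] : x_table[1 + c];
}

/* every class test is a single lookup in the same table */
#define x_isalpha(c) (x_lookup(c) & X_ALPHA)
#define x_isspace(c) (x_lookup(c) & X_SPACE)
#define x_isblank(c) (x_lookup(c) & X_BLANK)
```

after calling x_init(), x_isblank('\t') is nonzero and x_isblank('\n') is zero, and no second table is involved.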


> As an aside, a lot of code use the isxxx() and then make the assumption
> that they have checked the 'standard' character set, not some random
> dataset than depends on the locale (or any other system/program state).


this assumption is completely wrong; the is*() functions are affected by the current locale.

    The isalpha() function shall test whether c is a character of class alpha
    in the program's current locale
    http://pubs.opengroup.org/onlinepubs/009695399/functions/isalpha.html

but posix defines the Portable Character Set (similar to ISO 646).
    http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html
every encoding supported by a locale must be a superset of the Portable
Character Set.
so you don't need to worry about this in many cases.
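
for code that really does want only the "standard" letters, a small sketch of the difference follows. the helper names pcs_isalpha() and locale_isalpha() are hypothetical, not existing libc functions. the first tests only the letters of the Portable Character Set and ignores the locale; the second asks isalpha() under a named locale (note that it changes the process-wide LC_CTYPE setting).

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <string.h>

/* hypothetical helper: true only for the 52 letters of the
 * Portable Character Set, whatever the current locale is */
static int pcs_isalpha(int c)
{
    static const char letters[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /* reject EOF, NUL, and out-of-range values before searching */
    return c > 0 && c <= UCHAR_MAX && strchr(letters, c) != NULL;
}

/* hypothetical helper: isalpha() under the named locale.
 * returns 1 or 0, or -1 if the locale cannot be selected.
 * side effect: LC_CTYPE stays set to locale_name. */
static int locale_isalpha(const char *locale_name, unsigned char ch)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return isalpha(ch) != 0;
}
```

in the "C" locale both helpers agree: 'a' is a letter and the byte 0xE9 is not. under a single-byte locale whose charset defines 0xE9 as a letter, locale_isalpha() may report 1 for it while pcs_isalpha() still reports 0.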


very truly yours.
-- 
Takehiko NOZAKI<takehiko.nozaki%gmail.com@localhost>

