tech-userlevel archive


Re: Proposal: _ctype_ table bitwidth change



On Tue, Mar 22, 2011 at 03:50:24AM +0900, Takehiko NOZAKI wrote:
> i wrote patch to increase _ctype_ table bit to 16bit, patch is here.
> ftp://ftp.netbsd.org/pub/NetBSD/misc/tnozaki/patch-insufficient_ctype_bits
> 
> any comment?

Never mind the utf-8 fubar...
Changing the table to 16 bits wide breaks binary compatibility.
To get extra bits you really need to add a second 257 byte table.
This might mean replicating some bits in both tables and/or looking
in both tables.
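
A minimal sketch of the second-table idea, assuming hypothetical names
throughout (tab1, tab2, X_DIGIT, X2_BLANK, x_isdigit, x_isblank); none of
these are the real libc symbols or flag values. The existing 8-bit table
keeps its layout and an extra 257-entry table carries the new class bits:

```c
#include <assert.h>

/* Hypothetical flag bits, not the real _ctype_ flags. */
#define X_DIGIT  0x01   /* class bit kept in the existing 8-bit table */
#define X2_BLANK 0x01   /* extra class bit kept in the second table */

/* 257 entries each: slot 0 is EOF (-1), slots 1..256 are bytes 0..255. */
static unsigned char tab1[257];   /* stands in for the old table, layout unchanged */
static unsigned char tab2[257];   /* new table holding the additional bits */

/* Fill both tables for a plain ASCII setup. */
static void init_tables(void)
{
    for (int c = '0'; c <= '9'; c++)
        tab1[c + 1] |= X_DIGIT;
    tab2[' ' + 1] |= X2_BLANK;
    tab2['\t' + 1] |= X2_BLANK;
}

/* Old-style lookup: consults only the first table. */
static int x_isdigit(int c)
{
    assert(c >= -1 && c <= 255);
    return tab1[c + 1] & X_DIGIT;
}

/* New class: consults only the second table. */
static int x_isblank(int c)
{
    assert(c >= -1 && c <= 255);
    return tab2[c + 1] & X2_BLANK;
}
```

Code compiled against the old table never touches tab2, so its view of
the data is unchanged; only new classification functions pay for the
second lookup.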

As an aside, a lot of code uses the isxxx() functions and then assumes
that it has checked the 'standard' character set, not some random
dataset that depends on the locale (or any other system/program state).
For instance programs will check isdigit() and then subtract '0'.
While such programs are technically flawed (with the current locale
based isxxx() functions) they are doing what the original _ctype_[]
was intended for.
There are quite a few places where programs do need to know they
are looking at the libc compile time static const _ctype_[] array.
At the moment this is almost impossible.
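
One way for a program to get the fixed behaviour it is assuming is to
avoid the table altogether. The following is only a sketch; ascii_isdigit
and parse_decimal are hypothetical helper names, not libc functions. It
relies on the C guarantee that '0'..'9' are contiguous, so subtracting
'0' is valid whenever the range check has passed:

```c
/* True only for the ten characters '0'..'9', whatever the locale is. */
static int ascii_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/*
 * Convert the leading decimal digits of s to a number.
 * Stops at the first non-digit; does no overflow checking.
 */
static unsigned long parse_decimal(const char *s)
{
    unsigned long v = 0;

    while (ascii_isdigit((unsigned char)*s))
        v = v * 10 + (unsigned long)(*s++ - '0');
    return v;
}
```

For example, parse_decimal("123abc") yields 123, and a byte such as 0xB2
is rejected regardless of how the current locale classifies it.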

As well as the use of isdigit() prior to number conversion,
there is also code that uses isalpha() when checking whether characters
are valid for variable names (e.g. shell scripts) - I'm not at all sure
how much sense this makes - since the shell script doesn't encode
its own character set - which will very likely be different from the
user's own locale.
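
For the variable name case, a check that does not depend on the locale
could look something like the sketch below. The function names are
hypothetical, and the letter range tests assume an ASCII-compatible
execution character set (they would be wrong on EBCDIC). The rule used is
the usual shell one: letters, digits and underscore, not starting with a
digit.

```c
/* ASCII letter or underscore: valid as the first character of a name. */
static int is_name_start(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* Valid in any later position: the above plus ASCII digits. */
static int is_name_char(int c)
{
    return is_name_start(c) || (c >= '0' && c <= '9');
}

/* Returns 1 if s is a non-empty, well-formed name, otherwise 0. */
static int is_valid_name(const char *s)
{
    if (!is_name_start((unsigned char)*s))
        return 0;
    for (s++; *s != '\0'; s++)
        if (!is_name_char((unsigned char)*s))
            return 0;
    return 1;
}
```

With this, a name containing a locale-specific alphabetic byte is
rejected the same way on every system, which is what a script author
would presumably expect.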

        David

-- 
David Laight: david%l8s.co.uk@localhost

