tech-userlevel archive

Re: Proposal: _ctype_ table bitwidth change


> Yes, but that doesn't mean they can't use the same format. The problem
> in this case is that 0xa0 is not a valid UTF8 sequence by itself.

First, don't forget wchar_t implementations that define __STDC_ISO_10646__.

As you know, NetBSD/Citrus is CSI (CodeSet Independent), so wchar_t !=
UCS4 when the encoding is not UTF-8.
But we still have the flexibility to provide a normalized wchar_t == UCS4
implementation, as glibc2 does
(e.g. ja_JP.eucJP@ucs4, ar_AR.ISO-8859-6@ucs4; Tru64 UNIX did this in the past).

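For example, consider the Arabic single-byte locale "ar_AR.ISO-8859-6":

  [single byte] <---> [wide character]
  0xac                   U+060C

The byte value and the wide-character value of the same character differ,
so they index different table slots. Clearly we can't share one
_ctype_/rl_runetype table.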
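
To make the index mismatch concrete, here is a minimal sketch in C. The
helper name iso8859_6_to_ucs4 is hypothetical (it is not part of libc or of
my patch), and it assumes the standard ISO-8859-6 layout, where the assigned
Arabic bytes sit at a fixed offset of 0x0560 below their UCS4 code points.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map one ISO-8859-6 byte to its UCS4 code point,
 * or return -1 if the byte is unassigned in ISO-8859-6.
 */
int32_t
iso8859_6_to_ucs4(unsigned char b)
{
	/* 0x00-0xa0, 0xa4 and 0xad keep the same value in UCS4. */
	if (b <= 0xa0 || b == 0xa4 || b == 0xad)
		return b;
	/* Assigned Arabic bytes: UCS4 value is byte + 0x0560. */
	if (b == 0xac || b == 0xbb || b == 0xbf ||
	    (b >= 0xc1 && b <= 0xda) || (b >= 0xe0 && b <= 0xf2))
		return b + 0x0560;
	return -1;
}

/* The same character lands in slot 0xac of one table and 0x060c of the other. */
void
check_index_mismatch(void)
{
	assert(iso8859_6_to_ucs4(0xac) == 0x060c);
	assert(iso8859_6_to_ucs4(0xac) != 0xac);
}
```

A table indexed by the byte (as _ctype_ is) would look up slot 0xac, while a
table indexed by a UCS4 wchar_t would look up slot 0x060c for the same
character.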

Second, for some restricted environments (embedded devices, old computers, etc.)
we offer the makefile knob WITH_RUNE=NO, which disables multibyte locale support
to reduce libc size.
If you expose the RuneLocale structure, we have to kill the knob and
always use the huge rune locale db file.
# I think that is not good news for some third parties.

> Drop the current _CTYPE_* macros for anything but
> legacy purposes.

My patch keeps the _CTYPE_* macros, but they are not the same as in the legacy ctype.h.

See the following diff:
Index: sys/sys/ctype_bits.h
RCS file: /cvsroot/src/sys/sys/ctype_bits.h,v
retrieving revision 1.2
diff -u -r1.2 ctype_bits.h
--- sys/sys/ctype_bits.h        14 Dec 2010 02:28:57 -0000      1.2
+++ sys/sys/ctype_bits.h        8 Jan 2011 14:01:27 -0000
@@ -40,16 +40,22 @@
 #ifndef _SYS_CTYPE_BITS_H_
 #define _SYS_CTYPE_BITS_H_

-#define        _CTYPE_U        0x01
-#define        _CTYPE_L        0x02
-#define        _CTYPE_N        0x04
-#define        _CTYPE_S        0x08
-#define        _CTYPE_P        0x10
-#define        _CTYPE_C        0x20
-#define        _CTYPE_X        0x40
-#define        _CTYPE_B        0x80
+#define _CTYPE_A       0x0001  /* Alpha     */
+#define _CTYPE_C       0x0002  /* Control   */
+#define _CTYPE_D       0x0004  /* Digit     */
+#define _CTYPE_G       0x0008  /* Graph     */
+#define _CTYPE_L       0x0010  /* Lower     */
+#define _CTYPE_P       0x0020  /* Punct     */
+#define _CTYPE_S       0x0040  /* Space     */
+#define _CTYPE_U       0x0080  /* Upper     */
+#define _CTYPE_X       0x0100  /* X digit   */
+#define _CTYPE_B       0x0200  /* Blank     */
+#define _CTYPE_R       0x0400  /* Print     */
+#define _CTYPE_I       0x0800  /* Ideogram  */
+#define _CTYPE_T       0x1000  /* Special   */
+#define _CTYPE_Q       0x2000  /* Phonogram */

-extern const unsigned char     *_ctype_;
+extern const unsigned short    *_ctype_tab_;
 extern const short     *_tolower_tab_;
 extern const short     *_toupper_tab_;
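
As a rough illustration of how a 16-bit table like this could be consulted,
here is a sketch in C. The bit values are copied from the diff above;
everything with an EX_/ex_ prefix is a hypothetical stand-in (the real
macros and _ctype_tab_ live in the system headers), and the "+ 1" offset
that leaves a slot for EOF is an assumption borrowed from the way the
legacy _ctype_ table is indexed.

```c
#include <assert.h>
#include <limits.h>

/* Bit values from the diff; EX_ prefix avoids clashing with system headers. */
#define EX_CTYPE_A	0x0001	/* Alpha     */
#define EX_CTYPE_D	0x0004	/* Digit     */
#define EX_CTYPE_L	0x0010	/* Lower     */
#define EX_CTYPE_U	0x0080	/* Upper     */
#define EX_CTYPE_Q	0x2000	/* Phonogram */

/* The new flags no longer fit in unsigned char, hence unsigned short. */
_Static_assert(EX_CTYPE_Q > UCHAR_MAX, "flags need more than 8 bits");

/* Hypothetical table: slot 0 is EOF (-1), slots 1..256 are bytes 0..255. */
static unsigned short ex_tab[1 + 256];
static const unsigned short *ex_ctype_tab = ex_tab;

/* Fill in a few entries only; a real table sets every applicable bit. */
void
ex_init(void)
{
	ex_tab[1 + 'A'] = EX_CTYPE_A | EX_CTYPE_U;
	ex_tab[1 + 'a'] = EX_CTYPE_A | EX_CTYPE_L;
	ex_tab[1 + '0'] = EX_CTYPE_D;
}

/* With a dedicated Alpha bit, isalpha is a single-bit test. */
int
ex_isalpha(int c)
{
	return (ex_ctype_tab + 1)[c] & EX_CTYPE_A;
}

int
ex_isalnum(int c)
{
	return (ex_ctype_tab + 1)[c] & (EX_CTYPE_A | EX_CTYPE_D);
}
```

After ex_init(), ex_isalpha('A') is nonzero and ex_isalpha('0') is zero;
passing -1 (EOF) reads slot 0 and yields zero.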

The relation between _CTYPE_* and _RUNETYPE_* is:

    (_CTYPE_A << 8) == _RUNETYPE_A
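
A small sketch of what this relation allows, in C. The EX_ names are
hypothetical stand-ins for the real macros, the value 0x00000100U for
_RUNETYPE_A is an assumption that follows from the relation above, and the
conversion helpers are illustrative only, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_CTYPE_A	0x0001		/* from the diff: Alpha */
#define EX_CTYPE_Q	0x2000		/* from the diff: Phonogram */
#define EX_CTYPE_MASK	0x3fff		/* all 14 proposed flags, A through Q */

#define EX_RUNETYPE_A	0x00000100U	/* assumed value of _RUNETYPE_A */

/* Widen a 16-bit ctype entry into rune-type bit positions. */
uint32_t
ex_ctype_to_runetype(unsigned short bits)
{
	return (uint32_t)bits << 8;
}

/* Narrow rune-type bits back; anything outside the 14 flags is dropped. */
unsigned short
ex_runetype_to_ctype(uint32_t rt)
{
	return (unsigned short)((rt >> 8) & EX_CTYPE_MASK);
}

void
ex_check_relation(void)
{
	assert(ex_ctype_to_runetype(EX_CTYPE_A) == EX_RUNETYPE_A);
	assert(ex_runetype_to_ctype(EX_RUNETYPE_A) == EX_CTYPE_A);
}
```

Because the two layouts differ only by a fixed shift, a narrow table could
be derived from rune-type data without a per-flag translation step.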

very truly yours.
Takehiko NOZAKI
