Subject: Re: isprint()
To: Noriyuki Soda <soda@sra.co.jp>
From: Perry E. Metzger <perry@piermont.com>
List: current-users
Date: 08/24/1996 18:15:31
Noriyuki Soda writes:
> > >> On the other hand, we might want locale support in the kernel, in case
> > >> we ever end up supporting EBCDIC.
> > 
> > >Is it me, or is this just plain silly? (1/2 :-)
> > 
> > It's mostly silly.
> 
> I think EBCDIC support is silly, too, and it is hardly worth the effort.

On the other hand, UNICODE may become real at some point -- the Plan 9
people are using it, and it has distinct advantages.

> IMHO, all supported character sets should be upward compatible with ASCII;
> this requirement dramatically reduces the cost of implementation.

UNICODE in its UTF encoding is sort of ASCII compatible, but the extra
characters it supports frequently come in upper and lower case
varieties.
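
To make the case issue concrete, here is a minimal sketch. It assumes
"UTF" means UTF-8, and ascii_only_upper() is a hypothetical helper
standing in for an ASCII-only toupper(), not anything from the tree.

```python
def ascii_only_upper(data: bytes) -> bytes:
    """Uppercase only the bytes a-z, as an ASCII-only toupper() would."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


# A French word with two accented lowercase letters (U+00E9) around a "t".
word = "\u00e9t\u00e9"
encoded = word.encode("utf-8")  # assumption: "UTF" means UTF-8 here

# Byte-wise ASCII rule: only the plain "t" is changed, because the
# accented letters are encoded as bytes outside the a-z range.
ascii_result = ascii_only_upper(encoded).decode("utf-8")

# Unicode-aware rule: the accented letters have uppercase forms too.
unicode_result = word.upper()
```

The ASCII-only rule leaves the accented letters in lower case, so case
tables are still needed for the extra characters.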

It is possible, however, that the right thing to do is to just support
UNICODE and quit worrying about internationalization because at that
point "locale" becomes meaningless from a character set point of view
-- all reasonable characters are then supported.

> IMHO, a 16-bit character set is not sufficient, and a 32-bit character set is
> needed for many compatibility reasons.
> - The UNICODE standard (UTF8) requires it.

Unicode is 16 bits, not 32 -- UTF is an encoding of UNICODE that
allows it to be "mostly" compatible with ASCII -- that is, ASCII files
are valid UTF but not always vice versa...
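
A small sketch of the "not always vice versa" part, again assuming
"UTF" means UTF-8. The helper names is_ascii() and is_valid_utf8() are
made up for illustration.

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_file = b"isprint() in the kernel\n"
utf8_file = "na\u00efve caf\u00e9\n".encode("utf-8")  # has non-ASCII letters

# Every ASCII file is already valid UTF-8, byte for byte.
ascii_is_utf8 = is_ascii(ascii_file) and is_valid_utf8(ascii_file)

# A UTF-8 file with characters beyond ASCII is not an ASCII file.
utf8_is_ascii = is_ascii(utf8_file)
```

Here ascii_is_utf8 comes out true and utf8_is_ascii comes out false.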

> - Most modern UNICES support a 32-bit character set.
> - The Taiwanese EUC character set requires 32-bit characters.

UNICODE has excellent support for Chinese characters, although there
are some complaints about the Han unification that was done. UNICODE's
set is actually a superset of all the national sets in the Far East,
though in a different order.
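
The ordering point can be checked with one national set. This sketch
uses EUC-JP through Python's standard "euc_jp" codec; the two sample
characters are my own choice for illustration.

```python
# Two kanji, written as escapes so the source itself stays ASCII:
# U+4E9C is the first kanji in JIS X 0208, and U+4E00 is the first
# CJK unified ideograph in Unicode.
first_in_jis = "\u4e9c"
first_in_unicode = "\u4e00"
samples = [first_in_jis, first_in_unicode]

# Both characters survive a round trip through EUC-JP, so nothing is
# lost by holding the text as Unicode.
round_trips = all(
    ch.encode("euc_jp").decode("euc_jp") == ch for ch in samples
)

# Sorting by EUC-JP bytes and sorting by Unicode code point disagree.
euc_order = sorted(samples, key=lambda ch: ch.encode("euc_jp"))
unicode_order = sorted(samples, key=ord)
```

The two sorted lists come out in opposite orders, so anything that
depends on the national set's ordering needs a table rather than a
plain code point comparison.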

> But unlike Plan9, I think these are user-level issues (or at least most
> of them are), and they should be handled by an I18N framework,
> i.e. kernel and user-level code should have a general way to handle many
> character sets, and UNICODE (UTF8) should be supported as one of them.

Again, if we supported UNICODE, we might no longer care about
supporting multiple character sets since we would have virtually every
character anyone would ever want. (Fonts in X become a problem, of
course...)

.pm