Subject: Re: Japanese with wscons?
To: Christos Zoulas <christos@zoulas.com>
From: Bang Jun-Young <bjy@mogua.org>
List: tech-kern
Date: 01/13/2001 11:30:54
Christos Zoulas wrote:
> 
> In article <13662.979294370@coconut.itojun.org>,  <itojun@iijlab.net> wrote:
> >
> >       if you are in UTF8 mode, you can send UTF8 stream and print some text.
> >       however, due to han unification, some of Japanese/Chinese/Taiwanese
> >       characters will be presented in the wrong glyph.  the problem does
> >       not exist for other iso-2022 variants.
> >       (i don't have any example here)
> 
> If I understand correctly, this is a font problem. If you had a
> font that contained the correct glyph for each Unicode character,
> then the display would be correct. The problem is that you are
> trying to use a font that does not contain all the glyphs you need.

It's not very common to use Korean Han characters and Japanese Han
characters in the same place (file, screen, etc.). Japanese and Chinese
depend heavily on Han ideographs, but Korean doesn't (we don't use Han
much in everyday life). However, when Japanese Han characters and
Chinese Han characters are used together, complexity and confusion
may arise.
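
To illustrate the han unification point itojun raised: a console font
maps a code point to exactly one glyph, and under han unification a
Japanese and a Chinese text that use the same unified code point have
to share that glyph, so one of the two gets a variant its readers
consider wrong. A rough sketch of the idea (made-up structure and
function names, not the real wsfont interface):

/*
 * Hypothetical sketch: a font indexed only by Unicode code point.
 * There is no language or locale parameter anywhere, so a unified
 * ideograph such as U+76F4 can only ever have one shape per font.
 */
#include <stdint.h>
#include <stddef.h>

struct glyph {
	uint8_t bitmap[16];		/* 8x16 cell, one bit per pixel */
};

struct font {
	uint32_t first, last;		/* code point range covered */
	const struct glyph *glyphs;
};

/* Return the single glyph a font provides for a code point. */
static const struct glyph *
font_lookup(const struct font *f, uint32_t codepoint)
{
	if (codepoint < f->first || codepoint > f->last)
		return NULL;
	return &f->glyphs[codepoint - f->first];
}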

Last night I checked unicode.org and saw that the UCD 3.1 beta is in
progress:

Unicode 3.1 contains a large number of new characters, in the following 
new blocks:

10300; 1032F; Old Italic
10330; 1034F; Gothic
10400; 1044F; Deseret
1D000; 1D0FF; Byzantine Musical Symbols
1D100; 1D1FF; Musical Symbols
1D400; 1D7FF; Mathematical Alphanumeric Symbols
20000; 2A6D6; CJK Unified Ideographs Extension B
2F800; 2FA1F; CJK Compatibility Ideographs Supplement
E0000; E007F; Tags

As seen above, the number of CJK ideographs in Unicode is continuously 
increasing. The problem is that there is no way to fit that many Han 
characters into the 16-bit Unicode basic plane, which has only 65,536 
code points in total. How many Han characters do Japanese/Chinese/
Vietnamese need for everyday use? How many characters are defined in 
ISO-2022-JP?
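
For reference, a code point from one of those new blocks, say U+20000
(the first CJK Extension B ideograph), no longer fits in a single
16-bit unit: it takes a UTF-16 surrogate pair or four bytes of UTF-8.
A rough standalone sketch (not wscons code; it only handles code
points in the U+10000..U+10FFFF range):

#include <stdio.h>
#include <stdint.h>

/* UTF-16 surrogate pair for a code point above U+FFFF. */
static void
encode_utf16(uint32_t cp, uint16_t out[2])
{
	cp -= 0x10000;
	out[0] = 0xD800 | (cp >> 10);		/* high surrogate */
	out[1] = 0xDC00 | (cp & 0x3FF);		/* low surrogate */
}

/* Four-byte UTF-8 sequence for a code point above U+FFFF. */
static void
encode_utf8(uint32_t cp, uint8_t out[4])
{
	out[0] = 0xF0 | (cp >> 18);
	out[1] = 0x80 | ((cp >> 12) & 0x3F);
	out[2] = 0x80 | ((cp >> 6) & 0x3F);
	out[3] = 0x80 | (cp & 0x3F);
}

int
main(void)
{
	uint16_t w[2];
	uint8_t u[4];

	encode_utf16(0x20000, w);	/* -> D840 DC00 */
	encode_utf8(0x20000, u);	/* -> F0 A0 80 80 */
	printf("UTF-16: %04X %04X\n", w[0], w[1]);
	printf("UTF-8:  %02X %02X %02X %02X\n", u[0], u[1], u[2], u[3]);
	return 0;
}

So any code that assumes one 16-bit unit per character breaks on
exactly the blocks listed above.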

> 
> As for the missing characters, my understanding is that they are
> rarely used and this is why they were not included in the unicode
> standard.
> 
> I've been through the encoding hell with japanese before; having three
> different encodings is simply madness, as well as trying to guess
> which one you have by looking at the bytestream [this shit is
> actually published as a valid algorithm in the CJK book]. Let's
> stick with Unicode; it is not perfect but it is the best you can
> get without creating a mess.

From a programmer's point of view, adopting Unicode as the encoding 
makes things easier and cleaner, yet we still need to support other 
encodings.
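
For example, if UTF-8 is kept as the working encoding, the legacy
encodings can be converted at the boundary with POSIX iconv(3). A
rough standalone sketch; the encoding names, and the bytes I use for
"Kanji" in EUC-JP, are my assumptions about the local iconv setup,
not anything taken from wscons:

#include <iconv.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	char in[] = "\xb4\xc1\xbb\xfa";	/* should be "Kanji" in EUC-JP */
	char out[64];
	/* on some systems iconv() wants const char ** for the input */
	char *inp = in, *outp = out;
	size_t inleft = strlen(in), outleft = sizeof(out) - 1;

	iconv_t cd = iconv_open("UTF-8", "EUC-JP");
	if (cd == (iconv_t)-1) {
		perror("iconv_open");
		return 1;
	}
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
		perror("iconv");
		return 1;
	}
	*outp = '\0';
	printf("%s\n", out);	/* same text, now as UTF-8 bytes */
	iconv_close(cd);
	return 0;
}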

Jun-Young

--
Bang Jun-Young <bjy@mogua.org>