Subject: Re: Japanese with wscons?
To: Bang Jun-Young <bjy@mogua.org>
From: None <itojun@iijlab.net>
List: tech-kern
Date: 01/12/2001 15:47:18
>>         one question: how do you distinguish between CJK unified ideographs?
>>         for some codepoints japanese/chinese/taiwanese/korean glyphs are
>>         different, and it is the very problem why many people do not
>>         advocate Unicode and prefer iso-2022.  i'd really like to see
>>         iso-2022 variants supported, since i'd like to mix CJK fonts, in
>>         proper glyphs.
>Are there Han characters defined in ISO-2022-JP and EUC-JP but not in
>Unicode CJK unified ideographs region (and Extended)? In Korean KSX-1001 
>standard (formerly known as KSC-5601) there are only 4888 Han ideographs 
>defined and there's no problem since CJK ideographs include all of them. 

	due to CJK ideograph unification, conversion from CJK encodings (like
	iso-2022-{jp,kr}) to Unicode is a lossy conversion.  once you convert
	things into Unicode, you cannot recover the language information.  in
	the unified ideographs region we need to use different glyphs depending
	on language, and that is not possible if you use UCS4. (if you see a
	UCS4 stream or UTF8 stream, you cannot guess which CJK font you need
	to use)
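
	a minimal sketch of the loss, assuming python 3 and its bundled
	"iso2022_jp" and "iso2022_kr" codecs.  the sample character U+6F22 is
	only an illustrative pick and is not taken from this thread.

```python
# The same Han character arrives in two different national encodings.
# U+6F22 is an illustrative choice (an assumption, not from the thread).
HAN = "\u6f22"

jp_bytes = HAN.encode("iso2022_jp")   # escape sequence designates JIS X 0208
kr_bytes = HAN.encode("iso2022_kr")   # escape sequence designates KS X 1001

# The byte streams differ, so the source charset (and hence a suitable
# font) can still be told apart at this point.
streams_differ = jp_bytes != kr_bytes

# After decoding, both become the same code point; nothing in the
# resulting string says which charset it came from.
jp_text = jp_bytes.decode("iso2022_jp")
kr_text = kr_bytes.decode("iso2022_kr")
same_code_point = jp_text == kr_text

print(streams_differ, same_code_point, hex(ord(jp_text)))
```

	the escape sequences in the iso-2022 streams act as the charset tag;
	the decoded string carries nothing equivalent.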

	also, there are characters in JIS X0213 that are outside of the
	Unicode spec.

>Making the driver support ISO-2022 shouldn't be hard, but I'm worried 
>about reproduction of "Tower of Babel" in 21st century. For example 
>ISO-2022-JP and ISO-2022-KR are not compatible and can't be used together 
>even though they use the same encoding method for representing characters. 
>EUC-JP and EUC-KR have the same problem, too. I'd like to see Japanese
>and Korean (and many other languages as well) together on a screen. 

	when you would like to mix japanese and korean characters, you can
	either use iso-2022-jp-2 (RFC1554), or X11 ctext.
	we can use a single piece of code to handle all the iso-2022 variants.
	see pkgsrc/x11/kterm, or pkgsrc/editors/nvi-m17n.  it is not a
	tower of babel.  we have been doing it, and it is not hard.
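
	as a rough illustration of mixing the two in one stream, here is a
	sketch assuming python 3 and its bundled "iso2022_jp_2" codec.  the
	sample string is made up for the example; this is not how kterm or
	nvi-m17n do it internally.

```python
# Japanese and Korean text in a single ISO-2022-JP-2 stream (RFC 1554).
# The sample string is hypothetical, chosen only for illustration.
MIXED = "日本語 한국어"

encoded = MIXED.encode("iso2022_jp_2")

# RFC 1554 designations: ESC $ B selects JIS X 0208,
# ESC $ ( C selects KS C 5601 (KS X 1001).
uses_jisx0208 = b"\x1b$B" in encoded
uses_ksc5601 = b"\x1b$(C" in encoded

# One decoder handles both charsets by following the escape sequences.
round_trip_ok = encoded.decode("iso2022_jp_2") == MIXED

print(encoded)
print(uses_jisx0208, uses_ksc5601, round_trip_ok)
```

	each run of characters is preceded by an escape sequence naming its
	charset, so a renderer could use that to pick a font per run.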

	you cannot mix japanese and korean characters if you use euc-jp or
	euc-kr.
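
	a small sketch of why, again assuming python 3 with its "euc_jp" and
	"euc_kr" codecs.  can_encode() is a hypothetical helper written for
	this example.

```python
def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

# Hangul syllables have no place in the charsets behind EUC-JP.
hangul_in_euc_jp = can_encode("한국어", "euc_jp")

# EUC-JP and EUC-KR use the same byte range for their two-byte
# characters, with no escape sequence to say which charset is meant.
# Reading EUC-JP bytes as EUC-KR therefore yields different text.
jp_bytes = "日本語".encode("euc_jp")
misread = jp_bytes.decode("euc_kr", errors="replace")

print(hangul_in_euc_jp)
print(misread != "日本語")
```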

itojun