tech-kern: Re: wsfont encoding

Subject: Re: wsfont encoding
To: None <marcus@idonex.se>
From: Noriyuki Soda <soda@sra.co.jp>
List: tech-kern
Date: 02/03/2001 08:05:39

Noriyuki> Thus, using CJK ideogram font for other country doesn't have sense.

> That's why I said GB to Big5, since both can be used for Chinese.

I forgot to mention that a Big5 font is not really usable in China
either, because certain characters are missing.
For example, 0x9aa8 in Unicode has a different glyph in Taiwan and
in China. You can confirm this by looking at 0xb0a9 in Big5 and 0x3947
in GB2312.
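
To find the two code points to compare, one quick way is a small
script. This is only an illustration: it uses Python's built-in "big5"
and "gb2312" codecs as stand-ins for real font tables, and it assumes
that the gb2312 codec emits the 8-bit EUC-CN form, so the high bits are
stripped to get the 7-bit code quoted above.

```python
# Illustration only: Python's built-in codecs stand in for real font tables.
ch = "\u9aa8"

big5 = ch.encode("big5")        # two-byte Big5 code
gb_euc = ch.encode("gb2312")    # GB2312 in its 8-bit EUC-CN form

# Strip the high bit of each byte to get the 7-bit GB2312 code.
gb_7bit = bytes(b & 0x7F for b in gb_euc)

print(big5.hex(), gb_7bit.hex())
```

The script only locates the code points. The glyph difference itself
has to be checked by eye in the two fonts.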

Noriyuki> For single byte encodings, only wscons userlevel codeset which should 
Noriyuki> be supported by default is the same encoding with font encoding.

> This is much too restrictive for the general case.  If you have for
> example a 8859-15 codeset and a 8859-1 font, you should be able to see
> the glyphs for the overlapping characters.  This was my initial
> requirement.  So how will this be accomplished?

Just add the conversion table.
As I already said, everything which can be supported by a Unicode-based
interface can also be supported by a codeset-independent interface,
because the latter is more general.
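
As a rough sketch of what such a table could look like for the
8859-15 codeset with an 8859-1 font: the function name build_table is
made up for illustration, and Python's codecs are used only as a
convenient way to generate the table in userland. The resulting table
maps a codeset byte directly to a font index, so nothing Unicode-based
is needed at lookup time.

```python
def build_table(codeset, font_encoding):
    """Map each byte of `codeset` to a glyph index in `font_encoding`.

    Entries are None where the font has no glyph for that character.
    (Hypothetical helper, not an existing wscons interface.)
    """
    table = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codeset)
            glyph = ch.encode(font_encoding)
        except UnicodeError:
            # Character undefined in the codeset or absent from the font.
            table.append(None)
            continue
        table.append(glyph[0] if len(glyph) == 1 else None)
    return table


table = build_table("iso8859-15", "iso8859-1")
missing = [b for b, g in enumerate(table) if g is None]
print([hex(b) for b in missing])
```

Only the eight positions where 8859-15 differs from 8859-1 end up
without a glyph. All other bytes map to themselves.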

Noriyuki> We should supply UTF-8 (new standard) and ISO-2022 (VT100 standard) 
Noriyuki> too. But those don't have to be included in kernel by default 
Noriyuki> (Some of those should be added in /etc/rc, though). Other encodings 
Noriyuki> should be optional (put the table on userland, user configuration
Noriyuki> is required).

> Having support for different encodings as loadable modules is probably
> a good idea.  But I'm still curious about how this user configuration
> will work.  If I configure that I want to use ISO-2022 with (among
> others) 8859-15 codeset and that I want to have 8859-4 fonts, where
> will the translation tables/functions come from?

It can be implemented by a dedicated module which breaks the code
sequence into (font_encoding, font_index) pairs. But that is probably
overkill for such a simple and common requirement.

Logically, such a conversion would be done at either (A) or (B)
below.

		|
	  (code sequence)
		|
		| .... (A) convert a codeset to another codeset.
		v
	[1] codeset handling layer
		|
	  (font_encoding, font_index)
		|
		| .... (B) convert a font_encoding to another font_encoding.
		v
	[2] rendering interface

(A) needs an interface like iconv, and its implementation is relatively
hard. But because we need a kernel iconv for filesystems anyway, the
difficulty may not be a problem. The real problem with (A) is that its
implementation will be very close to the codeset handling layer itself
(that means doing the same thing twice). Thus doing the conversion at
(A) may not make sense.

Implementation at (B) is easy, and it is close to what we currently
have (i.e. mapchar), so this is probably the suitable way for a simple
mapping like your example.
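
To make the placement of (B) concrete, here is a minimal sketch of the
two layers with the conversion between them. All names (FONT_MAPS,
codeset_layer, convert_font_encoding, render) are hypothetical and are
not the actual mapchar interface. The codeset layer is reduced to the
single-byte case, and the renderer just returns glyph indexes.

```python
# Hypothetical names throughout; this is not the wscons mapchar interface.

# (B): per-(source, target) font-encoding tables. Only entries that differ
# from the identity mapping are listed; None means "no glyph in target".
FONT_MAPS = {
    ("iso8859-15", "iso8859-1"): {
        0xA4: None, 0xA6: None, 0xA8: None, 0xB4: None,
        0xB8: None, 0xBC: None, 0xBD: None, 0xBE: None,
    },
}


def codeset_layer(data, font_encoding):
    """[1] Break a single-byte code sequence into (font_encoding, font_index)."""
    for byte in data:
        yield (font_encoding, byte)


def convert_font_encoding(pairs, loaded_font):
    """(B) Rewrite pairs so that they index the font actually loaded."""
    for encoding, index in pairs:
        if encoding == loaded_font:
            yield (encoding, index)
            continue
        overrides = FONT_MAPS[(encoding, loaded_font)]
        yield (loaded_font, overrides.get(index, index))


def render(pairs, fallback=0x3F):
    """[2] Stand-in renderer: glyph indexes, with '?' for missing glyphs."""
    return [fallback if index is None else index for _, index in pairs]


pairs = codeset_layer(b"A\xa4\xe9", "iso8859-15")
glyphs = render(convert_font_encoding(pairs, "iso8859-1"))
print([hex(g) for g in glyphs])
```

Because the table sits after the codeset handling layer, that layer
does not need to know which font is loaded.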

Noriyuki> For multibyte encodings, things are completely different.
Noriyuki> Because ISO-2022 and EUC are both encoding scheme, only certain 
Noriyuki> configuration (private final character for ISO-2022, 4 integers
Noriyuki> for Gx graphic plane setting for EUC) is needed, and no conversion 
Noriyuki> table is needed in these case.

> Final characters and EUC plane settings can probably be put in the
> kernel as default, they are not so big.  This includes final
> characters for single byte encodings as well.

Yes, having some default values may be fine.
(Although those settings should be modifiable, and fonts
 should be loadable.)
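
As an illustration of why EUC needs only plane settings and no
conversion table, here is a minimal sketch of an EUC decoder driven by
a G0..G3 configuration. The representation of the settings (a name and
a byte width per plane) and the EUC-JP defaults shown are my assumptions
for this example, not a proposed kernel format. Input validation is
mostly left out.

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift bytes selecting G2 and G3

# Assumed example settings for EUC-JP:
# (font_encoding, bytes per character) for G0, G1, G2, G3.
EUC_JP = [("ascii", 1), ("jisx0208", 2), ("jisx0201-kana", 1), ("jisx0212", 2)]


def euc_decode(data, planes):
    """Break an EUC byte sequence into (font_encoding, font_index) pairs."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            g = 0                # G0: bytes with the high bit clear
        elif b == SS2:
            g, i = 2, i + 1      # G2: introduced by SS2
        elif b == SS3:
            g, i = 3, i + 1      # G3: introduced by SS3
        else:
            g = 1                # G1: bytes with the high bit set
        name, width = planes[g]
        chunk = data[i:i + width]
        if len(chunk) < width:
            raise ValueError("truncated character at offset %d" % i)
        index = 0
        for c in chunk:
            index = (index << 8) | (c & 0x7F)  # font index uses 7-bit codes
        out.append((name, index))
        i += width
    return out


print(euc_decode(b"A\xa4\xa2\x8e\xb1", EUC_JP))
```

Switching to another EUC variant then means replacing only the
settings list and loading the matching fonts.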
--
soda