tech-kern: Re: codeset recoding engine

Subject: Re: codeset recoding engine
To: None <itojun@iijlab.net>
From: Erik Bertelsen <erik@mediator.uni-c.dk>
List: tech-kern
Date: 11/14/1999 09:15:37

On Sun, Nov 14, 1999 at 08:51:30AM +0900, itojun@iijlab.net wrote:
> 
> >> > I think you need two conversions:
> >> > kernel: filesystem-charset to utf-8
> >> > then
> >> > userland: utf-8 to LC_CHARSET.
> 
> 	Sorry I may not follow the discussion, but...
> 	Please don't ever, ever hardcode something to utf-8.  There are
> 	character sets that contain characters that are not covered in utf-8.
> 	It is NOT universal.
> 
> itojun

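Please be careful about the terminology: in my understanding, UTF-8 is -not-
a character code (character set), but an encoding of multibyte characters into
a sequence of 8-bit bytes. ASCII characters keep their single-byte values, so
pure ASCII text passes through unchanged; anything else needs an 8-bit-clean
channel (it is UTF-7 that targets pure 7-bit ASCII channels).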

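UTF-8 may be used to encode characters in several character codes (sets), e.g.
Latin-1 and Unicode. Note that even for Latin-1, UTF-8 is not the identity
mapping: only the ASCII range (0x00-0x7F) maps to itself, while each character
in the range 0x80-0xFF becomes two bytes.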
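
To illustrate the Latin-1 point, here is a minimal Python sketch. It is not
part of the original discussion; the helper name latin1_to_utf8 is made up for
the example, and it relies only on the standard "latin-1" and "utf-8" codecs.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (hypothetical helper for illustration)."""
    return data.decode("latin-1").encode("utf-8")

ascii_text = b"kernel"   # only bytes below 0x80
latin1_text = b"\xe9"    # e with acute accent in Latin-1

print(latin1_to_utf8(ascii_text))    # b'kernel'   (unchanged)
print(latin1_to_utf8(latin1_text))   # b'\xc3\xa9' (one byte became two)
```

So a file name that is pure ASCII survives the conversion byte for byte, but
any Latin-1 name using the upper half of the table changes length.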

I also think (but am not 100% sure) that UTF-8 is able to encode full ISO 10646
characters if needed.
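
As a rough check of that claim: the UTF-8 definition in RFC 2279 allows
sequences of up to six bytes, which is enough for the 31-bit code space of
ISO 10646 (UCS-4). The sketch below is an illustration under that assumption,
not a reference implementation; the function name utf8_encode_ucs4 is
hypothetical. It is written by hand because Python's built-in codec stops at
U+10FFFF.

```python
def utf8_encode_ucs4(cp: int) -> bytes:
    """Encode one code point (0 .. 0x7FFFFFFF) with the 1- to 6-byte scheme."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # ASCII stays a single byte
    # (exclusive upper limit, lead-byte marker, number of continuation bytes)
    for limit, lead, ncont in (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    ):
        if cp < limit:
            break
    out = []
    for _ in range(ncont):
        out.append(0x80 | (cp & 0x3F))  # each continuation byte carries 6 bits
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))

print(utf8_encode_ucs4(0x20AC).hex())      # e282ac
print(utf8_encode_ucs4(0x7FFFFFFF).hex())  # fdbfbfbfbfbf
```

The largest 31-bit value fits in six bytes, so nothing in ISO 10646 is left
out by the encoding itself.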

- Erik