Subject: Re: Unicode support in iso9660.
To: Jason Thorpe <thorpej@shagadelic.org>
From: Reinoud Zandijk <reinoud@netbsd.org>
List: tech-kern
Date: 11/23/2004 13:06:57
--E39vaYmALEf/7YXx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Dear folks,

On Fri, Nov 19, 2004 at 07:44:46AM -0800, Jason Thorpe wrote:
> On Nov 17, 2004, at 7:57 PM, MINOURA Makoto wrote:
> 
> > - Mountpoints
> >    (/<Russian dirname>/<Japanese dirname>/<German filename>,
> >     but this could not be accessible from processes with
> >     LC_ALL=de_DE.ISO8859-15 for example)
> 
> I think this could be handled if UTF-8 were the standard encoding for 
> userland<->kernel interaction, yes?

It would handle it fine, yes. I thus think that UTF-8 (which can encode 
every Unicode code point, up to U+10FFFF) would be fine for this.

For current installations the transition might be a bit tricky, but on the 
other hand, providing a simple `don't translate' flag to mount would fix 
this too, since the users on such a system have apparently found a 
way/procedure to work with it, which will then not have to be changed...

Newly formatted file systems can be filled with whatever UTF-8 allows. 
Thus the example above, "/<Russian dirname>/<Japanese dirname>/<German 
filename>", will be encoded in UTF-8 on disc and be fully accessible and 
readable given a good font set :-)
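
To make the point concrete, here is a small Python sketch (purely 
illustrative; the path and the helper name `representable' are made up 
for this example). It shows that one UTF-8 byte string can carry all three 
scripts, while a single-byte locale encoding such as ISO8859-15 can only 
express the German component:

```python
# Hypothetical path mixing Russian, Japanese and German name components.
path = "/Документы/日本語/Größe.txt"

# A single UTF-8 byte string carries all three scripts and round-trips.
utf8_bytes = path.encode("utf-8")
assert utf8_bytes.decode("utf-8") == path


def representable(name: str, encoding: str) -> bool:
    """Return True if every character of `name` exists in `encoding`."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(representable(path, "utf-8"))
print(representable(path, "iso8859_15"))
print(representable("/Größe.txt", "iso8859_15"))
```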

When copying stuff from, say, an old disc to the new disc, filenames can be 
translated according to the current locale (LC_*) setting; i.e. set the 
locale to a Russian encoding and copy the Russian filenames, set it to a 
Chinese encoding and copy the Chinese filenames, etc.
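
A minimal sketch of that per-batch translation, in Python for brevity. The 
function name `legacy_name_to_utf8' is hypothetical, and KOI8-R and GB2312 
are only assumed examples of what the old disc might have used; the 
operator has to know (or guess) the right encoding for each batch:

```python
def legacy_name_to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Re-encode a filename from a legacy encoding to UTF-8.

    Raises UnicodeDecodeError if `raw` is not valid in `legacy_encoding`.
    """
    return raw.decode(legacy_encoding).encode("utf-8")


# Names as they might sit on the old disc (assumed encodings).
russian_old = "привет".encode("koi8_r")
chinese_old = "中文".encode("gb2312")

# Copy each batch with the matching encoding selected.
russian_new = legacy_name_to_utf8(russian_old, "koi8_r")
chinese_new = legacy_name_to_utf8(chinese_old, "gb2312")

# Picking the wrong encoding does not fail here, it silently yields
# a different (garbled) name, so the batches must be kept apart.
garbled = legacy_name_to_utf8(russian_old, "iso8859_15")
print(russian_new.decode("utf-8"), chinese_new.decode("utf-8"))
print(garbled.decode("utf-8"))
```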

When copying stuff from, say, ISO9660, UDF or NTFS file systems, which do 
have a notion of `encoding', the file system can translate to/from UTF-8 
before the names leave the file system.
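
Roughly what such a translation step could look like, again as a Python 
sketch rather than kernel code. The table and function names are 
hypothetical. It assumes Joliet names (the Unicode extension to ISO9660) 
are stored as big-endian UCS-2 and NTFS names as little-endian UTF-16; 
Python's UTF-16 codecs are used as a stand-in for UCS-2. UDF is left out 
because its compressed Unicode format needs its own decoder:

```python
# Assumed on-disc name encodings per file system (illustrative table).
ON_DISC_ENCODINGS = {
    "joliet": "utf-16-be",  # stand-in for big-endian UCS-2
    "ntfs": "utf-16-le",
}


def fs_name_to_utf8(raw: bytes, fs: str) -> bytes:
    """Translate an on-disc name to UTF-8 before handing it out."""
    return raw.decode(ON_DISC_ENCODINGS[fs]).encode("utf-8")


def utf8_to_fs_name(name: bytes, fs: str) -> bytes:
    """Translate a UTF-8 name to the on-disc form before a lookup."""
    return name.decode("utf-8").encode(ON_DISC_ENCODINGS[fs])


on_disc = "日本語".encode("utf-16-be")       # as stored in a Joliet directory
outside = fs_name_to_utf8(on_disc, "joliet")  # what callers get to see
assert utf8_to_fs_name(outside, "joliet") == on_disc
print(outside.decode("utf-8"))
```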

> My feeling is that the convergence point should be "UTF-8 at the system 
> call layer", i.e. userland gives UTF-8 names to the kernel, the kernel 
> gives UTF-8 names to userland.  It would then be the responsibility of 
> the individual applications/system libraries/kernel subsystems to do 
> whatever translation to/from UTF-8 is required.

Looks good.
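
On the userland side, the part that needs care is handing a UTF-8 name to 
a process whose locale cannot express it. A sketch in Python, with the 
hypothetical helper `utf8_to_locale' and the assumed policy of replacing 
unrepresentable characters with `?':

```python
def utf8_to_locale(raw: bytes, locale_encoding: str) -> bytes:
    """Convert a UTF-8 name from the kernel to the locale encoding.

    Characters the locale cannot express are replaced with '?', so the
    result is for display only and may not name the file any more.
    """
    return raw.decode("utf-8").encode(locale_encoding, errors="replace")


german = utf8_to_locale("Größe".encode("utf-8"), "iso8859_15")
japanese = utf8_to_locale("日本語".encode("utf-8"), "iso8859_15")

print(german.decode("iso8859_15"))
print(japanese.decode("iso8859_15"))
```

The lossy result shows why an application in such a locale should keep 
the original UTF-8 bytes around for the actual system calls.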

Cheers,
Reinoud


--E39vaYmALEf/7YXx--