tech-kern: Re: codeconv v3 - kernel code set recoding engine

Subject: Re: codeconv v3 - kernel code set recoding engine
To: None <dolecek@ics.muni.cz>
From: Noriyuki Soda <soda@sra.co.jp>
List: tech-kern
Date: 03/07/2000 22:44:57
> > The names "k2u" and "u2k" seem inappropriate, since this feature
> > might be used for kernel -> kernel conversion.
> 
> Hmm, probably. They actually recode from the (filesystem) "native" encoding
> to a "userland presentable" one. The result need not actually be passed
> to userland. Perhaps codeconv_enc() and codeconv_dec() would
> be better? Or codeconv_{encode|decode}? The latter is probably too long
> already.

No. It is better to use a separate codeconv_t for each direction.
For example:
	(1) VFAT vs SJIS userland:
		codeconv_t *k2u = codeconv_open("UTF-16LE", "SJIS");
		codeconv_t *u2k = codeconv_open("SJIS", "UTF-16LE");
	(2) SJIS MS-DOS fs (not VFAT, but FAT) vs UTF-8 userland:
		codeconv_t *k2u = codeconv_open("SJIS", "UTF-8");
		codeconv_t *u2k = codeconv_open("UTF-8", "SJIS");
	(3) NFSv4 with UTF-8 vs SJIS userland:
		codeconv_t *k2u = codeconv_open("UTF-8", "SJIS");
		codeconv_t *u2k = codeconv_open("SJIS", "UTF-8");
I think there is no reason to use a single codeconv_t for conversion
in both directions.
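To make the one-handle-per-direction model concrete, here is a minimal C sketch. Everything in it is hypothetical: the codeconv_t layout, its conv member and this codeconv_open() are illustrative stand-ins for the proposed API, and only ISO-8859-1 <-> UTF-8 is implemented so the example stays self-contained. It assumes the codeconv_open(from, to) argument order used in the examples above. Each handle carries exactly one conversion function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical one-direction handle: converts from exactly one codeset
 * to exactly one other. conv() returns the number of bytes written to
 * dst, or (size_t)-1 on error (bad input or dst too small).
 */
typedef struct codeconv {
	size_t (*conv)(const unsigned char *src, size_t srclen,
	    unsigned char *dst, size_t dstlen);
} codeconv_t;

static size_t
latin1_to_utf8(const unsigned char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	size_t o = 0;
	for (size_t i = 0; i < srclen; i++) {
		unsigned char c = src[i];
		if (c < 0x80) {			/* ASCII: copied as-is */
			if (o + 1 > dstlen) return (size_t)-1;
			dst[o++] = c;
		} else {			/* U+0080..U+00FF: two bytes */
			if (o + 2 > dstlen) return (size_t)-1;
			dst[o++] = 0xC0 | (c >> 6);
			dst[o++] = 0x80 | (c & 0x3F);
		}
	}
	return o;
}

static size_t
utf8_to_latin1(const unsigned char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	size_t o = 0;
	for (size_t i = 0; i < srclen; i++) {
		unsigned char c = src[i];
		if (o + 1 > dstlen) return (size_t)-1;
		if (c < 0x80) {
			dst[o++] = c;
		} else if ((c == 0xC2 || c == 0xC3) && i + 1 < srclen &&
		    (src[i + 1] & 0xC0) == 0x80) {
			dst[o++] = ((c & 0x03) << 6) | (src[i + 1] & 0x3F);
			i++;
		} else {
			return (size_t)-1;	/* not representable in Latin-1 */
		}
	}
	return o;
}

static const codeconv_t latin1_utf8 = { latin1_to_utf8 };
static const codeconv_t utf8_latin1 = { utf8_to_latin1 };

/* Hypothetical codeconv_open(from, to): one handle per direction. */
static const codeconv_t *
codeconv_open(const char *from, const char *to)
{
	if (strcmp(from, "ISO-8859-1") == 0 && strcmp(to, "UTF-8") == 0)
		return &latin1_utf8;
	if (strcmp(from, "UTF-8") == 0 && strcmp(to, "ISO-8859-1") == 0)
		return &utf8_latin1;
	return NULL;			/* pair not supported in this sketch */
}
```

A caller that only needs one direction, say UTF-8 to ISO-8859-1, opens just that handle; nothing for the reverse direction has to be set up.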

> > It is better to use two different codeconv_t for CODE-A -> CODE-B
> > conversion and CODE-B -> CODE-A conversion, rather than one codeconv_t
> > with encode/decode functions.
> 
> I think of my approach (i.e. one codeconv_t for both A->B and B->A)
> as a convenience feature. It costs nothing and covers the
> typical use case.

No, it does have a cost.
There are cases where conversion is needed in only one direction.

> I'd rather pass the endianness as a flag to codeconv_open() (i.e.
> add a parameter to it).

UTF-8, SJIS and eucJP don't need an endianness flag.
Only UTF-16 needs the endianness.

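IMHO, passing the endianness is the wrong abstraction. Why should the
endianness have to be passed, when a more general function like iconv(3)
doesn't need it? There, the byte order is simply part of the codeset
name (e.g. "UTF-16LE" vs "UTF-16BE").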
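A minimal C sketch of why no separate flag is needed, assuming two hypothetical helpers, codeset_byteorder() and utf16_unit(), that are not part of the proposal: the byte order is derived from the codeset name itself, and the byte-oriented codesets simply have none.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte order implied by a codeset name (hypothetical helper). */
enum byteorder { BO_NONE, BO_LE, BO_BE, BO_UNKNOWN };

static enum byteorder
codeset_byteorder(const char *name)
{
	if (strcmp(name, "UTF-16LE") == 0)
		return BO_LE;
	if (strcmp(name, "UTF-16BE") == 0)
		return BO_BE;
	if (strcmp(name, "UTF-8") == 0 || strcmp(name, "SJIS") == 0 ||
	    strcmp(name, "eucJP") == 0)
		return BO_NONE;		/* byte-oriented: no endianness */
	return BO_UNKNOWN;		/* not handled in this sketch */
}

/*
 * Read one UTF-16 code unit from p in the byte order named by the
 * codeset. bo must be BO_LE or BO_BE.
 */
static uint16_t
utf16_unit(const unsigned char *p, enum byteorder bo)
{
	return bo == BO_LE ? (uint16_t)(p[0] | (p[1] << 8))
	                   : (uint16_t)((p[0] << 8) | p[1]);
}
```

With this, the same two bytes decode differently under "UTF-16LE" and "UTF-16BE", and codeconv_open() never needs an extra endianness parameter.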

> > I know why you'd like to define these functions.
> > But these functions should not be basic functions, but convenience
> > functions. The basic comparison function should not depend on
> > codeconv_t. It should depend only on the specific codeset and the
> > specific filesystem.
> 
> Exactly, convenience functions. They are typically needed (e.g.
> NTFS, Joliet and MSDOSFS all need them), so
> IMHO it makes sense to offer them as part of the interface.

It makes sense to share the same function and implementation between NTFS
and the Joliet extension.
But it doesn't make sense to implement it in the codeconv layer.

> > Please note that these comparison functions depend on the specific
> > filesystem type. Even if two file systems use the same codeset, the
> > filename comparison result might differ between filesystem
> > types!
> 
> Sure, but I don't force the caller into anything; all the compare
> functions offer is a way to compare an encoded and an unencoded
> string. Nothing more, nothing less.

Case-folded comparison is much more difficult than you think.

For example, I've heard that MS-Windows 98 and MS-Windows NT differ
in filename comparison (e.g. in the handling of Cyrillic characters).
If you combine the case-folded comparison feature with the codeconv layer,
you cannot use the following codeconv_t:
	codeconv_t *cc = codeconv_open("SJIS", "UTF-16LE");
Instead, you would have to use these:
	codeconv_t *cc = codeconv_open("SJIS", "UTF-16LE-Win95");
		for Windows 98
	codeconv_t *cc = codeconv_open("SJIS", "UTF-16LE-WinNT");
		for Windows NT

Do you really want to do this?
I don't think so.
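The alternative argued for here keeps recoding and comparison apart: names are converted with a plain codeset codeconv_t, and compared by a routine whose case-folding rule is chosen per filesystem. A minimal C sketch, assuming hypothetical names name_cmp(), fold_ascii() and fold_ascii_cyrillic(); the two rules are illustrative only and are not the real Windows 98 or Windows NT folding tables. The same pair of Cyrillic names compares equal under one rule and unequal under the other, with no change to the codeset.

```c
#include <stddef.h>
#include <stdint.h>

/* A folding rule belongs to the filesystem/OS, not to the codeset. */
typedef uint16_t (*fold_fn)(uint16_t);

/* Illustrative rule A (hypothetical): fold ASCII letters only. */
static uint16_t
fold_ascii(uint16_t c)
{
	return (c >= 'a' && c <= 'z') ? (uint16_t)(c - 'a' + 'A') : c;
}

/* Illustrative rule B (hypothetical): also fold U+0430..U+044F to U+0410..U+042F. */
static uint16_t
fold_ascii_cyrillic(uint16_t c)
{
	if (c >= 0x0430 && c <= 0x044F)
		return (uint16_t)(c - 0x20);
	return fold_ascii(c);
}

/*
 * Case-insensitive comparison of two UTF-16 names under a given rule.
 * Returns <0, 0 or >0, like strcmp(). Shareable by any filesystem with
 * UTF-16 names; it never touches a codeconv_t.
 */
static int
name_cmp(const uint16_t *a, size_t alen, const uint16_t *b, size_t blen,
    fold_fn fold)
{
	size_t n = alen < blen ? alen : blen;
	for (size_t i = 0; i < n; i++) {
		uint16_t x = fold(a[i]), y = fold(b[i]);
		if (x != y)
			return x < y ? -1 : 1;
	}
	return (alen > blen) - (alen < blen);
}
```

Each filesystem would pick its own fold_fn, so "UTF-16LE" stays a single codeset name instead of splitting into per-OS variants.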
--
soda