Subject: Re: codeconv v3 - kernel code set recoding engine
To: Noriyuki Soda <soda@sra.co.jp>
From: Jaromir Dolecek <dolecek@ics.muni.cz>
List: tech-kern
Date: 03/03/2000 13:48:00
Noriyuki Soda wrote:
> Mm.
> The name "k2u" and "u2k" seems to be inappropriate, since this feature
> might be used for kernel -> kernel conversion.

Hmm, probably. They actually recode from the (filesystem) "native" encoding
to a "userland presentable" one. The result need not actually be passed
to userland. Perhaps codeconv_enc() and codeconv_dec() would
be better? Or codeconv_{encode|decode}? The latter is probably too long
already.
 
> It is better to use two different codeconv_t for CODE-A -> CODE-B
> conversion and CODE-B -> CODE-A conversion, rather than one codeconv_t
> with encode/decode functions.

I see my approach (i.e. one codeconv_t for both A->B and B->A)
as a convenience feature. It doesn't cost anything and covers the
typical use case.
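
To make the "one handle, both directions" idea concrete, here is a rough
sketch. It is not the posted code: struct cc_pair and cc_pair_init() are
hypothetical names, and it assumes a simple 8-bit table-driven mapping
purely for illustration.

```c
#include <stddef.h>

/* Hypothetical handle: one object carries the tables for both directions. */
struct cc_pair {
	unsigned char enc[256];	/* native -> presentable */
	unsigned char dec[256];	/* presentable -> native */
};

/*
 * Fill both tables from a single forward mapping. Reverse slots that
 * nothing maps to are left as '?'.
 */
static void
cc_pair_init(struct cc_pair *cp, const unsigned char fwd[256])
{
	size_t i;

	for (i = 0; i < 256; i++) {
		cp->enc[i] = fwd[i];
		cp->dec[i] = '?';
	}
	/* Second pass, so the '?' defaults don't clobber real inverses. */
	for (i = 0; i < 256; i++)
		cp->dec[fwd[i]] = (unsigned char)i;
}
```

A caller that only ever needs one direction simply ignores the other table.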

> > Note that the kernel representation is assumed to be
> > in little endian format. As MSDOSFS, NTFS and
> > Joliet all use it, this assumption seems to be ok to make.
> 
> Hmm. Did you confirm that this is also true on HFS on next generation
> of MacOS?  (Mac is big endian machine)
> I think it is better to explicitly specify "UTF-16-LE" (little endian)
> or something as codeset name.

No, I didn't. Hmm, I'll have to look it up. Thanks for pointing that out.

I'd rather pass the endianness as a flag to codeconv_open() (i.e.
add a parameter to it).
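
Roughly what such a flag would end up controlling, as a sketch only. The
flag names and cc_read16() are made up for this example and are not part
of the posted interface.

```c
#include <stdint.h>

/* Hypothetical flag values that an extra codeconv_open() argument could take. */
#define CC_LITTLE_ENDIAN	0
#define CC_BIG_ENDIAN		1

/* Read one 16-bit code unit from a byte buffer in the requested byte order. */
static uint16_t
cc_read16(const unsigned char *p, int endian)
{
	if (endian == CC_BIG_ENDIAN)
		return (uint16_t)((p[0] << 8) | p[1]);
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

The byte order is then a property of the opened handle rather than of
the code set name.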
 
> > Both codeconv_k2u() and codeconv_u2k() return either 0
> > if no error was encountered, or E2BIG if the destination
> > string is not long enough to hold the result of recoding.
> 
> How about EILSEQ? Although currently there is no definition of
> EILSEQ in our errno.h, EILSEQ will be needed eventually.

Well, the code currently only checks for the buffer-full condition.
"Invalid" characters are written to the target string as "?". Actually,
the caller typically can't do anything about slightly bad input,
so returning an error is probably not appropriate.
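
A minimal sketch of that behaviour, assuming 16-bit input units and a
plain ASCII target. cc_decode_ascii() is a hypothetical stand-in, not the
real codeconv routine.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decode routine: 16-bit units to NUL-terminated ASCII.
 * Unrepresentable units become '?'. Returns 0 on success, or E2BIG if
 * dst cannot hold the result plus the terminating NUL.
 */
static int
cc_decode_ascii(const uint16_t *src, size_t srclen, char *dst, size_t dstlen)
{
	size_t i;

	if (dstlen == 0)
		return E2BIG;
	for (i = 0; i < srclen; i++) {
		if (i + 1 >= dstlen)
			return E2BIG;	/* no room for this char plus NUL */
		dst[i] = (src[i] < 0x80) ? (char)src[i] : '?';
	}
	dst[i] = '\0';
	return 0;
}
```

The only error the caller ever has to handle is the short buffer; bad
input degrades to "?" in the output.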

> I know why you'd like to define these functions.
> But these functions should not be basic functions, but convenience
> functions. Basic comparison functions should not depend on
> codeconv_t. They should depend only on the specific codeset and specific
> filesystem.

Exactly, convenience functions. They are typically needed (e.g.
NTFS needs them, Joliet needs them, as does MSDOSFS), so
IMHO it makes sense to offer them as part of the interface.

> Please note that these comparison functions depend on specific
> filesystem type. Even two file systems use same codeset, filename
> comparison result might be different between different filesystem
> types!

Sure, but I don't force the caller into anything; all the compare
functions offer is a way to compare an encoded and an unencoded
string. No more, no less.
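
Something along these lines, as a sketch under simplifying assumptions:
16-bit encoded names, ASCII on the decoded side, and hypothetical names
cc_decode_char() and cc_cmp() that are not the posted API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-character decode: anything outside ASCII becomes '?'. */
static unsigned char
cc_decode_char(uint16_t c)
{
	return (c < 0x80) ? (unsigned char)c : '?';
}

/*
 * Compare an encoded (16-bit) name with an already decoded one,
 * converting one character at a time so no temporary buffer is needed.
 * Returns <0, 0 or >0 in the manner of memcmp().
 */
static int
cc_cmp(const uint16_t *enc, size_t enclen,
    const unsigned char *dec, size_t declen)
{
	size_t i, n = (enclen < declen) ? enclen : declen;

	for (i = 0; i < n; i++) {
		unsigned char c = cc_decode_char(enc[i]);

		if (c != dec[i])
			return (int)c - (int)dec[i];
	}
	if (enclen == declen)
		return 0;
	return (enclen < declen) ? -1 : 1;
}
```

Any filesystem-specific rule (case folding and the like) would still be
layered on top by the caller.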

> Thus, these functions without filesystem type seems to be not right.
> I think these functions should not be defined as codeconv layer,
> but should be defined as part of filesystem implementation.
> Some filesystems, especially filesystems from same vendor, might share
> same comparison functions, though.

But this would mean the codeconv engine would need to provide "hooks"
to do such comparisons (i.e. something like the codeconv_readc() and
codeconv_convc() discussed last time), and the code handling code
set related functionality would be scattered across more places, making
any future changes more difficult. I'd not like that. I'd also hate
to introduce a similar-but-different API for filesystem name comparisons.

Jaromir
-- 
Jaromir Dolecek <jdolecek@NetBSD.org>      http://www.ics.muni.cz/~dolecek/
@@@@  Wanna a real operating system ? Go and get NetBSD, damn!  @@@@