Subject: Re: codeconv v3 - kernel code set recoding engine
To: None <dolecek@ics.muni.cz, tech-kern@netbsd.org>
From: Noriyuki Soda <soda@sra.co.jp>
List: tech-kern
Date: 03/03/2000 17:26:36
> Recode functions
> ----------------
> int codeconv_k2u (codeconv_t *cc, char *dst, size_t dstlen,
> 	const void *src, size_t srclen, size_t *writelen,
> 	size_t *readlen);
> int codeconv_u2k (codeconv_t *cc, void *dst, size_t dstlen,
> 	const char *src, size_t srclen, size_t *writelen,
> 	size_t *readlen);
> 
> codeconv_k2u() recodes string in filesystem-native representation
> to user land representation.

Mm.
The names "k2u" and "u2k" seem inappropriate, since this feature
might also be used for kernel -> kernel conversion.
It is better to use two different codeconv_t objects, one for
CODE-A -> CODE-B conversion and one for CODE-B -> CODE-A conversion,
rather than a single codeconv_t with encode/decode functions.
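
A rough sketch of the one-direction idea follows. All names here
(codeconv_dir_t, latin1_to_utf16le, and so on) are hypothetical and
not part of the proposal; the ISO8859-1 <-> UTF-16-LE pair is only a
toy example. Each object converts in exactly one direction, and the
reverse direction is simply a second object.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical one-direction converter; names are illustrative only. */
typedef struct codeconv_dir {
	const char *from;	/* source codeset name */
	const char *to;		/* destination codeset name */
	int (*conv)(const struct codeconv_dir *, void *dst, size_t dstlen,
	    const void *src, size_t srclen, size_t *writelen, size_t *readlen);
} codeconv_dir_t;

/* ISO8859-1 -> UTF-16-LE: every input byte becomes two output bytes. */
static int
latin1_to_utf16le(const codeconv_dir_t *cd, void *dst, size_t dstlen,
    const void *src, size_t srclen, size_t *writelen, size_t *readlen)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	size_t i;

	(void)cd;
	for (i = 0; i < srclen; i++) {
		if (2 * i + 2 > dstlen)
			break;		/* destination is full */
		d[2 * i] = s[i];
		d[2 * i + 1] = 0;
	}
	*readlen = i;
	*writelen = 2 * i;
	return (i < srclen) ? E2BIG : 0;
}

/* UTF-16-LE -> ISO8859-1: code units above 0xFF are replaced by '?'. */
static int
utf16le_to_latin1(const codeconv_dir_t *cd, void *dst, size_t dstlen,
    const void *src, size_t srclen, size_t *writelen, size_t *readlen)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	size_t i, n = srclen / 2;

	(void)cd;
	for (i = 0; i < n; i++) {
		if (i >= dstlen)
			break;		/* destination is full */
		d[i] = (s[2 * i + 1] == 0) ? s[2 * i] : '?';
	}
	*readlen = 2 * i;
	*writelen = i;
	return (i < n) ? E2BIG : 0;
}

/* Two separate objects, one per direction. */
static const codeconv_dir_t latin1_utf16le =
	{ "ISO8859-1", "UTF-16-LE", latin1_to_utf16le };
static const codeconv_dir_t utf16le_latin1 =
	{ "UTF-16-LE", "ISO8859-1", utf16le_to_latin1 };
```

With this shape, a kernel -> kernel conversion is just another
object with its own from/to pair, and nothing in the interface
mentions "kernel" or "user".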

> Note that the kernel representation is assumed to be
> in little endian format. As MSDOSFS, NTFS and
> Joliet all use it, this assumption seems to be ok to make.

Hmm. Did you confirm that this is also true for HFS on the next
generation of MacOS?  (The Mac is a big endian machine.)
I think it is better to specify the byte order explicitly in the
codeset name, e.g. "UTF-16-LE" (little endian) or something similar.
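
To illustrate, here is a minimal sketch in which the byte order comes
from the codeset name rather than from a built-in assumption. The
helper utf16_get_unit() and the name "UTF-16-BE" are my own
assumptions for the example, not part of the proposal.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read one 16-bit code unit from p[0..1], with
 * the byte order selected by the codeset name.  The result does not
 * depend on the byte order of the host.
 * Returns 0 on success, -1 if the codeset name is not recognized.
 */
static int
utf16_get_unit(const char *codeset, const unsigned char *p, uint16_t *out)
{
	if (strcmp(codeset, "UTF-16-LE") == 0)
		*out = (uint16_t)(p[0] | (p[1] << 8));
	else if (strcmp(codeset, "UTF-16-BE") == 0)
		*out = (uint16_t)((p[0] << 8) | p[1]);
	else
		return -1;	/* byte order not stated: refuse to guess */
	return 0;
}
```

A filesystem that stores names in big endian order would then only
need to pass a different codeset name.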

> Both codeconv_k2u() and codeconv_u2k() return either 0
> if no error was encountered, or E2BIG if the destination
> string is not long enough to hold the result of recoding.

How about EILSEQ? Although there is currently no definition of
EILSEQ in our errno.h, it will be needed eventually.
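
For example, a UTF-16-LE source can contain an unpaired surrogate,
which is not a "destination too small" condition. The sketch below
assumes an errno.h that already defines EILSEQ; utf16le_validate() is
a hypothetical name, and for brevity it reports truncated input as
EILSEQ too, where iconv(3) would use EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: scan UTF-16-LE input and stop at the first
 * malformed sequence.  *readlen is the number of well-formed bytes
 * consumed.  Returns 0 if all input is well formed, EILSEQ otherwise.
 */
static int
utf16le_validate(const unsigned char *src, size_t srclen, size_t *readlen)
{
	size_t i = 0;

	while (i + 2 <= srclen) {
		uint16_t u = (uint16_t)(src[i] | (src[i + 1] << 8));

		if (u >= 0xD800 && u <= 0xDBFF) {
			/* High surrogate: a low surrogate must follow. */
			uint16_t lo;

			if (i + 4 > srclen)
				break;
			lo = (uint16_t)(src[i + 2] | (src[i + 3] << 8));
			if (lo < 0xDC00 || lo > 0xDFFF)
				break;
			i += 4;
		} else if (u >= 0xDC00 && u <= 0xDFFF) {
			break;	/* low surrogate without a high one */
		} else {
			i += 2;
		}
	}
	*readlen = i;
	return (i == srclen) ? 0 : EILSEQ;
}
```

The recode functions could return the same error, with *readlen
telling the caller where the bad sequence starts.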

> Compare functions
> -----------------
> int codeconv_cmp (codeconv_t *cc, const void *kstr, size_t kstrlen,
> 	const char *ustr, size_t ustrlen);
> int codeconv_icasecmp (codeconv_t *cc, const void *kstr, size_t kstrlen,
> 	const char *ustr, size_t ustrlen);
> 
> kstr is "string" in kernel/filesystem native code set,
> ustr is string in user land representation. cc is assumed
> to be the one used for recoding from the appropriate kernel
> code set to user land code set.

I know why you'd like to define these functions.
But they should be convenience functions, not basic functions.
The basic comparison function should not depend on codeconv_t.
It should depend only on the specific codeset and the specific
filesystem.

Please note that these comparison functions depend on the specific
filesystem type. Even if two filesystems use the same codeset, the
result of a filename comparison might differ between the two
filesystem types!

Thus, defining these functions without a filesystem type does not seem
right. I think they should not be defined in the codeconv layer, but
as part of each filesystem implementation.
Some filesystems, especially filesystems from the same vendor, might
share the same comparison functions, though.
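
A minimal sketch of what I have in mind, using two made-up
filesystems ("fs_a" and "fs_b") that both store plain ASCII names but
disagree on case sensitivity. The struct fs_nameops and all function
names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 if the two names match, nonzero otherwise. */
typedef int (*fs_namecmp_t)(const char *a, size_t alen,
    const char *b, size_t blen);

/* Byte-for-byte comparison, for a case-sensitive filesystem. */
static int
exact_namecmp(const char *a, size_t alen, const char *b, size_t blen)
{
	return !(alen == blen && memcmp(a, b, alen) == 0);
}

/* ASCII case-insensitive comparison. */
static int
nocase_namecmp(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 1;
	for (i = 0; i < alen; i++) {
		if (tolower((unsigned char)a[i]) !=
		    tolower((unsigned char)b[i]))
			return 1;
	}
	return 0;
}

/* The comparison function belongs to the filesystem, not the codeset. */
struct fs_nameops {
	const char *fstype;
	const char *codeset;
	fs_namecmp_t namecmp;
};

static const struct fs_nameops fs_a = { "fs_a", "ASCII", exact_namecmp };
static const struct fs_nameops fs_b = { "fs_b", "ASCII", nocase_namecmp };
```

Both entries name the same codeset, yet "README" and "readme" match
only on fs_b. Filesystems that really do share rules can point at
the same function.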

> I'd appreciate any feedback. I'd like to get this into tree
> (as well as the fs-specific bits) before 1.5 cut, if possible
> and appropriate.

I think so, too.
--
soda@sra.co.jp		Software Research Associates, Inc., Japan
(Noriyuki Soda)		   IT Industry System Division Group 3