Subject: Re: codeconv v3 - kernel code set recoding engine
To: Noriyuki Soda <soda@sra.co.jp>
From: PER4MANCE, J. Dolecek <jdolecek@per4mance.cz>
List: tech-kern
Date: 03/07/2000 19:59:28
Noriyuki Soda wrote:
> FAT of Japanese MS-DOS doesn't support eucJP, only supports SJIS.

But SJIS is a code set using two-byte codes, right? How was it
written to the FAT 8.3 entry?

Hmm, I dimly recall that Czech MS-DOS might also have used some national
characters. Of course not Unicode :) But I'm not going to handle
those Czech specialities - it's all long dead and VFAT is the standard
now.

> FAT (before VFAT age) of Japanese MS-DOS only supports SJIS.
> Yet another surprising thing is that SJIS FAT contains "\" (0x5c) as
> pathname character. (as second byte of kanji).

Hmm.

> The assumption is wrong.
> For example, Japanese console i/o often only requires one directional
> conversion (i.e. for output only). Because input side is covered by
> userland input method. (The input method is typically > 1MB process
> size, and > 5MB dictionary size).

In the current implementation, it's as easy to use the same
codeconv_t for both directions (the structure contains both encode and
decode bits) as it is to make it one-way. I agree that hardcoding an
implementation feature into the interface is a bad thing.

> No.
>         u = codeconv_k2u(cc, k);
>         k = codeconv_u2k(cc, u);
> isn't different from
>         u = codeconv(k2u_cc, k);
>         k = codeconv(u2k_cc, u);
> about type checking.

No significant difference. Anyway, it's probably better to have a single
recoding function, to make the API simpler. I'll drop
the *_{u2k|k2u}() variants and leave just a single codeconv().

> So, please don't just use "Unicode", but please use "UTF-16XX" or
> something.

AFAIK UTF-16 doesn't imply Unicode - UTF-16 (or UTF-8) is not a code
set, it's just a way to write out codes. That's why I'd like
to keep the reference to Unicode in the name.

> Hmm, I'll try to think about better way to define name comparison
> functions.  Could you wait for a while?

Sure :)

> No. (at least for case conversion functions)
> If we don't use same way with original OS, we might make a filename
> which cannot be accessed from original OS. :-<

Hmm, that's true. But since the compare functions
are good enough for what we have in the tree now, I'd be inclined to
handle such a situation _after_ it's encountered. That way we avoid
needless generalization.

Jaromir