Subject: Re: Unicode support in kernel
To: Noriyuki Soda <soda@sra.co.jp>
From: Jaromir Dolecek <dolecek@ics.muni.cz>
List: tech-kern
Date: 10/14/1999 20:26:57
Noriyuki Soda wrote:
> And doesn't handle multiuser case.

Right.

> It can be done in library level, or perhaps per process codeset
> attribute in kernel. Thus don't have to change all userland.
> (BTW, latter is.... mmmmm ;-))

I thought about it a bit more, and doing this as a per-process attribute
in the kernel would actually not be very hard (at least it doesn't seem
to be :). Internally, the filenames would be kept in utf-8, and
on every pass from/to the kernel (open(), creat(), getdents() etc.)
the filename would be recoded to/from the process's preferred vfs
charset. The recoding might even be done at the library level - if
the preferred encoding were in the environment, that
would mean just one more system call (to find out if the
recoding is necessary for this particular filename). Ha, problem -
what if an ntfs volume is mounted on some ffs directory? In
that case, part of the path would need to be recoded and part not.
So the recoding would have to happen at the namei() level :(
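
To make the recoding step concrete, here is a rough sketch of the two
directions for one charset. It assumes the process's preferred charset
is ISO-8859-1; the function names latin1_to_utf8() and utf8_to_latin1()
are made up for illustration and are not existing kernel or libc
interfaces. It only recodes a single name and does not address the
mount point problem above.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an existing kernel interface): recode a
 * NUL-terminated ISO-8859-1 name to utf-8.  Returns the number of
 * bytes written, not counting the NUL, or (size_t)-1 if dst is too
 * small.
 */
size_t
latin1_to_utf8(const char *src, char *dst, size_t dstlen)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t o = 0;

	if (dstlen == 0)
		return (size_t)-1;
	for (; *s != '\0'; s++) {
		if (*s < 0x80) {
			/* ASCII is passed through unchanged */
			if (o + 1 >= dstlen)
				return (size_t)-1;
			dst[o++] = (char)*s;
		} else {
			/* 0x80..0xff becomes a two-byte sequence */
			if (o + 2 >= dstlen)
				return (size_t)-1;
			dst[o++] = (char)(0xC0 | (*s >> 6));
			dst[o++] = (char)(0x80 | (*s & 0x3F));
		}
	}
	dst[o] = '\0';
	return o;
}

/*
 * Hypothetical inverse: recode utf-8 back to ISO-8859-1.  Returns
 * (size_t)-1 if the name is malformed, contains a character outside
 * ISO-8859-1, or does not fit in dst.
 */
size_t
utf8_to_latin1(const char *src, char *dst, size_t dstlen)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t o = 0;

	if (dstlen == 0)
		return (size_t)-1;
	while (*s != '\0') {
		unsigned int c;

		if (*s < 0x80) {
			c = *s++;
		} else if ((*s == 0xC2 || *s == 0xC3) &&
		    (s[1] & 0xC0) == 0x80) {
			/* only lead bytes 0xC2/0xC3 map to U+0080..U+00FF */
			c = ((unsigned int)(*s & 0x1F) << 6) | (s[1] & 0x3F);
			s += 2;
		} else {
			return (size_t)-1;
		}
		if (o + 1 >= dstlen)
			return (size_t)-1;
		dst[o++] = (char)c;
	}
	dst[o] = '\0';
	return o;
}
```

The second function also shows a case any real implementation would
have to decide on: a name that cannot be represented in the process's
charset. Here it is simply reported as an error.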

If done right, a similar mechanism could quite easily be extended
to other filesystems (most importantly, ffs). The filenames would be
stored in utf-8 and recoded appropriately on the fly. The performance
hit should not be very bad.

However, I don't feel like doing that right now. Possible
future work :) For now, I'd just make the charset a mount option.
Okay?

> MS-DOS filesystem long filename extension uses both Unicode and
> Shift_JIS in same filesystem, I don't have enough clue about Joliet
> extension, but I suppose Joliet is same with msdosfs. (to achieve
> compatibility to existing Shift_JIS CD-ROM).

You Asian folks seem to have big fun with the filenames :-/

Jaromir