tech-kern: Re: Mount option to ignore case

Subject: Re: Mount option to ignore case
To: Bill Studenmund <wrstuden@netbsd.org>
From: Johan Ihren <johani@autonomica.se>
List: tech-kern
Date: 04/02/2002 18:10:07
Bill Studenmund <wrstuden@netbsd.org> writes:

> On Sat, 30 Mar 2002, Martin Husemann wrote:
> 
> Also, there is all of the Unicode stuff we have hiding in ntfs right now.
> 
> > > IMHO, it would be better to implement simple case-insensitivity support
> > > than do nothing.
> >
> > I agree.
> 
> Well, I'm concerned about adding this as it stands. We have a limited
> supply of mount flags, and once we allocate one for this, we are stuck
> with it. To be able to re-use it, we have to version the mount system
> call.
> 
> I also find it a bit ironic that an American is advocating more in-depth
> multi-language support than the Europeans. :-)
> 
> > If/when we come up with a more general solution (something like the "struct
> > emul*" pointing from each process to its active ABI description), we can
> > make the simple routines use that.
> >
> > But I'd suggest we start with the simple case now. Of course, if our friends
> > from the multi-byte locales join in and we come up with a general solution
> > soon, we can do it right at first try.
> 
> I'm not so much championing that we get the full case correct right from
> the start, but that we have a good idea of the full-case _design_ from the
> start. Then implement the "simple" case as a first step.
> 
> If we have a plan that should give us full multi-lingual localization,
> then doing it just for, say, ISO 8859-1 for now would be fine.
> 
> I don't think this will be hard to do. Corner some of the multi-byte
> locale folks and get their input. It shouldn't take more than a week on
> the mailing list. :-)

If you take a look at what has been going on in the IETF IDN wg
(internationalized domain names) for more than two years now, I think
it will be clear not only that it'll take significantly longer (yes, I
saw the ;-) but also that looking at this from a perspective of
"language" (i.e. caring about locale) will in all likelihood bog down
into a trench that goes mostly nowhere.

If instead the problem is treated from a perspective of "characters",
i.e. a filename is a string of individual characters concatenated
together and these characters come from some sort of agreed upon set
of available characters (so far this has been ASCII, and will likely be
Unicode in the future) then the "locale" aspect disappears. This is
*good*.

I.e. UTF-8 is only an encoding of a sequence of Unicode characters, and
what is needed is mostly case-mapping tables as documented by the
Unicode Consortium.
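To illustrate the "characters, not language" approach, here is a minimal sketch, written in Python only for brevity (a kernel version would be C with compiled-in tables). It assumes filenames are stored as UTF-8, decodes them to code points, and applies Unicode default full case folding via Python's str.casefold, which is driven by the Unicode case-folding data rather than by any locale. The function names fold_name and names_equal_ignoring_case are hypothetical, not an existing NetBSD interface.

```python
def fold_name(raw: bytes) -> str:
    """Hypothetical helper: map a UTF-8 filename to a case-insensitive key.

    UTF-8 is treated purely as an encoding of code points; the folding step
    uses Unicode's default (locale-independent) full case folding.
    Raises UnicodeDecodeError if `raw` is not valid UTF-8.
    """
    return raw.decode("utf-8").casefold()


def names_equal_ignoring_case(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: compare two UTF-8 filenames ignoring case."""
    return fold_name(a) == fold_name(b)


if __name__ == "__main__":
    print(names_equal_ignoring_case(b"README.txt", b"readme.TXT"))
    # German sharp s folds to "ss" under full case folding.
    print(names_equal_ignoring_case("Straße".encode("utf-8"), b"STRASSE"))
    # Greek capital and final sigma both fold to the same small sigma.
    print(names_equal_ignoring_case("ΟΔΟΣ".encode("utf-8"), "οδος".encode("utf-8")))
```

Note that nothing in the sketch asks which language the user speaks; the same tables apply to every filename, which is exactly what makes the behaviour predictable across users.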

On the other hand, being Swedish, I mostly restrict myself to browsing
around in ISO 8859-1 and -2. But I have the deepest respect for the
serious problems associated with CJK mappings and conversions (if at
all possible) between SC and TC (Simplified and Traditional) Chinese.
So I'm not really the one to listen to here.

My point is that (a) we should avoid "language" and stick to
"characters", and (b) before deciding on the "full solution" it is
important to find out whether such a solution *exists*. Oops, that's
two points.

Regards,

Johan