Subject: Re: Mount option to ignore case
To: Johan Ihren <johani@autonomica.se>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 04/02/2002 09:46:17
On 2 Apr 2002, Johan Ihren wrote:

> Bill Studenmund <wrstuden@netbsd.org> writes:
>
> > I don't think this will be hard to do. Corner some of the multi-byte
> > locale folks and get their input. It shouldn't take more than a week on
> > the mailing list. :-)
>
> If you take a look at what has been going on in the IETF IDN wg
> (internationalized domain names) for more than two years now, I think
> it will be clear not only that it'll take significantly longer (yes, I
> saw the ;-) but also that looking at this from a perspective of
> "language" (i.e. caring about locale) will in all likelihood bog down
> into a trench that goes mostly nowhere.
>
> If instead the problem is treated from a perspective of "characters",
> i.e. a filename is a string of individual characters concatenated
> together and these characters come from some sort of agreed upon set
> of available characters (so far this has been ASCII, and will likely be
> Unicode in the future) then the "locale" aspect disappears. This is
> *good*.

The problem is that we are not just talking about characters; we are
talking about case-insensitive character matching. Different languages
have different ideas about which characters count as the same.
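
To make that concrete, here is a rough Python sketch (illustration only,
not kernel code). The names fold_name and same_name and the "lang"
argument are made up for this example, and the only language rule shown
is the well-known Turkish one, where upper-case "I" pairs with dotless
"ı" (U+0131) and dotted "İ" (U+0130) pairs with "i". Everything else
falls back to Python's built-in, language-neutral str.casefold().

```python
# Hypothetical per-language rule applied before the generic folding.
# Turkish: "I" lowercases to dotless "ı", dotted "İ" lowercases to "i".
TURKISH_PREFOLD = str.maketrans({"I": "\u0131", "\u0130": "i"})


def fold_name(name, lang=None):
    """Return a comparison key for a file name (illustrative only)."""
    if lang == "tr":
        name = name.translate(TURKISH_PREFOLD)
    # Language-neutral Unicode case folding for everything else.
    return name.casefold()


def same_name(a, b, lang=None):
    """Case-insensitive comparison under the given language rule."""
    return fold_name(a, lang) == fold_name(b, lang)


print(same_name("FILE", "file"))                   # True
print(same_name("FILE", "file", lang="tr"))        # False
print(same_name("FILE", "f\u0131le", lang="tr"))   # True
```

So the same pair of names matches under one rule and not under the
other, which is exactly the choice a mount option would have to make.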

> I.e. UTF-8 is only an encoding of a sequence of Unicode characters and
> what is needed is mostly for case mapping tables as documented by the
> Unicode consortium.

From the comments I heard in one of the plenary sessions at the last IETF,
that mapping table (from the Unicode Consortium) isn't too useful.
It was made to please everyone and as such pleases no one.

> On the other hand, being swedish, I mostly restrict myself to browsing
> around in iso-8859-1 and -2. But I have the deepest respect for the
> serious problems associated with CJK-mappings and conversions (if at
> all possible) between SC and TC in chinese. So I'm not really the one
> to listen to here.

I think the main problem (for this discussion) with CJK is that SC and TC
characters are equivalent to Chinese speakers, but not so for Japanese
speakers.

> My point is that (a) avoid "language", stick to "characters" and (b)
> before deciding on the "full solution" it is important to find out
> whether such a solution *exists*. Oops, that's two points.

My point is that if we don't include "language", then we will get a
case-insensitive matching method which will _not_ be what a number of
users want. Depending on what we do, different groups of users will be in
the "not" group.

Take care,

Bill