Subject: Re: Mount option to ignore case
To: Johan Ihren <johani@autonomica.se>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 04/02/2002 14:04:33

On 2 Apr 2002, Johan Ihren wrote:

> Bill Studenmund <wrstuden@netbsd.org> writes:
>
> Bill,
>
> > The problem is that we are not just talking about characters, we are
> > talking about case-insensitive character matching. Different languages
> > have different ideas about what characters are the same.
>
> Exactly. And as soon as you start down that path you will be lost.

I believe we lose the moment we start talking about case insensitivity.
If we don't want to "be lost," then we shouldn't do case insensitivity.
:-)

> What locale will you use? The one used by the creator of the file? The
> one you (as the reader) uses? The one indicated by the choice of
> characters in the filename?

That's what I'd like us to make a choice on. I'd vote for the locale the
reader is using. But what we choose is secondary to the idea that we
need to address the problem.

> > From the comments I heard in one of the plenary sessions at the last IETF,
> > that mapping table (from the Unicode Consortium) isn't too useful.
> > It was made to please everyone and as such pleases no one.
>
> That's more or less my impression also. However (somewhat going out on
> a limb here), my impression is also that there is no solution to be
> found within the Unicode framework. For exactly the reason that they
> work with characters rather than language. I.e. we shouldn't scorn the
> Unicode people for failure to solve problems that are inherent to the
> framework they are working within.

?? I don't think I was scorning the Unicode folks; I was saying we
shouldn't use their "universal" case folding table. :-)

> So the best you can do is to decide on one mapping table (possibly
> versioned) that will manage that subset of characters that it covers
> and the rest will, for the time being, not be case converted in name
> comparisons.
>
> If I create a filename consisting of a mixture of Swedish, Turkish and
> Ethiopian characters there is no language-based case conversion
> *possible* that can sort out the result. For instance (well known
> example), our lower case "i" has different uppercase equivalents in
> English and Turkish. But no language-sensitive system will ever be
> able to sort out whether it was a lower case Swedish "i" or a
> lower case Turkish "i" that was intended.  You will have to choose one
> of:
>
> a) choose one of the mappings and lose the other(s).
>
> b) decide that "i" cannot be case converted for the purposes of
>    filename comparison. That would hurt quite badly, since we *can*
>    case convert "i" right now.
>
> c) make everything *really* ugly and only allow characters from the
>    present locale when creating new files (so that it is possible to
>    know that all characters are from the same locale). Not even worth
>    thinking about.

d) figure out a way for different users to use different (locale-specific)
tables

e) don't bother with case insensitivity now since it's such a mess
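
To make the difference between (a) and (d) concrete, here is a minimal
sketch in Python. It is not a proposal for a kernel interface. The table
contents, the use of "C" and "tr_TR" as table keys, and the helper names
fold() and names_match() are all assumptions made up for illustration; a
real table would cover far more than A-Z plus the two Turkish letters.

```python
# Hypothetical per-locale folding tables (illustration only).
# Each table maps a character to its folded (lowercase) form.
# Characters missing from a table are compared as-is, i.e. not folded.

# Default table: plain US-ASCII A-Z -> a-z.
DEFAULT_FOLD = {chr(c): chr(c + 32) for c in range(ord("A"), ord("Z") + 1)}

# Turkish table: same as the default, except that
#   "I" (U+0049) folds to dotless "ı" (U+0131), and
#   dotted "İ" (U+0130) folds to "i" (U+0069).
TURKISH_FOLD = dict(DEFAULT_FOLD)
TURKISH_FOLD["I"] = "\u0131"
TURKISH_FOLD["\u0130"] = "i"

# Option (d): the table is picked per user/locale. Option (a) would be
# the same code with only one entry in this dictionary.
FOLD_TABLES = {"C": DEFAULT_FOLD, "tr_TR": TURKISH_FOLD}


def fold(name, locale="C"):
    """Fold a filename using the table for the given locale."""
    table = FOLD_TABLES.get(locale, DEFAULT_FOLD)
    return "".join(table.get(ch, ch) for ch in name)


def names_match(a, b, locale="C"):
    """Case-insensitive filename comparison under one locale's table."""
    return fold(a, locale) == fold(b, locale)
```

With the default table, names_match("FILE.TXT", "file.txt") is true. With
the "tr_TR" table it is false, because "I" folds to dotless "ı"; there
"FILE.TXT" matches "fıle.txt" instead. So two readers of the same
directory can disagree about which names collide, which is the cost of (d).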

When I worked at Stanford, I was in a lab with (and administered a machine
used by) American, French, Swiss, Turkish, Taiwanese, and Japanese users. So
we certainly ran the gauntlet of the case issues brought up here. We only
handled US-ASCII, so we side-stepped everything at the time. :-)

> My vote would be for (a), but I realize that I'm probably biased due
> to my expectation that the English casefolding rules for "i" will be
> considered useful to a larger audience than the Turkish version. Had I
> been from Turkey I might have made a different choice.
>
> ...
>
> > > My point is that (a) avoid "language", stick to "characters" and (b)
> > > before deciding on the "full solution" it is important to find out
> > > whether such a solution *exists*. Oops, that's two points.
> >
> > My point is that if we don't include "language", then we will get a
> > case-insensitivity matching method which will be _not_ what a number
> > of users want. Depending on what we do, different groups of users
> > will be in the "not" group.
>
> Correct. And by including "language" you'll get nowhere at all. Sorry.

But by not including "language", do you really get anywhere we want to go?

Oh, my votes are for d) or e) above.

Take care,

Bill