tech-kern: Re: Mount option to ignore case

Subject: Re: Mount option to ignore case
To: Bill Studenmund <wrstuden@netbsd.org>
From: Johan Ihren <johani@autonomica.se>
List: tech-kern
Date: 04/03/2002 19:00:50

Bill Studenmund <wrstuden@netbsd.org> writes:

> I believe we lose the moment we start talking about case insensitivity.
> If we don't want to, "be lost," then we shouldn't do case insensitivity.
> :-)

Fair enough.

> > What locale will you use? The one used by the creator of the file? The
> > one you (as the reader) uses? The one indicated by the choice of
> > characters in the filename?
> 
> That's what I'd like us to make a choice on. I'd vote for the locale
> the reader is using. But what we choose is secondary to the idea
> that we need to address the problem.
> 
> > > From the comments I heard in one of the plenary sessions at the last IETF,
> > > that mapping table (from the Unicode consortium) isn't too useful.
> > > It was made to please everyone and as such pleases no one.
> >
> > That's more or less my impression also. However (somewhat going out on
> > a limb here), my impression is also that there is no solution to be
> > found within the Unicode framework. For exactly the reason that they
> > work with characters rather than language. I.e. we shouldn't scorn the
> > Unicode people for failure to solve problems that are inherent to the
> > framework they are working within.
> 
> ?? I don't think I was scorning the Unicode folks, I was saying we
> shouldn't use their "universal" case folding table. :-)

Sorry, that came out the wrong way. I didn't mean to imply that you
scorned anyone, but rather that the case mapping tables from Unicode
are more or less as good as they can be, given their constraints. The
people who complain do it mostly because they want a panacea solution
that simply isn't possible.

> > So the best you can do is to decide on one mapping table (possibly
> > versioned) that will manage that subset of characters that it covers
> > and the rest will, for the time being, not be case converted in name
> > comparisons.
> >
> > If I create a filename consisting of a mixture of Swedish, Turkish and
> > Ethiopian characters there is no language-based case conversion
> > *possible* that can sort out the result. For instance (well known
> > example), our lowercase "i" has different uppercase equivalents in
> > English and Turkish. But no language-sensitive system will ever be
> > able to sort out whether it was a lowercase Swedish "i" or a
> > lowercase Turkish "i" that was intended.  You will have to choose one
> > of:
> >
> > a) choose one of the mappings and lose the other(s).
> >
> > b) decide that "i" cannot be case converted for the purposes of
> >    filename comparison. That would hurt quite badly, since we *can*
> >    case convert "i" right now.
> >
> > c) make everything *really* ugly and only allow characters from the
> >    present locale when creating new files (so that it is possible to
> >    know that all characters are from the same locale). Not even worth
> >    thinking about.
> 
> d) figure out a way for different users to use different tables (locale
> specific)

Urk. We're talking about identifiers here. Identifiers should have
exact matching rules or everything becomes very strange. To my mind
having that type of fuzzy locale-dependent filename matching would be
almost like (contrived example follows) having locale-dependent
variable *names* in the src to a program that I run.

> e) don't bother with case insensitivity now since it's such a mess

Much better.
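
To make the "i" problem concrete, here is a minimal Python sketch. It is
not from the thread: the names fold_default, fold_turkish and same_name
are hypothetical, and the two tiny folding rules only stand in for real
locale tables. It shows one pair of names comparing equal under one
reader's rules and distinct under another's, which is the kind of fuzzy
matching that per-user tables (option d) would introduce.

```python
# Hypothetical folding rules for illustration; real locale tables are far larger.

def fold_default(name: str) -> str:
    """Fold with the usual rule: uppercase "I" pairs with dotted "i"."""
    return name.lower()

# Turkish pairs "I" with dotless "ı" (U+0131) and "İ" (U+0130) with "i".
_TURKISH = str.maketrans({"I": "\u0131", "\u0130": "i"})

def fold_turkish(name: str) -> str:
    """Apply the Turkish pairs first, then fold everything else."""
    return name.translate(_TURKISH).lower()

def same_name(a: str, b: str, fold) -> bool:
    """Case-insensitive comparison under the given folding rule."""
    return fold(a) == fold(b)

print(same_name("FILE", "file", fold_default))  # True
print(same_name("FILE", "file", fold_turkish))  # False
```

With per-reader tables, whether "FILE" and "file" name the same file
would depend on who is asking.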

> From my work at Stanford, I was in a lab with (administered a machine used
> by) American, French, Swiss, Turkish, Taiwanese, and Japanese users. So we
> certainly ran the gauntlet of the case issues brought up. We only handled
> US-ASCII so we side-stepped everything at the time. :-)

;-)

> But by not including "language" do you really get anywhere we want to go?
> 
> Oh, my votes are for d) or e) above.

If you can have two votes, then so can I ;-) My second vote would then
obviously be for (e), which makes this the dominant alternative. But
I'm not convinced that everyone (or even anyone) else would agree with
that.

Regards,

Johan