tech-userlevel archive

Re: A draft for a multibyte and multi-codepoint C string interface



On Tue, 16 Apr 2013 10:53:25 +0200
Antoine LECA <antoine.leca.1%gmail.com@localhost> wrote:

> > The user has no way to know nor reason to care whether "året" uses
> > four Unicode code points or five. If he types "vi året", I think
> > the file should open if the character strings match regardless of
> > the byte-sequences, but today the odds are 1:4 against.  
> 
> First, let me note problems of this class only happen in heterogeneous
> environments, like people working on different computers on remotely
> mounted file systems; 

Yes, except that heterogeneity may be induced merely by using different
tools.  

I've gotten into the habit of saving PDFs by copying the title and
pasting it into the GUI's File Save dialog.  As an English speaker, I
don't have much problem with foreign words in titles, and simple things
like em-dashes I can manage.  If the filename has some non-ASCII
characters in it, they show up in my terminal, and I rename the file.
As a French speaker, were you to follow my practice, I would expect
you'd come across this issue with some frequency.  (Although you might
be saved by tab-completion.)  

It's interesting.  To the extent that filenames are selected by
pointing and clicking -- without alphabetic input -- the filename
really could be anything at all, even a TIFF image (cleansed of NUL and
'/')! It's only when the user specifies the name by typing,
when the system matches the supplied specification to the candidate
list, that the issue of "matching" even arises.  
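
A minimal sketch of that matching step, assuming the comparison is done
on NFC-normalized strings rather than on raw code points.  The helper
names same_text and match_names are hypothetical, chosen only for
illustration; unicodedata.normalize is from the Python standard library.
Nothing here describes what any existing shell, editor or glob()
actually does.

```python
import unicodedata

def same_text(a, b):
    """Compare two strings by canonical equivalence (NFC), not by code points."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def match_names(typed, candidates):
    """Return the candidates that are canonically equivalent to what was typed."""
    wanted = unicodedata.normalize("NFC", typed)
    return [c for c in candidates if unicodedata.normalize("NFC", c) == wanted]

precomposed = "\u00e5ret"   # å as a single code point: 4 code points
decomposed = "a\u030aret"   # a + COMBINING RING ABOVE: 5 code points

print(len(precomposed), len(decomposed))            # 4 5
print(precomposed == decomposed)                    # False: code points differ
print(same_text(precomposed, decomposed))           # True: same characters
print(match_names(decomposed, [precomposed, "aret"]))
```

With a comparison like this, typing either spelling would select the
file stored under the other, while the unaccented "aret" stays distinct.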

> Then while the case you describe here seems thorny, this behaviour is
> probably the correct one to use as default: unless special
> instructions are given, it seems to me adequate to drop the request
> and announce to the user that there is no such file as "a\u030Aret"
> as she asked: this avoids an entire class of confusions which
> happen when glob() performs as the specifications say but against the
> user's intent.

I suspect that "entire class of confusions" is the empty set.  No one
has presented an example.  After all, the user is manipulating and
specifying strings.  What "intent" could he have regarding their
encoding?  

> Furthermore, if such a case happens often, I am certain the interested
> user will learn to work around it, perhaps using "vi [[.å.]]ret" 

I am sure you're right.  The current state of software is nothing if
not a testament to human adaptivity.  

--jkl

